Re: [ipa-vrp] ice in set_value_range

2016-11-09 Thread kugan

Hi Andrew,

On 09/11/16 17:02, Andrew Pinski wrote:

Either this patch or the patch for "Handle unary pass-through jump
functions for ipa-vrp" caused a bootstrap failure on
aarch64-linux-gnu.
Bootstrap comparison failure!
gcc/go/types.o differs
gcc/fortran/class.o differs
gcc/tree-ssa-live.o differs
gcc/data-streamer-out.o differs
gcc/ira-build.o differs
gcc/hsa-gen.o differs
gcc/hsa-brig.o differs
gcc/omp-low.o differs
gcc/lto-streamer-in.o differs
gcc/real.o differs
gcc/final.o differs
gcc/df-core.o differs

I bootstrap with the following options:

--with-cpu=thunderx+lse --enable-languages=c,c++,fortran,go
--disable-werror --with-sysroot=/ --enable-plugins
--enable-gnu-indirect-function

I have not tried removing the +lse part though


Sorry about the breakage. I will try to reproduce this.

Thanks,
Kugan


Re: [PATCH] Fix PR78189

2016-11-09 Thread Richard Biener
On Tue, 8 Nov 2016, Richard Biener wrote:

> On Tue, 8 Nov 2016, Richard Biener wrote:
> 
> > On Mon, 7 Nov 2016, Christophe Lyon wrote:
> > 
> > > Hi Richard,
> > > 
> > > 
> > > On 7 November 2016 at 09:01, Richard Biener  wrote:
> > > >
> > > > The following fixes an oversight when computing alignment in the
> > > > vectorizer.
> > > >
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > >
> > > > Richard.
> > > >
> > > > 2016-11-07  Richard Biener  
> > > >
> > > > PR tree-optimization/78189
> > > > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
> > > > alignment computation.
> > > >
> > > > * g++.dg/torture/pr78189.C: New testcase.
> > > >
> > > > Index: gcc/testsuite/g++.dg/torture/pr78189.C
> > > > ===
> > > > --- gcc/testsuite/g++.dg/torture/pr78189.C  (revision 0)
> > > > +++ gcc/testsuite/g++.dg/torture/pr78189.C  (working copy)
> > > > @@ -0,0 +1,41 @@
> > > > +/* { dg-do run } */
> > > > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" 
> > > > } */
> > > > +
> > > > +#include 
> > > > +
> > > > +struct A
> > > > +{
> > > > +  void * a;
> > > > +  void * b;
> > > > +};
> > > > +
> > > > +struct alignas(16) B
> > > > +{
> > > > +  void * pad;
> > > > +  void * misaligned;
> > > > +  void * pad2;
> > > > +
> > > > +  A a;
> > > > +
> > > > +  void Null();
> > > > +};
> > > > +
> > > > +void B::Null()
> > > > +{
> > > > +  a.a = nullptr;
> > > > +  a.b = nullptr;
> > > > +}
> > > > +
> > > > +void __attribute__((noinline,noclone))
> > > > +NullB(void * misalignedPtr)
> > > > +{
> > > > +  B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) 
> > > > - offsetof(B, misaligned));
> > > > +  b->Null();
> > > > +}
> > > > +
> > > > +int main()
> > > > +{
> > > > +  B b;
> > > > +  NullB(&b.misaligned);
> > > > +  return 0;
> > > > +}
> > > > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > > > index 9346cfe..b03cb1e 100644
> > > > --- gcc/tree-vect-data-refs.c
> > > > +++ gcc/tree-vect-data-refs.c
> > > > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct 
> > > > data_reference *dr)
> > > >base = ref;
> > > >while (handled_component_p (base))
> > > >  base = TREE_OPERAND (base, 0);
> > > > +  unsigned int base_alignment;
> > > > +  unsigned HOST_WIDE_INT base_bitpos;
> > > > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
> > > > +  /* As data-ref analysis strips the MEM_REF down to its base operand
> > > > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
> > > > + adjust things to make base_alignment valid as the alignment of
> > > > + DR_BASE_ADDRESS.  */
> > > >if (TREE_CODE (base) == MEM_REF)
> > > > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
> > > > -  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 
> > > > 0));
> > > > -  unsigned int base_alignment = get_object_alignment (base);
> > > > +{
> > > > +  base_bitpos -= mem_ref_offset (base).to_short_addr () * 
> > > > BITS_PER_UNIT;
> > > > +  base_bitpos &= (base_alignment - 1);
> > > > +}
> > > > +  if (base_bitpos != 0)
> > > > +base_alignment = base_bitpos & -base_bitpos;
> > > > +  /* Also look at the alignment of the base address DR analysis
> > > > + computed.  */
> > > > +  unsigned int base_addr_alignment = get_pointer_alignment (base_addr);
> > > > +  if (base_addr_alignment > base_alignment)
> > > > +base_alignment = base_addr_alignment;
> > > >
> > > >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
> > > >  DR_VECT_AUX (dr)->base_element_aligned = true;
> > > 
> > > Since you committed this patch (r241892), I'm seeing execution failures:
> > >   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
> > >   gcc.dg/vect/pr40074.c execution test
> > > on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
> > > --with-fpu=neon-fp16
> > > (using qemu as simulator)
> > 
> > The difference is that we now vectorize the testcase with versioning
> > for alignment (but it should never execute the vectorized variant).
> > I need arm peoples help to understand what is wrong.
> > 
> > At least the testcase shows there is (kind-of) a missed optimization
> > that we no longer figure out versioning for alignment is useless.
> > I'll look into that.
> 
> I'm testing the following which restores previous behavior.  You
> might still want to figure out what went wrong.

This is what I have applied.

Bootstrapped / regtested on x86_64-unknown-linux-gnu.

Richard.

2016-11-09  Richard Biener  

* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
Look at the DR_BASE_ADDRESS object for forcing alignment.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 241990)
+++ gcc/tree-vect-d

[PATCH] Fix PR78007

2016-11-09 Thread Richard Biener

The following implements vectorization of bswap via VEC_PERM_EXPR
on the corresponding QImode vector.

ARM already has backend handling via the builtin_vectorized_call
hook and thus there were already testcases available.  It doesn't
end up working for vect-bswap16.c because we have a promoted
argument to __builtin_bswap16 which confuses vectorization.

Eventually the testcase should also succeed on vect_perm_byte
targets but I have no way to verify that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-11-09  Richard Biener  

PR tree-optimization/78007
* tree-vect-stmts.c (vectorizable_bswap): New function.
(vectorizable_call): Call vectorizable_bswap for
BUILT_IN_BSWAP{16,32,64} if arguments are not promoted.

* gcc.dg/vect/vect-bswap32.c: Adjust.
* gcc.dg/vect/vect-bswap64.c: Likewise.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 241959)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -2432,6 +2432,116 @@ vectorizable_mask_load_store (gimple *st
   return true;
 }
 
+/* Check and perform vectorization of BUILT_IN_BSWAP{16,32,64}.  */
+
+static bool
+vectorizable_bswap (gimple *stmt, gimple_stmt_iterator *gsi,
+   gimple **vec_stmt, slp_tree slp_node,
+   tree vectype_in, enum vect_def_type *dt)
+{
+  tree op, vectype;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  unsigned ncopies, nunits;
+
+  op = gimple_call_arg (stmt, 0);
+  vectype = STMT_VINFO_VECTYPE (stmt_info);
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  /* Multiple types in SLP are handled by creating the appropriate number of
+ vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
+ case of SLP.  */
+  if (slp_node)
+ncopies = 1;
+  else
+ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+
+  tree char_vectype = get_same_sized_vectype (char_type_node, vectype_in);
+  if (! char_vectype)
+return false;
+
+  unsigned char *elts
+= XALLOCAVEC (unsigned char, TYPE_VECTOR_SUBPARTS (char_vectype));
+  unsigned char *elt = elts;
+  unsigned word_bytes = TYPE_VECTOR_SUBPARTS (char_vectype) / nunits;
+  for (unsigned i = 0; i < nunits; ++i)
+for (unsigned j = 0; j < word_bytes; ++j)
+  *elt++ = (i + 1) * word_bytes - j - 1;
+
+  if (! can_vec_perm_p (TYPE_MODE (char_vectype), false, elts))
+return false;
+
+  if (! vec_stmt)
+{
+  STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+  if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "=== vectorizable_bswap ==="
+ "\n");
+  if (! PURE_SLP_STMT (stmt_info))
+   {
+ add_stmt_cost (stmt_info->vinfo->target_cost_data,
+1, vector_stmt, stmt_info, 0, vect_prologue);
+ add_stmt_cost (stmt_info->vinfo->target_cost_data,
+ncopies, vec_perm, stmt_info, 0, vect_body);
+   }
+  return true;
+}
+
+  tree *telts = XALLOCAVEC (tree, TYPE_VECTOR_SUBPARTS (char_vectype));
+  for (unsigned i = 0; i < TYPE_VECTOR_SUBPARTS (char_vectype); ++i)
+telts[i] = build_int_cst (char_type_node, elts[i]);
+  tree bswap_vconst = build_vector (char_vectype, telts);
+
+  /* Transform.  */
+  vec vec_oprnds = vNULL;
+  gimple *new_stmt = NULL;
+  stmt_vec_info prev_stmt_info = NULL;
+  for (unsigned j = 0; j < ncopies; j++)
+{
+  /* Handle uses.  */
+  if (j == 0)
+vect_get_vec_defs (op, NULL, stmt, &vec_oprnds, NULL, slp_node, -1);
+  else
+vect_get_vec_defs_for_stmt_copy (dt, &vec_oprnds, NULL);
+
+  /* Arguments are ready. create the new vector stmt.  */
+  unsigned i;
+  tree vop;
+  FOR_EACH_VEC_ELT (vec_oprnds, i, vop)
+   {
+tree tem = make_ssa_name (char_vectype);
+new_stmt = gimple_build_assign (tem, build1 (VIEW_CONVERT_EXPR,
+ char_vectype, vop));
+vect_finish_stmt_generation (stmt, new_stmt, gsi);
+tree tem2 = make_ssa_name (char_vectype);
+new_stmt = gimple_build_assign (tem2, VEC_PERM_EXPR,
+tem, tem, bswap_vconst);
+vect_finish_stmt_generation (stmt, new_stmt, gsi);
+tem = make_ssa_name (vectype);
+new_stmt = gimple_build_assign (tem, build1 (VIEW_CONVERT_EXPR,
+ vectype, tem2));
+vect_finish_stmt_generation (stmt, new_stmt, gsi);
+ if (slp_node)
+   SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+   }
+
+  if (slp_node)
+continue;
+
+  if (j == 0)
+STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+  else
+STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+  prev_stmt_in

Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)

2016-11-09 Thread Eric Botcazou
> This can also happen if a splitter in the MD uses nonzero_bits (for
> example).  I tried to make the splitting code in combine save and restore
> the i2dest reg_stat info, but that causes one of the acats tests to fail.

Could you elaborate?  Did you try to temporarily invalidate it, e.g. by means 
of last_set_label, or something more involved?

-- 
Eric Botcazou


Re: [match.pd] Fix for PR35691

2016-11-09 Thread Prathamesh Kulkarni
On 8 November 2016 at 16:46, Richard Biener  wrote:
> On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 8 November 2016 at 13:23, Richard Biener  wrote:
>> > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 7 November 2016 at 23:06, Prathamesh Kulkarni
>> >>  wrote:
>> >> > On 7 November 2016 at 15:43, Richard Biener  wrote:
>> >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >>
>> >> >>> On 4 November 2016 at 13:41, Richard Biener  wrote:
>> >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
>> >> >>> >
>> >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
>> >> >>> >>
>> >> >>> >> > > > > The transform would also work for vectors 
>> >> >>> >> > > > > (element_precision for
>> >> >>> >> > > > > the test but also a value-matching zero which should 
>> >> >>> >> > > > > ensure the
>> >> >>> >> > > > > same number of elements).
>> >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal 
>> >> >>> >> > > > length by a
>> >> >>> >> > > > matching zero.
>> >> >>> >> > > > Could you please elaborate on that ?
>> >> >>> >> > >
>> >> >>> >> > > He may have meant something like:
>> >> >>> >> > >
>> >> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>> >> >>> >> >
>> >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 
>> >> >>> >> > which was the
>> >> >>> >> > point of the pattern.
>> >> >>> >>
>> >> >>> >> Oups, that's what I had written first, and then I somehow managed 
>> >> >>> >> to confuse
>> >> >>> >> myself enough to remove it so as to remove the call to types_match 
>> >> >>> >> :-(
>> >> >>> >>
>> >> >>> >> > > So the last operand is checked with operand_equal_p instead of
>> >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on 
>> >> >>> >> > > the
>> >> >>> >> > > comparison results should already imply that the number of 
>> >> >>> >> > > elements is the
>> >> >>> >> > > same.
>> >> >>> >> >
>> >> >>> >> > Though for equality compares we also allow scalar results IIRC.
>> >> >>> >>
>> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to 
>> >> >>> >> generate one
>> >> >>> >> for a testcase, at least until the GIMPLE FE lands...
>> >> >>> >>
>> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, 
>> >> >>> >> > > maybe some
>> >> >>> >> > > vector mode on s390?), it would be cool to do the same for 
>> >> >>> >> > > floats (most
>> >> >>> >> > > likely at the RTL level).
>> >> >>> >> >
>> >> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  
>> >> >>> >> > Or we cab
>> >> >>> >> > just allow bit-and/or on floats as much as we allow them on 
>> >> >>> >> > pointers.
>> >> >>> >>
>> >> >>> >> Would that generate sensible code on targets that do not have 
>> >> >>> >> logic insns for
>> >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, 
>> >> >>> >> so there
>> >> >>> >> would be some work (for instance grep finds no gen_iordf3, only 
>> >> >>> >> gen_iorv2df3).
>> >> >>> >>
>> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too 
>> >> >>> >> early...
>> >> >>> >> a==0 is something that other optimizations might use. long
>> >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
>> >> >>> >>
>> >> >>> >> (and I am assuming that signaling NaNs don't make the whole 
>> >> >>> >> transformation
>> >> >>> >> impossible, which might be wrong)
>> >> >>> >
>> >> >>> > Yeah.  I also think it's not so much important - I just wanted to 
>> >> >>> > mention
>> >> >>> > vectors...
>> >> >>> >
>> >> >>> > Btw, I still think we need a more sensible infrastructure for passes
>> >> >>> > to gather, analyze and modify complex conditions.  (I'm always 
>> >> >>> > pointing
>> >> >>> > to tree-affine.c as an, albeit not very good, example for handling
>> >> >>> > a similar problem)
>> >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
>> >> >>> this match.pd feature.
>> >> >>> The current patch keeps it restricted to only bitwise operators on 
>> >> >>> integers.
>> >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.
>> >> >>> OK to commit if passes ?
>> >> >>
>> >> >> +/* PR35691: Transform
>> >> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
>> >> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
>> >> >> +
>> >> >>
>> >> >> Please omit the vertical space
>> >> >>
>> >> >> +(for bitop (bit_and bit_ior)
>> >> >> + cmp (eq ne)
>> >> >> + (simplify
>> >> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
>> >> >>
>> >> >> if you capture the first integer_zerop as @2 then you can re-use it...
>> >> >>
>> >> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>> >> >> +   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>> >> >> +   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>> >> >> (@1)))
>> >> >> +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
>> >> >>
>> >> >> ... here inplace of the { build_zero_cst ... }.
>> >> >>
>> 

Re: [PATCH] Fix PR78189

2016-11-09 Thread Bin.Cheng
On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener  wrote:
> On Mon, 7 Nov 2016, Christophe Lyon wrote:
>
>> Hi Richard,
>>
>>
>> On 7 November 2016 at 09:01, Richard Biener  wrote:
>> >
>> > The following fixes an oversight when computing alignment in the
>> > vectorizer.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>> >
>> > Richard.
>> >
>> > 2016-11-07  Richard Biener  
>> >
>> > PR tree-optimization/78189
>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
>> > alignment computation.
>> >
>> > * g++.dg/torture/pr78189.C: New testcase.
>> >
>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
>> > ===
>> > --- gcc/testsuite/g++.dg/torture/pr78189.C  (revision 0)
>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C  (working copy)
>> > @@ -0,0 +1,41 @@
>> > +/* { dg-do run } */
>> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } 
>> > */
>> > +
>> > +#include 
>> > +
>> > +struct A
>> > +{
>> > +  void * a;
>> > +  void * b;
>> > +};
>> > +
>> > +struct alignas(16) B
>> > +{
>> > +  void * pad;
>> > +  void * misaligned;
>> > +  void * pad2;
>> > +
>> > +  A a;
>> > +
>> > +  void Null();
>> > +};
>> > +
>> > +void B::Null()
>> > +{
>> > +  a.a = nullptr;
>> > +  a.b = nullptr;
>> > +}
>> > +
>> > +void __attribute__((noinline,noclone))
>> > +NullB(void * misalignedPtr)
>> > +{
>> > +  B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) - 
>> > offsetof(B, misaligned));
>> > +  b->Null();
>> > +}
>> > +
>> > +int main()
>> > +{
>> > +  B b;
>> > +  NullB(&b.misaligned);
>> > +  return 0;
>> > +}
>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> > index 9346cfe..b03cb1e 100644
>> > --- gcc/tree-vect-data-refs.c
>> > +++ gcc/tree-vect-data-refs.c
>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct 
>> > data_reference *dr)
>> >base = ref;
>> >while (handled_component_p (base))
>> >  base = TREE_OPERAND (base, 0);
>> > +  unsigned int base_alignment;
>> > +  unsigned HOST_WIDE_INT base_bitpos;
>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
>> > +  /* As data-ref analysis strips the MEM_REF down to its base operand
>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
>> > + adjust things to make base_alignment valid as the alignment of
>> > + DR_BASE_ADDRESS.  */
>> >if (TREE_CODE (base) == MEM_REF)
>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
>> > -  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0));
>> > -  unsigned int base_alignment = get_object_alignment (base);
>> > +{
>> > +  base_bitpos -= mem_ref_offset (base).to_short_addr () * 
>> > BITS_PER_UNIT;
>> > +  base_bitpos &= (base_alignment - 1);
>> > +}
>> > +  if (base_bitpos != 0)
>> > +base_alignment = base_bitpos & -base_bitpos;
>> > +  /* Also look at the alignment of the base address DR analysis
>> > + computed.  */
>> > +  unsigned int base_addr_alignment = get_pointer_alignment (base_addr);
>> > +  if (base_addr_alignment > base_alignment)
>> > +base_alignment = base_addr_alignment;
>> >
>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
>> >  DR_VECT_AUX (dr)->base_element_aligned = true;
>>
>> Since you committed this patch (r241892), I'm seeing execution failures:
>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
>>   gcc.dg/vect/pr40074.c execution test
>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
>> --with-fpu=neon-fp16
>> (using qemu as simulator)
>
> The difference is that we now vectorize the testcase with versioning
> for alignment (but it should never execute the vectorized variant).
> I need arm peoples help to understand what is wrong.
Hi All,
I will look at it.

Thanks,
bin
>
> At least the testcase shows there is (kind-of) a missed optimization
> that we no longer figure out versioning for alignment is useless.
> I'll look into that.
>
> Richard.


Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)

2016-11-09 Thread Segher Boessenkool
On Wed, Nov 09, 2016 at 09:30:59AM +0100, Eric Botcazou wrote:
> > This can also happen if a splitter in the MD uses nonzero_bits (for
> > example).  I tried to make the splitting code in combine save and restore
> > the i2dest reg_stat info, but that causes one of the acats tests to fail.
> 
> Could you elaborate?  Did you try to temporarily invalidate it, e.g. by means 
> of last_set_label, or something more involved?

You need more than that, e.g. reg_nonzero_bits_for_combine looks at the
nonzero_bits field no matter what else.  So I set all fields to "safe"
values and restored them later (if no combination is done).  But this
fails in acats c66002c (all other tests for all languages work fine).

I didn't have the heart to look into this testcase breakage; the
save/restore thing would make this already way over-complex code a lot
more complex still, so I made a safer change instead.

Ideally reg_stat will just go away, but that will not happen soon.


Segher


Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes

2016-11-09 Thread Richard Biener
On Tue, Nov 8, 2016 at 3:24 PM, Pierre-Marie de Rodat
 wrote:
> Hello,
>
> This patch comes from the discussion for PR debug/78112
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78112).  It seems to fix
> the regression Rainer detected on x86_64-apple-darwin, although I still
> wonder why remaining duplicate attributes (which were already present
> before my original patch) are not a problem on this platform.
>
> Anyway, bootstrapped and regtested on x86_64-linux.  As I said on the
> PR, I managed to check with a Python script that there were no
> duplicates anymore but I don't know how to turn this into a DejaGNU
> testcase.

Using scan-assembler-times on the dwarf?

> Ok to commit?

Ok.

Richard.

> gcc/
>
> PR debug/78112
> * dwarf2out.c (dwarf2out_early_global_decl): Call dwarf2out_decl
> on the context only when it has no DIE yet.
> ---
>  gcc/dwarf2out.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 1dfff38..f9ec090 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -25256,11 +25256,11 @@ dwarf2out_early_global_decl (tree decl)
>   if (!DECL_STRUCT_FUNCTION (decl))
> goto early_decl_exit;
>
> - /* For nested functions, emit DIEs for the parents first so that all
> -nested DIEs are generated at the proper scope in the first
> -shot.  */
> + /* For nested functions, make sure we have DIEs for the parents 
> first
> +so that all nested DIEs are generated at the proper scope in the
> +first shot.  */
>   tree context = decl_function_context (decl);
> - if (context != NULL)
> + if (context != NULL && lookup_decl_die (context) == NULL)
> {
>   current_function_decl = context;
>   dwarf2out_decl (context);
> --
> 2.10.2
>


Re: RFA (libstdc++): C++ PATCH to implement C++17 noexcept in type system

2016-11-09 Thread Bin.Cheng
On Tue, Nov 8, 2016 at 9:11 AM, Christophe Lyon
 wrote:
> Hi,
>
> On 7 November 2016 at 23:56, Jonathan Wakely  wrote:
>> On 7 November 2016 at 22:49, Jason Merrill wrote:
>>> Tested x86_64-pc-linux-gnu.  Are the libstdc++ changes OK for trunk?
>>
>> Yes, I like the approach, thanks.
>
> The new test "g++.dg/cpp1z/noexcept-type11.C" fails on arm:
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/cpp1z/noexcept-type11.C:3:6:
> warning: mangled name for 'void f(int (*)() noexcept)' will change in
> C++17 because the exception specification is part of a function type
> [-Wc++1z-compat]
> In function 'void f(int (*)() noexcept)':
> cc1plus: warning: mangled name for '' will change in C++17
> because the exception specification is part of a function type
> [-Wc++1z-compat]
> cc1plus: warning: mangled name for '' will change in C++17
> because the exception specification is part of a function type
> [-Wc++1z-compat]
>
> FAIL: g++.dg/cpp1z/noexcept-type11.C   (test for excess errors)
I can confirm this and created PR78269 for tracking.  I also noticed
noexcept-type9.C fails on both arm/aarch64 target.

Thanks,
bin
>
>
> Christophe


Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-09 Thread Pitchumani Sivanupandi

On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:

On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:
I have updated patch to include the flash size as well. Took that 
info from
device headers (it was fed into crt's device information note section 
also).


The new option would render -mn-flash superfluous, but we should 
keep it for

backward compatibility.

Ok.

Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
LINK_RELAX_SPEC?  And what happens if relaxation is off?

Yes. Removed link_pmem_wrap from link_relax.
Disabling relaxation doesn't change -mpmem-wrap-around behavior.

Now, wrap around behavior is changed as follows:

For 8K flash devices:
Device specs adds --pmem-wrap-around=8k linker option if 
-mno-pmem-wrap-around

is NOT enabled.
It makes the --pmem-wrap-around=8k linker option default for 8k flash 
devices.


For 16/32/64K flash devices:
Spec string 'link_pmem_wrap' added to all 16/32/64k flash devices specs.
Other wise no changes i.e. It adds --pmem-wrap-around=16/32/64k 
option if

-mpmem-wrap-around option is enabled.

For other devices, no changes in device specs.

Reg tested with default and -mrelax options enabled. No issues.

Regards,
Pitchumani


gcc/ChangeLog

2016-11-08  Pitchumani Sivanupandi 

* config/avr/avr-arch.h (avr_mcu_t): Add flash_size member.
* config/avr/avr-devices.c(avr_mcu_types): Add flash size info.
* config/avr/avr-mcu.def: Likewise.
* config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded 
prefix
check to find wrap-around value, instead use MCU flash size. For 
8k flash
devices, update link_pmem_wrap spec string to add 
--pmem-wrap-around=8k.

* config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and
add to linker specs (LINK_SPEC) directly.

flashsize-and-wrap-around.patch



diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def
index 6bcc6ff..9d4aa1a 100644



 /* Classic, == 128K.  */
-AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, 
NULL,0x0060, 0x0, 2)
-AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, 
"__AVR_ATmega103__", 0x0060, 0x0, 2)
-AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE, 
"__AVR_AT43USB320__",   0x0060, 0x0, 2)
+AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, 
NULL,0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, 
"__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
+AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE, 
"__AVR_AT43USB320__",   0x0060, 0x0, 2, 0x1)


This looks incorrect: either .flash_size should be 0x2 or n_flash 
be 1.  As you draw the information from internal hardware 
descriptions, I'd guess that the new information is more reliable?


...as this also determines whether AT43USB320supports ELPM this also 
means that the device is in the wrong multilib set?


I couldn't find the data sheet at atmel.com, and the one I found on 
the net only mentions 64KiB program memory.  It mentions LPM on pp. 9 
but has no reference to the supported instruction set.  From the 
manual I would conclude that this device should be avr3, not avr31?

Yes. I couldn't find any other source other than the datasheet in net.
This device do not have internal program memory. Flash size is set to 64K
as the device could address only 64K. No RAMPZ register, so no ELPM. Moved
device to AVR3 architecture.

+AVR_MCU ("atxmega384c3", ARCH_AVRXMEGA6, AVR_ISA_RMW,  
"__AVR_ATxmega384C3__", 0x2000, 0x0, 6, 0x62000)
+AVR_MCU ("atxmega384d3", ARCH_AVRXMEGA6, AVR_ISA_NONE, 
"__AVR_ATxmega384D3__", 0x2000, 0x0, 6, 0x62000)

 /* Xmega, 128K < Flash, RAM > 64K RAM.  */


Two more glitches; presumably, .n_flash should be 7 here? Or even 
better, we can drop .n_flash field in the future and use the more 
reliable, finer grained information from .flash_size instead?

Updated. Yes, in future we could use only flash_size.

Regards,
Pitchumani


gcc/ChangeLog

2016-11-09  Pitchumani Sivanupandi 

* config/avr/avr-arch.h (avr_mcu_t): Add flash_size member.
* config/avr/avr-devices.c(avr_mcu_types): Add flash size info.
* config/avr/avr-mcu.def: Likewise. Moved at43usb32x to AVR3 
architecture

as it do not support ELPM (no RAMPZ register).
* config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded 
prefix
check to find wrap-around value, instead use MCU flash size. For 8k 
flash
devices, update link_pmem_wrap spec string to add 
--pmem-wrap-around=8k.

* config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and
add to linker specs (LINK_SPEC) directly.

diff --git a/gcc/config/avr/avr-arch.h b/gcc/config/avr/avr-arch.h
index 42eaee5..e0961d4 100644
--- a/gcc/config/avr/avr-arch.h
+++ b/gcc/config/avr/avr-arch.h
@@ -122,6 +122,9 @@ typedef struct
 
   /* Number of 64k segments in the flash.  */
   int n_flash;
+
+  /* Flash size in bytes.  */
+  int flash_size;
 } avr_mcu_t;
 
 /* AVR dev

Re: [match.pd] Fix for PR35691

2016-11-09 Thread Richard Biener
On Wed, 9 Nov 2016, Prathamesh Kulkarni wrote:

> On 8 November 2016 at 16:46, Richard Biener  wrote:
> > On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 8 November 2016 at 13:23, Richard Biener  wrote:
> >> > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 7 November 2016 at 23:06, Prathamesh Kulkarni
> >> >>  wrote:
> >> >> > On 7 November 2016 at 15:43, Richard Biener  wrote:
> >> >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >>
> >> >> >>> On 4 November 2016 at 13:41, Richard Biener  
> >> >> >>> wrote:
> >> >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
> >> >> >>> >
> >> >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
> >> >> >>> >>
> >> >> >>> >> > > > > The transform would also work for vectors 
> >> >> >>> >> > > > > (element_precision for
> >> >> >>> >> > > > > the test but also a value-matching zero which should 
> >> >> >>> >> > > > > ensure the
> >> >> >>> >> > > > > same number of elements).
> >> >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal 
> >> >> >>> >> > > > length by a
> >> >> >>> >> > > > matching zero.
> >> >> >>> >> > > > Could you please elaborate on that ?
> >> >> >>> >> > >
> >> >> >>> >> > > He may have meant something like:
> >> >> >>> >> > >
> >> >> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
> >> >> >>> >> >
> >> >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 
> >> >> >>> >> > which was the
> >> >> >>> >> > point of the pattern.
> >> >> >>> >>
> >> >> >>> >> Oups, that's what I had written first, and then I somehow 
> >> >> >>> >> managed to confuse
> >> >> >>> >> myself enough to remove it so as to remove the call to 
> >> >> >>> >> types_match :-(
> >> >> >>> >>
> >> >> >>> >> > > So the last operand is checked with operand_equal_p instead 
> >> >> >>> >> > > of
> >> >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on 
> >> >> >>> >> > > the
> >> >> >>> >> > > comparison results should already imply that the number of 
> >> >> >>> >> > > elements is the
> >> >> >>> >> > > same.
> >> >> >>> >> >
> >> >> >>> >> > Though for equality compares we also allow scalar results IIRC.
> >> >> >>> >>
> >> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to 
> >> >> >>> >> generate one
> >> >> >>> >> for a testcase, at least until the GIMPLE FE lands...
> >> >> >>> >>
> >> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, 
> >> >> >>> >> > > maybe some
> >> >> >>> >> > > vector mode on s390?), it would be cool to do the same for 
> >> >> >>> >> > > floats (most
> >> >> >>> >> > > likely at the RTL level).
> >> >> >>> >> >
> >> >> >>> >> > On GIMPLE view-converts could come to the rescue here as well. 
> >> >> >>> >> >  Or we cab
> >> >> >>> >> > just allow bit-and/or on floats as much as we allow them on 
> >> >> >>> >> > pointers.
> >> >> >>> >>
> >> >> >>> >> Would that generate sensible code on targets that do not have 
> >> >> >>> >> logic insns for
> >> >> >>> >> floats? Actually, even on x86_64 that generates inefficient 
> >> >> >>> >> code, so there
> >> >> >>> >> would be some work (for instance grep finds no gen_iordf3, only 
> >> >> >>> >> gen_iorv2df3).
> >> >> >>> >>
> >> >> >>> >> I am also a bit wary of doing those obfuscating optimizations 
> >> >> >>> >> too early...
> >> >> >>> >> a==0 is something that other optimizations might use. long
> >> >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
> >> >> >>> >>
> >> >> >>> >> (and I am assuming that signaling NaNs don't make the whole 
> >> >> >>> >> transformation
> >> >> >>> >> impossible, which might be wrong)
> >> >> >>> >
> >> >> >>> > Yeah.  I also think it's not so much important - I just wanted to 
> >> >> >>> > mention
> >> >> >>> > vectors...
> >> >> >>> >
> >> >> >>> > Btw, I still think we need a more sensible infrastructure for 
> >> >> >>> > passes
> >> >> >>> > to gather, analyze and modify complex conditions.  (I'm always 
> >> >> >>> > pointing
> >> >> >>> > to tree-affine.c as an, albeit not very good, example for handling
> >> >> >>> > a similar problem)
> >> >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware 
> >> >> >>> of
> >> >> >>> this match.pd feature.
> >> >> >>> The current patch keeps it restricted to only bitwise operators on 
> >> >> >>> integers.
> >> >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.
> >> >> >>> OK to commit if passes ?
> >> >> >>
> >> >> >> +/* PR35691: Transform
> >> >> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
> >> >> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
> >> >> >> +
> >> >> >>
> >> >> >> Please omit the vertical space
> >> >> >>
> >> >> >> +(for bitop (bit_and bit_ior)
> >> >> >> + cmp (eq ne)
> >> >> >> + (simplify
> >> >> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
> >> >> >>
> >> >> >> if you capture the first integer_zerop as @2 then you can re-use 
> >> >> >> it...
> >> >> >>
> >> >> >> +   (if (

Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)

2016-11-09 Thread Eric Botcazou
> You need more than that, e.g. reg_nonzero_bits_for_combine looks at the
> nonzero_bits field no matter what else.

Right, this one is sticky, although you can invalidate it globally by means of 
nonzero_sign_valid AFAICS.

> I didn't have the heart to look into this testcase breakage; the
> save/restore thing would make this already way over-complex code a lot
> more complex still, so I made a safer change instead.

I agree that the save/restore approach is probably overkill but, on the other 
hand, your change only papers over the issue so a big ??? comment is in order 
if it goes in.  I'd still think that we should try to invalidate if possible.

-- 
Eric Botcazou


[Patch, Fortran, committed] PR 71894: [OOP] ICE in gfc_add_component_ref, at fortran/class.c:227

2016-11-09 Thread Janus Weil
Hi all,

I have committed to trunk another obvious patch to fix an ICE on invalid code:

https://gcc.gnu.org/viewcvs?rev=241993&root=gcc&view=rev

Cheers,
Janus


Re: [PATCH] S390: Fix PR/77822.

2016-11-09 Thread Dominik Vogt
On Tue, Nov 08, 2016 at 05:46:04PM +0100, Matthias Klose wrote:
> On 08.11.2016 15:38, Dominik Vogt wrote:
> > The attached patch fixes PR/77822 on s390/s390x dor gcc-6 *only*.
> > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822
> > 
> > Bootstrapped and regression tested on s390 and s390x biarch on a
> > zEC12.
> 
> missing the testcase mentioned in the ChangeLog.

Sorry, patch with the missing file attached.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.md ("extzv", "*extzv_zEC12")
("*extzv_z10"): Check validity of zero_extend arguments.
gcc/testsuite/ChangeLog

* gcc.target/s390/pr77822.c: New test for PR/77822.
>From ce5616dca250c4735105346111e5ea7934b49b55 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Tue, 8 Nov 2016 09:54:03 +0100
Subject: [PATCH] S390: Fix PR/77822.

Check the range of the arguments in extzv patterns.  This avoids generating
risbg instructions with an out of range shift count.
---
 gcc/config/s390/s390.md |  12 +-
 gcc/testsuite/gcc.target/s390/pr77822.c | 307 
 2 files changed, 317 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr77822.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index b3d4370..899ed62 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3708,6 +3708,10 @@
  (clobber (reg:CC CC_REGNUM))])]
   "TARGET_Z10"
 {
+  if (!IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1)
+  || !IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1,
+   GET_MODE_BITSIZE (DImode)))
+FAIL;
   /* Starting with zEC12 there is risbgn not clobbering CC.  */
   if (TARGET_ZEC12)
 {
@@ -3726,7 +3730,9 @@
 (match_operand:GPR 1 "register_operand" "d")
 (match_operand 2 "const_int_operand" "")   ; size
 (match_operand 3 "const_int_operand" "")))] ; start]
-  "TARGET_ZEC12"
+  "TARGET_ZEC12
+   && IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1)
+   && IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1, 
GET_MODE_BITSIZE (DImode))"
   "risbgn\t%0,%1,64-%2,128+63,+%3+%2" ; dst, src, start, end, shift
   [(set_attr "op_type" "RIE")])
 
@@ -3737,7 +3743,9 @@
(match_operand 2 "const_int_operand" "")   ; size
(match_operand 3 "const_int_operand" ""))) ; start
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_Z10"
+  "TARGET_Z10
+   && IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1)
+   && IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1, 
GET_MODE_BITSIZE (DImode))"
   "risbg\t%0,%1,64-%2,128+63,+%3+%2" ; dst, src, start, end, shift
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
diff --git a/gcc/testsuite/gcc.target/s390/pr77822.c 
b/gcc/testsuite/gcc.target/s390/pr77822.c
new file mode 100644
index 000..6789152
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr77822.c
@@ -0,0 +1,307 @@
+/* This testcase checks that the shift operand of r*sbg instructions is in
+   range.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=zEC12 -Wno-shift-count-overflow" } */
+
+int g;
+
+void pos_ll_129 (long long b)
+{
+  if (b >> 129 & 1)
+g = b;
+}
+
+void sizepos_ll_134 (long long b)
+{
+  if (b >> 134 & 1)
+g = b;
+}
+
+void pos_ll_65 (long long b)
+{
+  if (b >> 65 & 1)
+g = b;
+}
+
+void sizepos_ll_70 (long long b)
+{
+  if (b >> 70 & 1)
+g = b;
+}
+
+void pos_ll_33 (long long b)
+{
+  if (b >> 33 & 1)
+g = b;
+}
+
+void sizepos_ll_38 (long long b)
+{
+  if (b >> 38 & 1)
+g = b;
+}
+
+void pos_ll_17 (long long b)
+{
+  if (b >> 17 & 1)
+g = b;
+}
+
+void sizepos_ll_22 (long long b)
+{
+  if (b >> 22 & 1)
+g = b;
+}
+
+void pos_ll_8 (long long b)
+{
+  if (b >> 8 & 1)
+g = b;
+}
+
+void sizepos_ll_13 (long long b)
+{
+  if (b >> 13 & 1)
+g = b;
+}
+
+void pos_l_129 (long b)
+{
+  if (b >> 129 & 1)
+g = b;
+}
+
+void sizepos_l_134 (long b)
+{
+  if (b >> 134 & 1)
+g = b;
+}
+
+void pos_l_65 (long b)
+{
+  if (b >> 65 & 1)
+g = b;
+}
+
+void sizepos_l_70 (long b)
+{
+  if (b >> 70 & 1)
+g = b;
+}
+
+void pos_l_33 (long b)
+{
+  if (b >> 33 & 1)
+g = b;
+}
+
+void sizepos_l_38 (long b)
+{
+  if (b >> 38 & 1)
+g = b;
+}
+
+void pos_l_17 (long b)
+{
+  if (b >> 17 & 1)
+g = b;
+}
+
+void sizepos_l_22 (long b)
+{
+  if (b >> 22 & 1)
+g = b;
+}
+
+void pos_l_8 (long b)
+{
+  if (b >> 8 & 1)
+g = b;
+}
+
+void sizepos_l_13 (long b)
+{
+  if (b >> 13 & 1)
+g = b;
+}
+
+void pos_i_129 (int b)
+{
+  if (b >> 129 & 1)
+g = b;
+}
+
+void sizepos_i_134 (int b)
+{
+  if (b >> 134 & 1)
+g = b;
+}
+
+void pos_i_65 (int b)
+{
+  if (b >> 65 & 1)
+g = b;
+}
+
+void sizepos_i_70 (int b)
+{
+  if (b >> 70 & 1)
+g = b;
+}
+
+void pos_i_33 (int b)
+{
+  if (b >> 33 & 1)
+g = b;
+}
+
+void sizepos_i_38 (int b)
+{
+  if (b >> 38 & 1)
+g = b;
+}
+
+void pos_i_17 (int b)

Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-09 Thread Richard Biener
On Mon, 7 Nov 2016, Richard Biener wrote:

> On Mon, 7 Nov 2016, Richard Biener wrote:
> 
> > On Fri, 4 Nov 2016, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > Just 2 nits:
> > > 
> > > On Fri, Oct 28, 2016 at 01:46:57PM +0200, Richard Biener wrote:
> > > > +/* Return a pointer to the Nth token in PARERs tokens_buf.  */
> > > 
> > > PARSERs ?
> > 
> > Fixed.
> > 
> > > > @@ -454,7 +423,7 @@ c_lex_one_token (c_parser *parser, c_token *token)
> > > >  /* Return a pointer to the next token from PARSER, reading it in if
> > > > necessary.  */
> > > >  
> > > > -static inline c_token *
> > > > +c_token *
> > > >  c_parser_peek_token (c_parser *parser)
> > > >  {
> > > >if (parser->tokens_avail == 0)
> > > 
> > > I wonder if turning all of these into non-inlines is a good idea.
> > > Can't you move them to the common header instead?
> > 
> > The issue with moving is that I failed to export the definition of
> > c_parser in c-parser.h due to gengtype putting vec 
> > handlers into gtype-c.h but not gtype-objc.h and thus objc bootstrap
> > fails :/
> 
> If anybody wants to try, f82dc04b921a52a9a5c90d957a824e1c2d04
> has it (objc build) still broken on the gimplefe git branch.

Ok, it seems I found a solution:

diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
index d178254..765acbb 100644
--- a/gcc/c/c-parser.h
+++ b/gcc/c/c-parser.h
@@ -129,7 +129,7 @@ struct GTY(()) c_parser {
 
   /* Buffer to hold all the tokens from parsing the vector attribute for 
the
  SIMD-enabled functions (formerly known as elemental functions).  */
-  vec  *cilk_simd_fn_tokens;
+  vec  * GTY((skip)) cilk_simd_fn_tokens;
 };
 
 /* Possibly kinds of declarator to parse.  */

I believe this is safe as cilk_simd_fn_tokens is state that is
local to parsing a single omp/cilk construct (not sure why that state
was added to c_parser in the first place - I guess it was convenient).

I'll push back c_parser to the header and put inlines I need to export
there as well.

Joseph, is this (with regard to the inlines) your preference as well?

I'm meanwhile testing the following on trunk, ok if it passes there?

Thanks,
Richard.

2016-11-09  Richard Biener  

c/
* c-parser.c (c_parser): Mark cilk_simd_fn_tokens GTY((skip)).

Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c(revision 241991)
+++ gcc/c/c-parser.c(working copy)
@@ -256,7 +256,7 @@ struct GTY(()) c_parser {
 
   /* Buffer to hold all the tokens from parsing the vector attribute for 
the
  SIMD-enabled functions (formerly known as elemental functions).  */
-  vec  *cilk_simd_fn_tokens;
+  vec  * GTY((skip)) cilk_simd_fn_tokens;
 };
 
 



Re: [Patch, Fortran, committed] PR 71894: [OOP] ICE in gfc_add_component_ref, at fortran/class.c:227

2016-11-09 Thread Andre Vehreschild
Hi Janus,

may I ask you to attach also the "obvious" patches to the mail you send to the
list? It is far more comfortable to take a look at the patch in the mail than
open the webpage. Furthermore is it considered polite to attach the patches
even when they are obvious. This is not to treat you badly, but to ensure a
certain quality of gfortran. When a diff is attached I look at it, but I will
not open the link, not when I am only on a mobile.

Thank you in advance,
Andre

On Wed, 9 Nov 2016 10:35:10 +0100
Janus Weil  wrote:

> Hi all,
> 
> I have committed to trunk another obvious patch to fix an ICE on invalid code:
> 
> https://gcc.gnu.org/viewcvs?rev=241993&root=gcc&view=rev
> 
> Cheers,
> Janus


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


[PATCH 0/6][ARM] Implement support for ACLE Coprocessor Intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hello,

This patch series aims to implement support for the ACLE coprocessor
intrinsics. To do so I first refactor the NEON builtin framework to
accommodate the usage of this framework to implement other builtins. I
then port the existing CRC builtins to this framework to reduce
code-duplication and as an exercise for porting the builtins. The last
four patches implement the new ACLE Coprocessor intrinsics. This work
also facilitates the implementation of ACLE builtins in the future.

Tested the series by bootstrapping arm-none-linux-gnuabihf and found no
regressions, also did a normal build for arm-none-eabi and ran the
acle.exp tests for a Cortex-M3.

(6) Andre Vieira
Refactor NEON builtin framework to work for other builtins
Move CRC builtins to refactored framework
Implement support for ACLE Coprocessor CDP intrinsics
Implement support for ACLE Coprocessor LDC and STC intrinsics
Implement support for ACLE Coprocessor MCR and MRC intrinsics
Implement support for ACLE Coprocessor MCRR and MRRC intrinsics


Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)

2016-11-09 Thread Segher Boessenkool
On Wed, Nov 09, 2016 at 10:28:37AM +0100, Eric Botcazou wrote:
> > You need more than that, e.g. reg_nonzero_bits_for_combine looks at the
> > nonzero_bits field no matter what else.
> 
> Right, this one is sticky, although you can invalidate it globally by means 
> of 
> nonzero_sign_valid AFAICS.
> 
> > I didn't have the heart to look into this testcase breakage; the
> > save/restore thing would make this already way over-complex code a lot
> > more complex still, so I made a safer change instead.
> 
> I agree that the save/restore approach is probably overkill but, on the other 
> hand, your change only papers over the issue so a big ??? comment is in order 
> if it goes in.  I'd still think that we should try to invalidate if possible.

I think the better solution would be not to reuse i2dest at all, actually
create a new register instead (we do have machinery for handling that).
I'll put the ??? somewhere there.

Thanks for looking,


Segher


[PATCH 1/6][ARM] Refactor NEON builtin framework to work for other builtins

2016-11-09 Thread Andre Vieira (lists)
Hi,

Refactor NEON builtin framework such that it can be used to implement
other builtins.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog
2016-11-09  Andre Vieira  

* config/arm/arm-builtins.c (neon_builtin_datum): Rename to ..
(arm_builtin_datum): ... this.
(arm_init_neon_builtin): Rename to ...
(arm_init_builtin): ... this. Add a new parameters PREFIX
and USE_SIG_IN_NAME.
(arm_init_neon_builtins): Replace 'arm_init_neon_builtin' with
'arm_init_builtin'. Replace type 'neon_builtin_datum' with
'arm_builtin_datum'.
(arm_init_vfp_builtins): Likewise.
(builtin_arg): Rename enum's replacing 'NEON_ARG' with
'ARG_BUILTIN' and add a 'ARG_BUILTIN_NEON_MEMORY.
(arm_expand_neon_args): Rename to ...
(arm_expand_builtin_args): ... this. Rename builtin_arg
enum values and differentiate between ARG_BUILTIN_MEMORY
and ARG_BUILTIN_NEON_MEMORY.
(arm_expand_neon_builtin_1): Rename to ...
(arm_expand_builtin_1): ... this. Rename builtin_arg enum
values, arm_expand_builtin_args and add bool parameter NEON.
(arm_expand_neon_builtin): Use arm_expand_builtin_1.
(arm_expand_vfp_builtin): Likewise.
(NEON_MAX_BUILTIN_ARGS): Remove, it was unused.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
e73043db6db69fa64bb1e72cf71a36d7169062db..154dafc6bf2525165406ae07f7551a665b2bdee8
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -202,7 +202,7 @@ typedef struct {
   const enum insn_code code;
   unsigned int fcode;
   enum arm_type_qualifiers *qualifiers;
-} neon_builtin_datum;
+} arm_builtin_datum;
 
 #define CF(N,X) CODE_FOR_neon_##N##X
 
@@ -242,7 +242,7 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def and
+/* The builtin data can be found in arm_neon_builtins.def,
arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
TARGET_NEON to be true.  The feature tests are checked when the
builtins are expanded.
@@ -252,14 +252,14 @@ typedef struct {
would be specified after the assembler mnemonic, which usually
refers to the last vector operand.  The modes listed per
instruction should be the same as those defined for that
-   instruction's pattern in neon.md.  */
+   instruction's pattern, for instance in neon.md.  */
 
-static neon_builtin_datum vfp_builtin_data[] =
+static arm_builtin_datum vfp_builtin_data[] =
 {
 #include "arm_vfp_builtins.def"
 };
 
-static neon_builtin_datum neon_builtin_data[] =
+static arm_builtin_datum neon_builtin_data[] =
 {
 #include "arm_neon_builtins.def"
 };
@@ -915,11 +915,17 @@ arm_init_simd_builtin_scalar_types (void)
 "__builtin_neon_uti");
 }
 
-/* Set up a NEON builtin.  */
+/* Set up a builtin.  It will use information stored in the argument struct D 
to
+   derive the builtin's type signature and name.  It will append the name in D
+   to the PREFIX passed and use these to create a builtin declaration that is
+   then stored in 'arm_builtin_decls' under index FCODE.  This FCODE is also
+   written back to D for future use.  If USE_SIG_IN_NAME is true the builtin's
+   name is appended with type signature information to distinguish between
+   signedness and poly.  */
 
 static void
-arm_init_neon_builtin (unsigned int fcode,
-  neon_builtin_datum *d)
+arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
+ const char * prefix, bool use_sig_in_name)
 {
   bool print_type_signature_p = false;
   char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
@@ -1007,12 +1013,12 @@ arm_init_neon_builtin (unsigned int fcode,
 
   gcc_assert (ftype != NULL);
 
-  if (print_type_signature_p)
-snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
- d->name, type_signature);
+  if (print_type_signature_p && use_sig_in_name)
+snprintf (namebuf, sizeof (namebuf), "%s_%s_%s",
+ prefix, d->name, type_signature);
   else
-snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
- d->name);
+snprintf (namebuf, sizeof (namebuf), "%s_%s",
+ prefix, d->name);
 
   fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
 NULL, NULL_TREE);
@@ -1048,8 +1054,8 @@ arm_init_neon_builtins (void)
 
   for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
 {
-  neon_builtin_datum *d = &neon_builtin_data[i];
-  arm_init_neon_builtin (fcode, d);
+  arm_builtin_datum *d = &neon_builtin_data[i];
+  arm_init_builtin (fcode, d, "__builtin_neon", true);
 }
 }
 
@@ -1062,8 +1068,8 @@ arm_init_vfp_builtins (void)
 
   for (i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++)
 {
-  neon_builtin_datum *d = &vfp_builtin_data[i];
-  arm_init_neon_builtin (fcode, d);
+  arm_builtin_datum *d = &vfp_builtin_da

[PATCH 2/6][ARM] Move CRC builtins to refactored framework

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch refactors the implementation of the ARM ACLE CRC builtins to
use the builtin framework.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog
2016-11-09  Andre Vieira  

  * config/arm/arm-builtins.c (arm_unsigned_binop_qualifiers): New.
  (UBINOP_QUALIFIERS): New.
  (si_UP): Define.
  (acle_builtin_data): New. Change comment.
  (arm_builtins): Remove ARM_BUILTIN_CRC32B, ARM_BUILTIN_CRC32H,
  ARM_BUILTIN_CRC32W, ARM_BUILTIN_CRC32CB, ARM_BUILTIN_CRC32CH,
  ARM_BUILTIN_CRC32CW. Add ARM_BUILTIN_ACLE_BASE and include
  arm_acle_builtins.def.
  (ARM_BUILTIN_ACLE_PATTERN_START): Define.
  (arm_init_acle_builtins): New.
  (CRC32_BUILTIN): Remove.
  (bdesc_2arg): Remove entries for crc32b, crc32h, crc32w,
  crc32cb, crc32ch and crc32cw.
  (arm_init_crc32_builtins): Remove.
  (arm_init_builtins): Use arm_init_acle_builtins rather
  than arm_init_crc32_builtins.
  (arm_expand_acle_builtin): New.
  (arm_expand_builtin): Use 'arm_expand_acle_builtin'.
  * config/arm/arm_acle_builtins.def: New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
154dafc6bf2525165406ae07f7551a665b2bdee8..2130a3004f17c47be6e42412c1ea30f3cff20573
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -157,6 +157,13 @@ arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_none, qualifier_struct_load_store_lane_index };
 #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers)
 
+/* unsigned T (unsigned T, unsigned T, unsigned T).  */
+static enum arm_type_qualifiers
+arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+  qualifier_unsigned };
+#define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
@@ -242,17 +249,16 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The builtin data can be found in arm_neon_builtins.def,
-   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
-   TARGET_NEON to be true.  The feature tests are checked when the
-   builtins are expanded.
+/* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def
+   and arm_acle_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The feature tests are checked when the builtins are
+   expanded.
 
-   The mode entries in the following table correspond to the "key"
-   type of the instruction variant, i.e. equivalent to that which
-   would be specified after the assembler mnemonic, which usually
-   refers to the last vector operand.  The modes listed per
-   instruction should be the same as those defined for that
-   instruction's pattern, for instance in neon.md.  */
+   The mode entries in the following table correspond to the "key" type of the
+   instruction variant, i.e. equivalent to that which would be specified after
+   the assembler mnemonic for neon instructions, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern, for instance in neon.md.  */
 
 static arm_builtin_datum vfp_builtin_data[] =
 {
@@ -266,6 +272,15 @@ static arm_builtin_datum neon_builtin_data[] =
 
 #undef CF
 #undef VAR1
+#define VAR1(T, N, A) \
+  {#N, UP (A), CODE_FOR_##N, 0, T##_QUALIFIERS},
+
+static arm_builtin_datum acle_builtin_data[] =
+{
+#include "arm_acle_builtins.def"
+};
+
+#undef VAR1
 
 #define VAR1(T, N, X) \
   ARM_BUILTIN_NEON_##N##X,
@@ -518,13 +533,6 @@ enum arm_builtins
 
   ARM_BUILTIN_WMERGE,
 
-  ARM_BUILTIN_CRC32B,
-  ARM_BUILTIN_CRC32H,
-  ARM_BUILTIN_CRC32W,
-  ARM_BUILTIN_CRC32CB,
-  ARM_BUILTIN_CRC32CH,
-  ARM_BUILTIN_CRC32CW,
-
   ARM_BUILTIN_GET_FPSCR,
   ARM_BUILTIN_SET_FPSCR,
 
@@ -556,6 +564,14 @@ enum arm_builtins
 
 #include "arm_neon_builtins.def"
 
+#undef VAR1
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_##N,
+
+  ARM_BUILTIN_ACLE_BASE,
+
+#include "arm_acle_builtins.def"
+
   ARM_BUILTIN_MAX
 };
 
@@ -565,6 +581,9 @@ enum arm_builtins
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
+#define ARM_BUILTIN_ACLE_PATTERN_START \
+  (ARM_BUILTIN_ACLE_BASE + 1)
+
 #undef CF
 #undef VAR1
 #undef VAR2
@@ -1025,6 +1044,23 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum 
*d,
   arm_builtin_decls[fcode] = fndecl;
 }
 
+/* Set up ACLE builtins, even builtins for instructions that are not
+   in the current target ISA to allow the user to compile particular modules
+   with different target specific options that differ from the command line
+   options.  Such builtins will be rejected in arm_expand_builtin.  */
+
+static void
+arm_init_acle_builtins (void)
+{
+  unsigned int i, fcode = ARM_BUILTIN_ACLE_PATTERN_START;
+
+  for (i = 0; i < ARRAY_SIZE (acle_bu

[PATCH 3/6][ARM] Implement support for ACLE Coprocessor CDP intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch implements support for the ARM ACLE Coprocessor CDP
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

++--+
| Intrinsic signature| Instruction
pattern  |
++--+
|void __arm_cdp(coproc, opc1, CRd, CRn, CRm, opc2)   |CDP coproc, opc1,
CRd, CRn, CRm, opc2 |
++--+
|void __arm_cdp2(coproc, opc1, CRd, CRn, CRm, opc2)  |CDP2 coproc, opc1,
CRd, CRn, CRm, opc2|
++--+
Note that any untyped variable in the intrinsic signature is required to
be a compiler-time constant and has the type 'unsigned int'.  We do some
boundary checks for coproc:[0-15], opc1:[0-15], CR*:[0-31], opc2:[0-7].
If either of these requirements are not met a diagnostic is issued.

I renamed neon_const_bounds in this patch, to arm_const_bounds, simply
because it is also used in the Coprocessor intrinsics.  It also requires
the expansion of the builtin frame work such that it accepted 'void'
modes and intrinsics with 6 arguments.

I also changed acle.exp to run tests for multiple options, where all lto
option sets are appended with -ffat-objects to allow for assembly scans.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * config/arm/arm.md (): New.
  * config/arm/arm.c (neon_const_bounds): Rename this ...
  (arm_const_bounds): ... this.
  (arm_coproc_builtin_available): New.
  * config/arm/arm-builtins.c (SIMD_MAX_BUILTIN_ARGS): Increase.
  (arm_type_qualifiers): Add 'qualifier_unsigned_immediate'.
  (CDP_QUALIFIERS): Define to...
  (arm_cdp_qualifiers): ... this. New.
  (void_UP): Define.
  (arm_expand_builtin_args): Add case for 6 arguments.
  * config/arm/arm-protos.h (neon_const_bounds): Rename this ...
  (arm_const_bounds): ... this.
  (arm_coproc_builtin_available): New.
  * config/arm/arm_acle.h (__arm_cdp): New.
  (__arm_cdp2): New.
  * config/arm/arm_acle_builtins.def (cdp): New.
  (cdp2): New.
  * config/arm/iterators.md (CDPI,CDP,cdp): New.
  * config/arm/neon.md: Rename all 'neon_const_bounds' to
  'arm_const_bounds'.
  * config/arm/types.md (coproc): New.
  * config/arm/unspecs.md (VUNSPEC_CDP, VUNSPEC_CDP2): New.
  * gcc/doc/extend.texi (ACLE): Add a mention of Coprocessor intrinsics.

gcc/testsuite/ChangeLog:
2016-11-09  Andre Vieira  

  * gcc.target/arm/acle/acle.exp: Run tests for different options
  and make sure fat-lto-objects is used such that we can still do
  assemble scans.
  * gcc.target/arm/acle/cdp.c: New.
  * gcc.target/arm/acle/cdp2.c: New.
  * lib/target-supports.exp (check_effective_target_arm_coproc1_ok): New.
  (check_effective_target_arm_coproc1_ok_nocache): New.
  (check_effective_target_arm_coproc2_ok): New.
  (check_effective_target_arm_coproc2_ok_nocache): New.
  (check_effective_target_arm_coproc3_ok): New.
  (check_effective_target_arm_coproc3_ok_nocache): New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
2130a3004f17c47be6e42412c1ea30f3cff20573..bdb8aad8658af089b4977373654bb2d2c0b5c653
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -38,7 +38,7 @@
 #include "langhooks.h"
 #include "case-cfn-macros.h"
 
-#define SIMD_MAX_BUILTIN_ARGS 5
+#define SIMD_MAX_BUILTIN_ARGS 7
 
 enum arm_type_qualifiers
 {
@@ -53,6 +53,7 @@ enum arm_type_qualifiers
   /* Used when expanding arguments if an operand could
  be an immediate.  */
   qualifier_immediate = 0x8, /* 1 << 3  */
+  qualifier_unsigned_immediate = 0x9,
   qualifier_maybe_immediate = 0x10, /* 1 << 4  */
   /* void foo (...).  */
   qualifier_void = 0x20, /* 1 << 5  */
@@ -164,6 +165,18 @@ arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned };
 #define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers)
 
+/* void (unsigned immediate, unsigned immediate, unsigned immediate,
+unsigned immediate, unsigned immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cdp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate };
+#define CDP_QUALIFIERS \
+  (arm_cdp_qualifiers)
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
@@ -200,6 +213,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define oi_UP   OImode
 #define hf_UP   HFmode
 #define si_UP   SImode
+#define void_UP VOIDmo

[PATCH 4/6][ARM] Implement support for ACLE Coprocessor LDC and STC intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch implements support for the ARM ACLE Coprocessor LDC and STC
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

++--+
| Intrinsic signature| Instruction
pattern  |
++--+
|void __arm_ldc(coproc, CRd, const void* p)  |LDC coproc, CRd,
[...]|
++--+
|void __arm_ldcl(coproc, CRd, const void* p) |LDCL coproc, CRd,
[...]   |
++--+
|void __arm_ldc2(coproc, CRd, const void* p) |LDC2 coproc, CRd,
[...]   |
++--+
|void __arm_ldc2l(coproc, CRd, const void* p)|LDC2L coproc, CRd,
[...]  |
++--+
|void __arm_stc(coproc, CRd, void* p)|STC coproc, CRd,
[...]|
++--+
|void __arm_stcl(coproc, CRd, void* p)   |STCL coproc, CRd,
[...]   |
++--+
|void __arm_stc2(coproc, CRd, void* p)   |STC2 coproc, CRd,
[...]   |
++--+
|void __arm_stc2l(coproc, CRd, void* p)  |STC2L coproc, CRd,
[...]  |
++--+
Note that any untyped variable in the intrinsic signature is required to
be a compiler-time constant and has the type 'unsigned int'.  We do some
boundary checks for coproc:[0-15], CR*:[0-31]. If either of these
requirements are not met a diagnostic is issued.


Is this ok for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * config/arm/arm.md (*ldcstc): New.
  (): New.
  * config/arm/arm.c (arm_coproc_builtin_available): Add
  support for ldc,ldcl,stc,stcl,ldc2,ldc2l,stc2 and stc2l.
  (arm_coproc_ldc_stc_legitimate_address): New.
  * config/arm/arm-builtins.c (arm_type_qualifiers): Add
  'qualifier_const_pointer'.
  (LDC_QUALIFIERS): Define to...
  (arm_ldc_qualifiers): ... this. New.
  (STC_QUALIFIERS): Define to...
  (arm_stc_qualifiers): ... this. New.
  * config/arm/arm-protos.h
  (arm_coproc_ldc_stc_legitimate_address): New.
  * config/arm/arm_acle.h (__arm_ldc, __arm_ldcl, __arm_stc,
  __arm_stcl, __arm_ldc2, __arm_ldc2l, __arm_stc2, __arm_stc2l): New.
  * config/arm/arm_acle_builtins.def (ldc, ldc2, ldcl, ldc2l, stc,
  stc2, stcl, stc2l): New.
  * config/arm/constraints.md (Uz): New.
  * config/arm/iterators.md (LDCSTCI, ldcstc, LDCSTC): New.
  * config/arm/unspecs.md (VUNSPEC_LDC, VUNSPEC_LDC2, VUNSPEC_LDCL,
  VUNSPEC_LDC2L, VUNSPEC_STC, VUNSPEC_STC2, VUNSPEC_STCL,
  VUNSPEC_STC2L): New.

gcc/testsuite/ChangeLog:

2016-11-09  Andre Vieira  

  * gcc.target/arm/acle/ldc: New.
  * gcc.target/arm/acle/ldc2: New.
  * gcc.target/arm/acle/ldcl: New.
  * gcc.target/arm/acle/ldc2l: New.
  * gcc.target/arm/acle/stc: New.
  * gcc.target/arm/acle/stc2: New.
  * gcc.target/arm/acle/stcl: New.
  * gcc.target/arm/acle/stc2l: New.


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
bdb8aad8658af089b4977373654bb2d2c0b5c653..e6e58cd6a656732e37c6bca23ad980eea522a710
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -50,6 +50,8 @@ enum arm_type_qualifiers
   qualifier_const = 0x2, /* 1 << 1  */
   /* T *foo.  */
   qualifier_pointer = 0x4, /* 1 << 2  */
+  /* const T * foo */
+  qualifier_const_pointer = 0x6,
   /* Used when expanding arguments if an operand could
  be an immediate.  */
   qualifier_immediate = 0x8, /* 1 << 3  */
@@ -177,6 +179,23 @@ arm_cdp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned_immediate };
 #define CDP_QUALIFIERS \
   (arm_cdp_qualifiers)
+
+/* void (unsigned immediate, unsigned immediate,  const void *).  */
+static enum arm_type_qualifiers
+arm_ldc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_const_pointer };
+#define LDC_QUALIFIERS \
+  (arm_ldc_qualifiers)
+
+/* void (unsigned immediate, unsigned immediate,  void *).  */
+static enum arm_type_qualifiers
+arm_stc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_pointer };
+#define STC_QUALIFIERS \
+  (arm_stc_qualifiers)
+
 /* The first argument (retur

[PATCH 5/6][ARM] Implement support for ACLE Coprocessor MCR and MRC intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch implements support for the ARM ACLE Coprocessor MCR and MRC
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

+---+---+
| Intrinsic signature   |
Instruction pattern   |
+---+---+
|void __arm_mcr(coproc, opc1, uint32_t value, CRn, CRm, opc2)   |
MCR coproc, opc1, Rt, CRn, CRm, opc2  |
+---+---+
|void __arm_mcr2(coproc, opc1, uint32_t value, CRn, CRm, opc2)  |
MCR2 coproc, opc1, Rt, CRn, CRm, opc2 |
+---+---+
|uint32_t __arm_mrc(coproc, opc1, CRn, CRm, opc2)   |
MRC coproc, opc1, Rt, CRn, CRm, opc2  |
+---+---+
|uint32_t __arm_mrc2(coproc, opc1, CRn, CRm, opc2)  |
MRC2 coproc, opc1, Rt, CRn, CRm, opc2 |
+---+---+
Note that any untyped variable in the intrinsic signature is required to
be a compiler-time constant and has the type 'unsigned int'.  We do some
boundary checks for coproc:[0-15], opc1[0-7] CR*:[0-31],opc2:[0-7].  If
either of these requirements are not met a diagnostic is issued.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * config/arm/arm.md (): New.
  (): New.
  * config/arm/arm.c (arm_coproc_builtin_available): Add
  support for mcr, mrc, mcr2 and mrc2.
  * config/arm/arm-builtins.c (MCR_QUALIFIERS): Define to...
  (arm_mcr_qualifiers): ... this. New.
  (MRC_QUALIFIERS): Define to ...
  (arm_mrc_qualifiers): ... this. New.
  (MCR_QUALIFIERS): Define to ...
  (arm_mcr_qualifiers): ... this. New.
  * config/arm/arm_acle.h (__arm_mcr, __arm_mrc, __arm_mcr2,
  __arm_mrc2): New.
  * config/arm/arm_acle_builtins.def (mcr, mcr2, mrc, mrc2): New.
  * config/arm/iterators.md (MCRI, mcr, MCR, MRCI, mrc, MRC): New.
  * config/arm/unspecs.md (VUNSPEC_MCR, VUNSPEC_MCR2, VUNSPEC_MRC,
  VUNSPEC_MRC2): New.


gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * gcc.target/arm/acle/mcr.c: New.
  * gcc.target/arm/acle/mrc.c: New.
  * gcc.target/arm/acle/mcr2.c: New.
  * gcc.target/arm/acle/mrc2.c: New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
e6e58cd6a656732e37c6bca23ad980eea522a710..44f255356dcc3ea6b8f554ba96f99fd7856bf6a1
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -196,6 +196,26 @@ arm_stc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define STC_QUALIFIERS \
   (arm_stc_qualifiers)
 
+/* void (unsigned immediate, unsigned immediate,  T, unsigned immediate,
+unsigned immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_mcr_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_none,
+  qualifier_unsigned_immediate, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate };
+#define MCR_QUALIFIERS \
+  (arm_mcr_qualifiers)
+
+/* T (unsigned immediate, unsigned immediate, unsigned immediate,
+  unsigned immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_mrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_unsigned_immediate };
+#define MRC_QUALIFIERS \
+  (arm_mrc_qualifiers)
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
d3191abee163d5fbf753fb5072be50fdb2b4c785..25439a343e8540c5fca5cbe19e8b76e2fdb97a73
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30803,6 +30803,8 @@ bool arm_coproc_builtin_available (enum unspecv builtin)
   case VUNSPEC_LDCL:
   case VUNSPEC_STC:
   case VUNSPEC_STCL:
+  case VUNSPEC_MCR:
+  case VUNSPEC_MRC:
if (arm_arch4)
  return true;
break;
@@ -30811,6 +30813,8 @@ bool arm_coproc_builtin_available (enum unspecv builtin)
   case VUNSPEC_LDC2L:
   case VUNSPEC_STC2:
   case VUNSPEC_STC2L:
+  case VUNSPEC_MCR2:
+  case VUNSPEC_MRC2:
/* Only present in ARMv5*, ARMv6 (but not ARMv6-M), ARMv7* and
   ARMv8-{A,M}.  */
if (arm_arch5)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
45cdd576fa5b7f26e2d1cfa2c86a8c876

[PATCH 6/6][ARM] Implement support for ACLE Coprocessor MCRR and MRRC intrinsics

2016-11-09 Thread Andre Vieira (lists)
Hi,

This patch implements support for the ARM ACLE Coprocessor MCR and MRC
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

+---+---+
| Intrinsic signature   |
Instruction pattern   |
+---+---+
|void __arm_mcrr(coproc, opc1, uint64_t value, CRm) |
MCRR coproc, opc1, Rt, Rt2, CRm   |
+---+---+
|void __arm_mcrr2(coproc, opc1, uint64_t value, CRm)|
MCRR2 coproc, opc1, Rt, Rt2, CRm  |
+---+---+
|uint64_t __arm_mrrc(coproc, opc1, CRm) |
MRRC coproc, opc1, Rt, Rt2, CRm   |
+---+---+
|uint64_t __arm_mrrc2(coproc, opc1, CRm)|
MRRC2 coproc, opc1, Rt, Rt2, CRm  |
+---+---+
Note that any untyped variable in the intrinsic signature is required to
be a compiler-time constant and has the type 'unsigned int'.  We do some
boundary checks for coproc:[0-15], opc1[0-7] CR*:[0-31]. If either of
these requirements are not met a diagnostic is issued.

I added a new arm_arch variable for ARMv5TE to use when deciding whether
or not the MCRR and MRCC intrinsics are available.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira  

  * config/arm/arm.md (): New.
  (): New.
  * config/arm/arm.c (arm_arch5te): New.
  (arm_option_override): Set arm_arch5te.
  (arm_coproc_builtin_available): Add support for mcrr, mcrr2, mrrc
  and mrrc2.
  * config/arm/arm-builtins.c (MCRR_QUALIFIERS): Define to...
  (arm_mcrr_qualifiers): ... this. New.
  (MRRC_QUALIFIERS): Define to...
  (arm_mrrc_qualifiers): ... this. New.
  * config/arm/arm_acle.h (__arm_mcrr, __arm_mcrr2, __arm_mrrc,
  __arm_mrrc2): New.
  * config/arm/arm_acle_builtins.def (mcrr, mcrr2, mrrc, mrrc2): New.
  * config/arm/iterators.md (MCRRI, mcrr, MCRR): New.
  (MRRCI, mrrc, MRRC): New.
  * config/arm/unspecs.md (VUNSPEC_MCRR, VUNSPEC_MCRR2, VUNSPEC_MRRC,
  VUNSPEC_MRRC2): New.

gcc/testsuite/ChangeLog:

2016-11-09  Andre Vieira  

  * gcc.target/arm/acle/mcrr: New.
  * gcc.target/arm/acle/mcrr2: New.
  * gcc.target/arm/acle/mrrc: New.
  * gcc.target/arm/acle/mrrc2: New.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
44f255356dcc3ea6b8f554ba96f99fd7856bf6a1..ab641d6d5fb5f64b5f3317f461e13e5222150237
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -216,6 +216,24 @@ arm_mrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned_immediate, qualifier_unsigned_immediate };
 #define MRC_QUALIFIERS \
   (arm_mrc_qualifiers)
+
+/* void (unsigned immediate, unsigned immediate,  T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_mcrr_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_none,
+  qualifier_unsigned_immediate };
+#define MCRR_QUALIFIERS \
+  (arm_mcrr_qualifiers)
+
+/* T (unsigned immediate, unsigned immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_mrrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned_immediate,
+  qualifier_unsigned_immediate, qualifier_unsigned_immediate };
+#define MRRC_QUALIFIERS \
+  (arm_mrrc_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
25439a343e8540c5fca5cbe19e8b76e2fdb97a73..3f4e2a39580217b6564f047998681e5b8419e741
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -797,6 +797,9 @@ int arm_arch5 = 0;
 /* Nonzero if this chip supports the ARM Architecture 5E extensions.  */
 int arm_arch5e = 0;
 
+/* Nonzero if this chip supports the ARM Architecture 5TE extensions.  */
+int arm_arch5te = 0;
+
 /* Nonzero if this chip supports the ARM Architecture 6 extensions.  */
 int arm_arch6 = 0;
 
@@ -3231,6 +3234,7 @@ arm_option_override (void)
   arm_arch4t = arm_arch4 && (ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB));
   arm_arch5 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH5);
   arm_arch5e = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH5E);
+  arm_arch5te = arm_arch5e && ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
   arm_arch6 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH6);
   arm_arch6k

Re: [PATCH] Convert character arrays to string csts

2016-11-09 Thread Richard Biener
On Fri, Nov 4, 2016 at 2:33 PM, Martin Liška  wrote:
> On 11/03/2016 02:00 PM, Jan Hubicka wrote:
>>> On 11/03/2016 01:12 PM, Martin Liška wrote:
 +  tree init = DECL_INITIAL (decl);
 +  if (init
 +  && TREE_READONLY (decl)
 +  && can_convert_ctor_to_string_cst (init))
 +DECL_INITIAL (decl) = build_string_cst_from_ctor (init);
>>>
>>> I'd merge these two new functions since they're only ever called
>>> together. We'd then have something like
>>>
>>> if (init && TREE_READONLY (decl))
>>>   init = convert_ctor_to_string_cst (init);
>>> if (init)
>>>   DECL_INITIAL (decl) = init;
>
> Done.
>
>>>
>>> I'll defer to Jan on whether finalize_decl seems like a good place
>>> to do this.
>>
>> I think finalize_decl may be bit too early because frontends may expects the
>> ctors to be in a way they produced them.  We only want to convert those 
>> arrays
>> seen by middle-end so I would move the logic to varpool_node::analyze
>>
>> Otherwise the patch seems fine to me.
>>
>> Honza
>>>
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c 
 b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
 index 283bd1c..b2d1fd5 100644
 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
 @@ -4,12 +4,15 @@
 char *buffer1;
 char *buffer2;

 +const char global[] = {'a', 'b', 'c', 'd', '\0'};
 +
 #define SIZE 1000

 int
 main (void)
 {
   const char* const foo1 = "hello world";
 +  const char local[] = "abcd";

   buffer1 = __builtin_malloc (SIZE);
   __builtin_strcpy (buffer1, foo1);
 @@ -45,6 +48,10 @@ main (void)
 __builtin_abort ();
   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
 __builtin_abort ();
 +  if (__builtin_memchr (global, null, 5) == 0)
 +__builtin_abort ();
 +  if (__builtin_memchr (local, null, 5) == 0)
 +__builtin_abort ();
>>>
>>> How is that a meaningful test? This seems to work even with an
>>> unpatched gcc. I'd have expected something that shows a benefit for
>>> doing this conversion, and maybe also a test that shows it isn't
>>> done in cases where it's not allowed.
>
> It's meaningful as it scans that there's no __builtin_memchr in optimized 
> dump.
> I'm adding new tests that does the opposite test.
>
>>>
 tree
 -build_string_literal (int len, const char *str)
 +build_string_literal (int len, const char *str, bool build_addr_expr)
>>>
>>> New arguments should be documented in the function comment.
>
> Yep, improved.
>
>>>
 +/* Return TRUE when CTOR can be converted to a string constant.  */
>>>
>>> "if", not "when".
>
> Done.
>
>>>
 +  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);
 +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
 +{
 +  if (key == NULL_TREE
 + || TREE_CODE (key) != INTEGER_CST
 + || !tree_fits_uhwi_p (value)
 + || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))
 +   return false;
>>>
>>> Shouldn't all elements have the same type, or do you really have to
>>> call useless_type_conversion in a loop?
>>>
 +  /* Allow zero character just at the end of a string.  */
 +  if (integer_zerop (value))
 +   return idx == elements - 1;
>>>
>>> Don't you also have to explicitly check it's there?
>>>
>>>
>>> Bernd
>
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

I'm curious about the

@@ -1495,7 +1495,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
{
  if (!TREE_STATIC (decl))
{
- DECL_INITIAL (decl) = NULL_TREE;
+ if (!TREE_READONLY (decl) || TREE_CODE (init) != STRING_CST)
+   DECL_INITIAL (decl) = NULL_TREE;
  init = build2 (INIT_EXPR, void_type_node, decl, init);
  gimplify_and_add (init, seq_p);
  ggc_free (init);

change.  Why keep DECL_INITIAL if you build a INIT_EXPR anyway?

@@ -4438,6 +4439,19 @@ gimplify_init_constructor (tree *expr_p,
gimple_seq *pre_p, gimple_seq *post_p,
break;
  }

+   /* Replace a ctor with a string constant with possible.  */
+   if (TREE_READONLY (object)
+   && VAR_P (object))
+ {
+   tree string_ctor = convert_ctor_to_string_cst (ctor);
+   if (string_ctor)
+ {
+   TREE_OPERAND (*expr_p, 1) = string_ctor;
+   DECL_INITIAL (object) = string_ctor;
+   break;
+ }
+ }
+
/* Fetch information about the constructor to direct later processing.
   We might want to make static versions of it in various cases, and
   can only do so if it known to be a valid constant initializer.  */

hmm, so both these hunks will end up keeping a DECL_INITIAL
for non-static local consts?  Usually we end up

Re: [Patch, Fortran, committed] PR 71894: [OOP] ICE in gfc_add_component_ref, at fortran/class.c:227

2016-11-09 Thread Janus Weil
Hi Andre,

sorry, didn't want to be impolite. I was just not aware that it makes
a big difference whether I attach the patch or post the link. Patch
below.

Cheers,
Janus


Index: gcc/fortran/class.c
===
--- gcc/fortran/class.c(Revision 241992)
+++ gcc/fortran/class.c(Arbeitskopie)
@@ -224,7 +224,8 @@ gfc_add_component_ref (gfc_expr *e, const char *na
 break;
   tail = &((*tail)->next);
 }
-  if (derived->components->next->ts.type == BT_DERIVED &&
+  if (derived->components && derived->components->next &&
+  derived->components->next->ts.type == BT_DERIVED &&
   derived->components->next->ts.u.derived == NULL)
 {
   /* Fix up missing vtype.  */




2016-11-09 10:54 GMT+01:00 Andre Vehreschild :
> Hi Janus,
>
> may I ask you to attach also the "obvious" patches to the mail you send to the
> list? It is far more comfortable to take a look at the patch in the mail than
> open the webpage. Furthermore is it considered polite to attach the patches
> even when they are obvious. This is not to treat you badly, but to ensure a
> certain quality of gfortran. When a diff is attached I look at it, but I will
> not open the link, not when I am only on a mobile.
>
> Thank you in advance,
> Andre
>
> On Wed, 9 Nov 2016 10:35:10 +0100
> Janus Weil  wrote:
>
>> Hi all,
>>
>> I have committed to trunk another obvious patch to fix an ICE on invalid 
>> code:
>>
>> https://gcc.gnu.org/viewcvs?rev=241993&root=gcc&view=rev
>>
>> Cheers,
>> Janus
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH 7/7, GCC, ARM, V8M] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic

2016-11-09 Thread Kyrill Tkachov

Hi Andre,

On 25/10/16 17:30, Andre Vieira (lists) wrote:

On 24/08/16 12:01, Andre Vieira (lists) wrote:

On 25/07/16 14:28, Andre Vieira (lists) wrote:

This patch adds support ARMv8-M's Security Extension's
cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether
an entry function was called from a non-secure state.
See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on
Development Tools
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html)
for further details.

The FIXME in config/arm/arm_cmse.h is for a diagnostic message that is
suggested in the ARMv8-M Security Extensions document mentioned above,
to diagnose the use of the cmse_nonsecure_caller intrinsic outside of
functions with the 'cmse_nonsecure_entry' attribute.  Checking whether
the intrinsic is called from within such functions can easily be done
inside 'arm_expand_builtin'. However, making the warning point to the
right location is more complicated.  The ARMv8-M Security Extensions
specification does mention that such a diagnostic might become
mandatory, so I might have to pick this up later, otherwise it is left
as a potential extra feature.


*** gcc/ChangeLog ***
2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * config/arm/arm-builtins.c (arm_builtins): Define
 ARM_BUILTIN_CMSE_NONSECURE_CALLER.
 (bdesc_2arg): Add line for cmse_nonsecure_caller.
 (arm_expand_builtin): Handle cmse_nonsecure_caller.
 * config/arm/arm_cmse.h (cmse_nonsecure_caller): New.

*** gcc/testsuite/ChangeLog ***
2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * gcc.target/arm/cmse/cmse-1.c: Add test for
 cmse_nonsecure_caller.


Added more documentation as requested.

---

This patch adds support ARMv8-M's Security Extension's
cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether
an entry function was called from a non-secure state.
See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on
Development Tools
(http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html)
for further details.

The FIXME in config/arm/arm_cmse.h is for a diagnostic message that is
suggested in the ARMv8-M Security Extensions document mentioned above,
to diagnose the use of the cmse_nonsecure_caller intrinsic outside of
functions with the 'cmse_nonsecure_entry' attribute.  Checking whether
the intrinsic is called from within such functions can easily be done
inside 'arm_expand_builtin'. However, making the warning point to the
right location is more complicated.  The ARMv8-M Security Extensions
specification does mention that such a diagnostic might become
mandatory, so I might have to pick this up later, otherwise it is left
as a potential extra feature.


*** gcc/ChangeLog ***
2016-07-xx  Andre Vieira
 Thomas Preud'homme  

 * config/arm/arm-builtins.c (arm_builtins): Define
 ARM_BUILTIN_CMSE_NONSECURE_CALLER.
 (bdesc_2arg): Add line for cmse_nonsecure_caller.
 (arm_expand_builtin): Handle cmse_nonsecure_caller.
 * config/arm/arm_cmse.h (cmse_nonsecure_caller): New.
 * doc/extend.texi (ARM ARMv8-M Security Extensions): New intrinsic.

*** gcc/testsuite/ChangeLog ***
2016-07-xx  Andre Vieira
 Thomas Preud'homme  

 * gcc.target/arm/cmse/cmse-1.c: Add test for
 cmse_nonsecure_caller.


Hi,

Rebased previous patch on top of trunk as requested. No changes to
ChangeLog.

Cheers,
Andre


@@ -1832,6 +1834,17 @@ arm_init_builtins (void)
= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, 
NULL_TREE);
 }
+
+  if (arm_arch_cmse)
+{
+  tree ftype_cmse_nonsecure_caller
+   = build_function_ty

Should this be use_cmse ?
This looks ok to me otherwise.
I believe patch [6/7] is the only one needing approval after this...

Kyrill


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> I re-send all patches sent by Ilya earlier for review which support
> vectorization of loop epilogues and loops with low trip count. We
> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> approved by Jeff.
>
> I did re-base of all patches and performed bootstrapping and
> regression testing that did not show any new failures. Also all
> changes related to new vect_do_peeling algorithm have been changed
> accordingly.
>
> Is it OK for trunk?

Hi,
I can't approve patches, but had some comments after going through the
implementation.

One confusing part is cost model change, as well as the way it's used
to decide how epilogue loop should be vectorized.  Given vect-tail is
disabled at the moment and the cost change needs further tuning, is it
reasonable to split this part out and get vectorization part
reviewed/submitted independently?  For example, let user specified
parameters make the decision for now.  Cost and target dependent
changes should go in at last, this could make the patch easier to
read.

The implementation computes/shares quite amount information from main
loop to epilogue loop vectorization.  Furthermore, variables/fields
for such information are somehow named in a misleading way.  For
example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
flag controlling whether epilogue loop should be vectorized with
masking.  However, it's actually controlled by exactly the same flag
as whether epilogue loop should be combined into the main loop with
masking:
@@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)

   slpeel_make_loop_iterate_ntimes (loop, niters_vector);

+  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
+vect_combine_loop_epilogue (loop_vinfo);
+
   /* Reduce loop iterations by the vectorization factor.  */
   scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
   expected_iterations / vf);

IMHO, we should decouple main loop vectorization and epilogue
vectorization as much as possible by sharing as few information as we
can.  The general idea is to handle epilogue loop just like normal
short-trip loop.  For example, we can rename
LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
else), and we don't differentiate its meaning between main and
epilogue(short-trip) loop.  It only indicates the current loop should
be vectorized with masking no matter it's a main loop or epilogue
loop, and it works just like the current implementation.

After this change, we can refine vectorization and make it more
general for normal loop and epilogue(short trip) loop.  For example,
this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
loop and use it to control how it should be vectorized:
+  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+{
+  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
+  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
+}
+  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
+   && min_profitable_combine_iters >= 0)
+{

This works, but not that good for understanding or maintaining.

Thanks,
bin


Reject out-of-range bit pos in bit-fields insns operating on a register.

2016-11-09 Thread Andreas Schwab
As seen by the testcase in PR77822, combine can generate out-of-range
bit pos in a bit-field insn, unless the pattern explicitly rejects it.
This only makes a difference for expressions that are undefined at
runtime.  Without that we would either generate bad assembler or ICE in
output_btst.

PR target/78254
* config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields
insns operating on a register.

Index: gcc/config/m68k/m68k.md
===
--- gcc/config/m68k/m68k.md (revision 241995)
+++ gcc/config/m68k/m68k.md (working copy)
@@ -706,7 +706,7 @@
   (minus:SI (const_int 31)
 (match_operand:SI 1 "general_operand" 
"di")))
 (const_int 0)))]
-  ""
+  "!(CONST_INT_P (operands[1]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))"
 {
   return output_btst (operands, operands[1], operands[0], insn, 31);
 })
@@ -765,9 +765,10 @@
 (cc0)
 (compare (zero_extract:SI (match_operand:SI 0 "register_operand" "do")
  (const_int 1)
-   (match_operand:SI 1 "const_int_operand" "n"))
+ (match_operand:SI 1 "const_int_operand" "n"))
 (const_int 0)))]
-  "!TARGET_COLDFIRE"
+  "!TARGET_COLDFIRE
+   && !(REG_P (operands[0]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))"
 {
   if (GET_CODE (operands[0]) == MEM)
 {
@@ -790,7 +791,8 @@
  (const_int 1)
  (match_operand:SI 1 "const_int_operand" "n"))
 (const_int 0)))]
-  "TARGET_COLDFIRE"
+  "TARGET_COLDFIRE
+   && !(REG_P (operands[0]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))"
 {
   if (GET_CODE (operands[0]) == MEM)
 {
@@ -5397,6 +5399,7 @@
 (match_operand:SI 2 "const_int_operand" "n"))
(match_operand:SI 3 "register_operand" "d"))]
   "TARGET_68020 && TARGET_BITFIELD
+   && IN_RANGE (INTVAL (operands[2]), 0, 31)
&& (INTVAL (operands[1]) == 8 || INTVAL (operands[1]) == 16)
&& INTVAL (operands[2]) % INTVAL (operands[1]) == 0"
 {
@@ -5438,6 +5441,7 @@
 (match_operand:SI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
   "TARGET_68020 && TARGET_BITFIELD
+   && IN_RANGE (INTVAL (operands[3]), 0, 31)
&& (INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16)
&& INTVAL (operands[3]) % INTVAL (operands[2]) == 0"
 {
@@ -5480,6 +5484,7 @@
 (match_operand:SI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
   "TARGET_68020 && TARGET_BITFIELD
+   && IN_RANGE (INTVAL (operands[3]), 0, 31)
&& (INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16)
&& INTVAL (operands[3]) % INTVAL (operands[2]) == 0"
 {
@@ -5610,7 +5615,7 @@
(sign_extract:SI (match_operand:SI 1 "register_operand" "d")
 (match_operand:SI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[3]), 0, 31)"
   "bfexts %1{%b3:%b2},%0")
 
 (define_insn "*extv_bfextu_reg"
@@ -5618,7 +5623,7 @@
(zero_extract:SI (match_operand:SI 1 "register_operand" "d")
 (match_operand:SI 2 "const_int_operand" "n")
 (match_operand:SI 3 "const_int_operand" "n")))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[3]), 0, 31)"
 {
   if (GET_CODE (operands[2]) == CONST_INT)
 {
@@ -5637,7 +5642,7 @@
 (match_operand:SI 1 "const_int_operand" "n")
 (match_operand:SI 2 "const_int_operand" "n"))
(const_int 0))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)"
 {
   CC_STATUS_INIT;
   return "bfclr %0{%b2:%b1}";
@@ -5648,7 +5653,7 @@
 (match_operand:SI 1 "const_int_operand" "n")
 (match_operand:SI 2 "const_int_operand" "n"))
(const_int -1))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)"
 {
   CC_STATUS_INIT;
   return "bfset %0{%b2:%b1}";
@@ -5659,7 +5664,7 @@
 (match_operand:SI 1 "const_int_operand" "n")
 (match_operand:SI 2 "const_int_operand" "n"))
(match_operand:SI 3 "register_operand" "d"))]
-  "TARGET_68020 && TARGET_BITFIELD"
+  "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)"
 {
 #if 0
   /* These special cases are now recognized by a specific pattern.  */
@@ -5707,7 +5712,8 @@
  (match_operand:SI 1 "const_int_operand" "n")
  (match_ope

Re: Prevent aliasing between arguments in calls to move_alloc

2016-11-09 Thread Paul Richard Thomas
Hi Steve,

Committed as r241995. Thanks.

Paul

On 8 November 2016 at 20:43, Steve Kargl
 wrote:
> Yes.  I saw Ian's analysis in c.l.f.  It seems we both got
> caught out on this one.  The patch looks fine.
>
> --
> steve
>
> On Tue, Nov 08, 2016 at 08:26:37PM +0100, Paul Richard Thomas wrote:
>> Hi Steve,
>>
>> I moved too quickly and caused a regression. See the link in the
>> testcase. The attached fixes the problem and bootstraps/regtests.
>>
>> OK for trunk?
>>
>> Paul
>>
>>
>> On 5 November 2016 at 16:17, Steve Kargl
>>  wrote:
>> > On Sat, Nov 05, 2016 at 10:05:30AM +0100, Paul Richard Thomas wrote:
>> >>
>> >> Bootstraps and regtests on FC21/x86_64 - OK for trunk?
>> >
>> > OK with minor nit (see below).
>> >
>> >>
>> >> +   /*  F2003 12.4.1.7  */
>> >> +   if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE
>> >
>> > Need a space after ==.
>> >
>> > --
>> > Steve
>>
>>
>>
>> --
>> The difference between genius and stupidity is; genius has its limits.
>>
>> Albert Einstein
>
>> Index: gcc/fortran/check.c
>> ===
>> *** gcc/fortran/check.c   (revision 241872)
>> --- gcc/fortran/check.c   (working copy)
>> *** gfc_check_move_alloc (gfc_expr *from, gf
>> *** 3343,3355 
>>   }
>>
>> /*  F2003 12.4.1.7  */
>> !   if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE
>> && !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name))
>>   {
>> !   gfc_error ("The FROM and TO arguments at %L are either the same 
>> object "
>> !  "or subobjects thereof and so violate aliasing restrictions "
>> !  "(F2003 12.4.1.7)", &to->where);
>> !   return false;
>>   }
>>
>> /* CLASS arguments: Make sure the vtab of from is present.  */
>> --- 3343,3380 
>>   }
>>
>> /*  F2003 12.4.1.7  */
>> !   if (to->expr_type == EXPR_VARIABLE && from->expr_type == EXPR_VARIABLE
>> && !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name))
>>   {
>> !   gfc_ref *to_ref, *from_ref;
>> !   to_ref = to->ref;
>> !   from_ref = from->ref;
>> !   bool aliasing = true;
>> !
>> !   for (; from_ref && to_ref;
>> !from_ref = from_ref->next, to_ref = to_ref->next)
>> ! {
>> !   if (to_ref->type != from->ref->type)
>> ! aliasing = false;
>> !   else if (to_ref->type == REF_ARRAY
>> !&& to_ref->u.ar.type != AR_FULL
>> !&& from_ref->u.ar.type != AR_FULL)
>> ! /* Play safe; assume sections and elements are different.  */
>> ! aliasing = false;
>> !   else if (to_ref->type == REF_COMPONENT
>> !&& to_ref->u.c.component != from_ref->u.c.component)
>> ! aliasing = false;
>> !
>> !   if (!aliasing)
>> ! break;
>> ! }
>> !
>> !   if (aliasing)
>> ! {
>> !   gfc_error ("The FROM and TO arguments at %L violate aliasing "
>> !  "restrictions (F2003 12.4.1.7)", &to->where);
>> !   return false;
>> ! }
>>   }
>>
>> /* CLASS arguments: Make sure the vtab of from is present.  */
>> Index: gcc/testsuite/gfortran.dg/move_alloc_18.f90
>> ===
>> *** gcc/testsuite/gfortran.dg/move_alloc_18.f90   (revision 0)
>> --- gcc/testsuite/gfortran.dg/move_alloc_18.f90   (working copy)
>> ***
>> *** 0 
>> --- 1,21 
>> + ! { dg-do compile }
>> + !
>> + ! Test that the anti-aliasing restriction does not knock out valid code.
>> + !
>> + ! Contributed by  Andrew Balwin on
>> + ! https://groups.google.com/forum/#!topic/comp.lang.fortran/oiXdl1LPb_s
>> + !
>> +   PROGRAM TEST
>> + IMPLICIT NONE
>> +
>> + TYPE FOOBAR
>> +   INTEGER, ALLOCATABLE :: COMP(:)
>> + END TYPE
>> +
>> + TYPE (FOOBAR) :: MY_ARRAY(6)
>> +
>> + ALLOCATE (MY_ARRAY(1)%COMP(10))
>> +
>> + CALL MOVE_ALLOC (MY_ARRAY(1)%COMP, MY_ARRAY(2)%COMP)
>> +
>> +   END PROGRAM TEST
>
>
> --
> Steve



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


[PATCH] Remove flag_evaluation_order (Java remains)

2016-11-09 Thread Richard Biener

The Java FE was the only thing setting flag_evaluation_order to nonzero
thus the following removes that flag and adjusts code checking it
(a followup will remove the now unused arg from tree_swap_operands_p).

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2016-11-09  Richard Biener  

* common.opt (flag_evaluation_order): Remove.
* expr.c (expand_operands): Remove code guarded by
flag_evaluation_order.
* fold-const.c (reorder_operands_p): Remove, it always returns
true.
(negate_expr_p): Remove calls to reorder_operands_p.
(fold_negate_expr): Likewise.
(tree_swap_operands_p): Likewise.
(fold_binary_loc): Likewise.

Index: gcc/common.opt
===
--- gcc/common.opt  (revision 241992)
+++ gcc/common.opt  (working copy)
@@ -58,10 +58,6 @@ int flag_incremental_link = 0
 Variable
 int flag_complex_method = 1
 
-; Nonzero if subexpressions must be evaluated from left-to-right.
-Variable
-int flag_evaluation_order = 0
-
 ; Language specific warning pass for unused results.
 Variable
 bool flag_warn_unused_result = false
Index: gcc/expr.c
===
--- gcc/expr.c  (revision 241992)
+++ gcc/expr.c  (working copy)
@@ -7681,10 +7682,6 @@ expand_operands (tree exp0, tree exp1, r
 }
   else
 {
-  /* If we need to preserve evaluation order, copy exp0 into its own
-temporary variable so that it can't be clobbered by exp1.  */
-  if (flag_evaluation_order && TREE_SIDE_EFFECTS (exp1))
-   exp0 = save_expr (exp0);
   *op0 = expand_expr (exp0, target, VOIDmode, modifier);
   *op1 = expand_expr (exp1, NULL_RTX, VOIDmode, modifier);
 }
Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 241992)
+++ gcc/fold-const.c(working copy)
@@ -133,7 +133,6 @@ static tree fold_binary_op_with_conditio
 tree, tree,
 tree, tree, int);
 static tree fold_div_compare (location_t, enum tree_code, tree, tree, tree);
-static bool reorder_operands_p (const_tree, const_tree);
 static tree fold_negate_const (tree, tree);
 static tree fold_not_const (const_tree, tree);
 static tree fold_relational_const (enum tree_code, tree, tree, tree);
@@ -435,9 +434,7 @@ negate_expr_p (tree t)
  && ! TYPE_OVERFLOW_WRAPS (type)))
return false;
   /* -(A + B) -> (-B) - A.  */
-  if (negate_expr_p (TREE_OPERAND (t, 1))
- && reorder_operands_p (TREE_OPERAND (t, 0),
-TREE_OPERAND (t, 1)))
+  if (negate_expr_p (TREE_OPERAND (t, 1)))
return true;
   /* -(A + B) -> (-A) - B.  */
   return negate_expr_p (TREE_OPERAND (t, 0));
@@ -447,9 +444,7 @@ negate_expr_p (tree t)
   return !HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type))
 && !HONOR_SIGNED_ZEROS (element_mode (type))
 && (! INTEGRAL_TYPE_P (type)
-|| TYPE_OVERFLOW_WRAPS (type))
-&& reorder_operands_p (TREE_OPERAND (t, 0),
-   TREE_OPERAND (t, 1));
+|| TYPE_OVERFLOW_WRAPS (type));
 
 case MULT_EXPR:
   if (TYPE_UNSIGNED (type))
@@ -606,9 +601,7 @@ fold_negate_expr (location_t loc, tree t
  && !HONOR_SIGNED_ZEROS (element_mode (type)))
{
  /* -(A + B) -> (-B) - A.  */
- if (negate_expr_p (TREE_OPERAND (t, 1))
- && reorder_operands_p (TREE_OPERAND (t, 0),
-TREE_OPERAND (t, 1)))
+ if (negate_expr_p (TREE_OPERAND (t, 1)))
{
  tem = negate_expr (TREE_OPERAND (t, 1));
  return fold_build2_loc (loc, MINUS_EXPR, type,
@@ -628,8 +621,7 @@ fold_negate_expr (location_t loc, tree t
 case MINUS_EXPR:
   /* - (A - B) -> B - A  */
   if (!HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type))
- && !HONOR_SIGNED_ZEROS (element_mode (type))
- && reorder_operands_p (TREE_OPERAND (t, 0), TREE_OPERAND (t, 1)))
+ && !HONOR_SIGNED_ZEROS (element_mode (type)))
return fold_build2_loc (loc, MINUS_EXPR, type,
TREE_OPERAND (t, 1), TREE_OPERAND (t, 0));
   break;
@@ -6761,27 +6753,12 @@ fold_single_bit_test (location_t loc, en
   return NULL_TREE;
 }
 
-/* Check whether we are allowed to reorder operands arg0 and arg1,
-   such that the evaluation of arg1 occurs before arg0.  */
-
-static bool
-reorder_operands_p (const_tree arg0, const_tree arg1)
-{
-  if (! flag_evaluation_order)
-  return true;
-  if (TREE_CONSTANT (arg0) || TREE_CONSTANT (arg1))
-return true;
-  return ! TREE_SIDE_EFFECTS (arg0)
-&& ! TREE_SIDE_EFFECTS (arg1);
-}
-
 /* Test whether it is preferable two swap two operands, ARG0 and
ARG1,

[PATCH, GCC/testsuite] Fix PR78269: noexcept-type9.C should be a compile test

2016-11-09 Thread Thomas Preudhomme
g++.dg/cpp1z/noexcept-type9.C contains a dg-error but is currently a runtime 
test. The intent was clearly to make it a compile test to test that error, 
especially since there is no main. This patch changes it to a compile test.


ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2016-11-09  Thomas Preud'homme  

PR testsuite/78269
* g++.dg/cpp1z/noexcept-type9.C: Make it a compile test.


Applied as obvious.

Best regards,

Thomas
diff --git a/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C b/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C
index 4547c4cc4d3c2bf3f7e049b4e6c1179b85982a62..a29618a3001266b003ab966daa80ca5e8d5020da 100644
--- a/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C
+++ b/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C
@@ -1,6 +1,6 @@
 // Test for PMF template args.
 // { dg-options -std=c++1z }
-// { dg-do run }
+// { dg-do compile }
 
 struct A
 {


[PATCH] Remove redundant canonicalize_comparison from DOM

2016-11-09 Thread Richard Biener

DOM currently canonicalizes GIMPLE_COND operand order.  But it does
so before constant/copy propagation into that stmt (so quite pointless)
and furthermore we fold the stmt if we constant/copy propagated
into it and fold already does the canonicalization.

Thus, removed.

Bootstrap & regtest pending on x86_64-unknown-linux-gnu.

Richard.

2016-11-09  Richard Biener  

* tree-ssa-dom.c (canonicalize_comparison): Remove.
(optimize_stmt): Remove redundant pre-propagation canonicalization
of comparison operand order.

Index: gcc/tree-ssa-dom.c
===
--- gcc/tree-ssa-dom.c  (revision 241992)
+++ gcc/tree-ssa-dom.c  (working copy)
@@ -753,49 +753,6 @@ make_pass_dominator (gcc::context *ctxt)
 }
 
 
-/* Given a conditional statement CONDSTMT, convert the
-   condition to a canonical form.  */
-
-static void
-canonicalize_comparison (gcond *condstmt)
-{
-  tree op0;
-  tree op1;
-  enum tree_code code;
-
-  gcc_assert (gimple_code (condstmt) == GIMPLE_COND);
-
-  op0 = gimple_cond_lhs (condstmt);
-  op1 = gimple_cond_rhs (condstmt);
-
-  code = gimple_cond_code (condstmt);
-
-  /* If it would be profitable to swap the operands, then do so to
- canonicalize the statement, enabling better optimization.
-
- By placing canonicalization of such expressions here we
- transparently keep statements in canonical form, even
- when the statement is modified.  */
-  if (tree_swap_operands_p (op0, op1, false))
-{
-  /* For relationals we need to swap the operands
-and change the code.  */
-  if (code == LT_EXPR
- || code == GT_EXPR
- || code == LE_EXPR
- || code == GE_EXPR)
-   {
-  code = swap_tree_comparison (code);
-
-  gimple_cond_set_code (condstmt, code);
-  gimple_cond_set_lhs (condstmt, op1);
-  gimple_cond_set_rhs (condstmt, op0);
-
-  update_stmt (condstmt);
-   }
-}
-}
-
 /* A trivial wrapper so that we can present the generic jump
threading code with a simple API for simplifying statements.  */
 static tree
@@ -1789,9 +1746,6 @@ optimize_stmt (basic_block bb, gimple_st
   print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 }
 
-  if (gimple_code (stmt) == GIMPLE_COND)
-canonicalize_comparison (as_a  (stmt));
-
   update_stmt_if_modified (stmt);
   opt_stats.num_stmts++;
 


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Yuri Rumyantsev
Thanks Richard for your comments.
Your proposed to handle epilogue loop just like normal short-trip loop
but this is not exactly truth since e.g. epilogue must not be peeled
for alignment.

It is not clear for me what are my next steps? Should I re-design the
patch completely or simply decompose the whole patch to different
parts? But it means that we must start review process from beginning
but release is closed to its end.
Note also that i work for Intel till the end of year and have not idea
who will continue working on this project.

Any help will be appreciated.

Thanks.
Yuri.

2016-11-09 13:37 GMT+03:00 Bin.Cheng :
> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> I re-send all patches sent by Ilya earlier for review which support
>> vectorization of loop epilogues and loops with low trip count. We
>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> approved by Jeff.
>>
>> I did re-base of all patches and performed bootstrapping and
>> regression testing that did not show any new failures. Also all
>> changes related to new vect_do_peeling algorithm have been changed
>> accordingly.
>>
>> Is it OK for trunk?
>
> Hi,
> I can't approve patches, but had some comments after going through the
> implementation.
>
> One confusing part is cost model change, as well as the way it's used
> to decide how epilogue loop should be vectorized.  Given vect-tail is
> disabled at the moment and the cost change needs further tuning, is it
> reasonable to split this part out and get vectorization part
> reviewed/submitted independently?  For example, let user specified
> parameters make the decision for now.  Cost and target dependent
> changes should go in at last, this could make the patch easier to
> read.
>
> The implementation computes/shares quite amount information from main
> loop to epilogue loop vectorization.  Furthermore, variables/fields
> for such information are somehow named in a misleading way.  For
> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
> flag controlling whether epilogue loop should be vectorized with
> masking.  However, it's actually controlled by exactly the same flag
> as whether epilogue loop should be combined into the main loop with
> masking:
> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>
>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>
> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
> +vect_combine_loop_epilogue (loop_vinfo);
> +
>/* Reduce loop iterations by the vectorization factor.  */
>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>expected_iterations / vf);
>
> IMHO, we should decouple main loop vectorization and epilogue
> vectorization as much as possible by sharing as few information as we
> can.  The general idea is to handle epilogue loop just like normal
> short-trip loop.  For example, we can rename
> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
> else), and we don't differentiate its meaning between main and
> epilogue(short-trip) loop.  It only indicates the current loop should
> be vectorized with masking no matter it's a main loop or epilogue
> loop, and it works just like the current implementation.
>
> After this change, we can refine vectorization and make it more
> general for normal loop and epilogue(short trip) loop.  For example,
> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
> loop and use it to control how it should be vectorized:
> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> +{
> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
> +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
> +}
> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +   && min_profitable_combine_iters >= 0)
> +{
>
> This works, but not that good for understanding or maintaining.
>
> Thanks,
> bin


Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
> Thanks Richard for your comments.
> Your proposed to handle epilogue loop just like normal short-trip loop
> but this is not exactly truth since e.g. epilogue must not be peeled
> for alignment.
Yes there must be some differences, my motivation is to minimize that
so we don't need to specially check normal/epilogue loops at too many
places.
Of course it's just my feeling when going through the patch set, and
could be wrong.

Thanks,
bin
>
> It is not clear for me what are my next steps? Should I re-design the
> patch completely or simply decompose the whole patch to different
> parts? But it means that we must start review process from beginning
> but release is closed to its end.
> Note also that i work for Intel till the end of year and have not idea
> who will continue working on this project.
>
> Any help will be appreciated.
>
> Thanks.
> Yuri.
>
> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> I re-send all patches sent by Ilya earlier for review which support
>>> vectorization of loop epilogues and loops with low trip count. We
>>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>>> approved by Jeff.
>>>
>>> I did re-base of all patches and performed bootstrapping and
>>> regression testing that did not show any new failures. Also all
>>> changes related to new vect_do_peeling algorithm have been changed
>>> accordingly.
>>>
>>> Is it OK for trunk?
>>
>> Hi,
>> I can't approve patches, but had some comments after going through the
>> implementation.
>>
>> One confusing part is cost model change, as well as the way it's used
>> to decide how epilogue loop should be vectorized.  Given vect-tail is
>> disabled at the moment and the cost change needs further tuning, is it
>> reasonable to split this part out and get vectorization part
>> reviewed/submitted independently?  For example, let user specified
>> parameters make the decision for now.  Cost and target dependent
>> changes should go in at last, this could make the patch easier to
>> read.
>>
>> The implementation computes/shares quite amount information from main
>> loop to epilogue loop vectorization.  Furthermore, variables/fields
>> for such information are somehow named in a misleading way.  For
>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>> flag controlling whether epilogue loop should be vectorized with
>> masking.  However, it's actually controlled by exactly the same flag
>> as whether epilogue loop should be combined into the main loop with
>> masking:
>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>
>>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>
>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>> +vect_combine_loop_epilogue (loop_vinfo);
>> +
>>/* Reduce loop iterations by the vectorization factor.  */
>>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>expected_iterations / vf);
>>
>> IMHO, we should decouple main loop vectorization and epilogue
>> vectorization as much as possible by sharing as few information as we
>> can.  The general idea is to handle epilogue loop just like normal
>> short-trip loop.  For example, we can rename
>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>> else), and we don't differentiate its meaning between main and
>> epilogue(short-trip) loop.  It only indicates the current loop should
>> be vectorized with masking no matter it's a main loop or epilogue
>> loop, and it works just like the current implementation.
>>
>> After this change, we can refine vectorization and make it more
>> general for normal loop and epilogue(short trip) loop.  For example,
>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
>> loop and use it to control how it should be vectorized:
>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>> +{
>> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>> +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>> +}
>> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>> +   && min_profitable_combine_iters >= 0)
>> +{
>>
>> This works, but not that good for understanding or maintaining.
>>
>> Thanks,
>> bin


Re: [PATCH] DECL_RTL and DECL_RTL_INCOMING in RTL dumps

2016-11-09 Thread Bernd Schmidt

On 11/08/2016 07:03 PM, David Malcolm wrote:

int __RTL("rtl-combine") f1 (int n)
{
(function "f1"
  (param "n"
(DECL_RTL
  (reg/v:SI %1 [ n ])
) ;; DECL_RTL


The ;; DECL_RTL etc. comments seem somewhat redundant and add clutter. 
Please remove those.


Also, why is the closing paren on its own line? That doesn't seem right.

Later (not for this patch) I'd really like to see some logic not to add 
linebreaks before simple expressions, so that we'd have (DECL_RTL 
(reg:SI xyz)) on a single line.



Bernd



Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Yuri Rumyantsev
I am familiar with SVE extension and understand that implemented
approach might be not suitable for ARM. The main point is that only
load/store instructions are masked but all other calculations are not
(we did special conversion for reduction statements to implement
merging predication semantic). For SVE peeling for niters is not
required but it is not true for x86 -  we must determine what
vectorization scheme is more profitable: loop combining (the only
essential for SVE) or separate epilogue vectorization using masking or
less vectorization factor. So I'd like to have the full list of
required changes to our implementation to try to remove them.

Thanks.
Yuri.

2016-11-09 14:46 GMT+03:00 Bin.Cheng :
> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
>> Thanks Richard for your comments.
>> Your proposed to handle epilogue loop just like normal short-trip loop
>> but this is not exactly truth since e.g. epilogue must not be peeled
>> for alignment.
> Yes there must be some differences, my motivation is to minimize that
> so we don't need to specially check normal/epilogue loops at too many
> places.
> Of course it's just my feeling when going through the patch set, and
> could be wrong.
>
> Thanks,
> bin
>>
>> It is not clear for me what are my next steps? Should I re-design the
>> patch completely or simply decompose the whole patch to different
>> parts? But it means that we must start review process from beginning
>> but release is closed to its end.
>> Note also that i work for Intel till the end of year and have not idea
>> who will continue working on this project.
>>
>> Any help will be appreciated.
>>
>> Thanks.
>> Yuri.
>>
>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
>>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
 Hi All,

 I re-send all patches sent by Ilya earlier for review which support
 vectorization of loop epilogues and loops with low trip count. We
 assume that the only patch - vec-tails-07-combine-tail.patch - was not
 approved by Jeff.

 I did re-base of all patches and performed bootstrapping and
 regression testing that did not show any new failures. Also all
 changes related to new vect_do_peeling algorithm have been changed
 accordingly.

 Is it OK for trunk?
>>>
>>> Hi,
>>> I can't approve patches, but had some comments after going through the
>>> implementation.
>>>
>>> One confusing part is cost model change, as well as the way it's used
>>> to decide how epilogue loop should be vectorized.  Given vect-tail is
>>> disabled at the moment and the cost change needs further tuning, is it
>>> reasonable to split this part out and get vectorization part
>>> reviewed/submitted independently?  For example, let user specified
>>> parameters make the decision for now.  Cost and target dependent
>>> changes should go in at last, this could make the patch easier to
>>> read.
>>>
>>> The implementation computes/shares quite amount information from main
>>> loop to epilogue loop vectorization.  Furthermore, variables/fields
>>> for such information are somehow named in a misleading way.  For
>>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>>> flag controlling whether epilogue loop should be vectorized with
>>> masking.  However, it's actually controlled by exactly the same flag
>>> as whether epilogue loop should be combined into the main loop with
>>> masking:
>>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>>
>>>slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>>
>>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>>> +vect_combine_loop_epilogue (loop_vinfo);
>>> +
>>>/* Reduce loop iterations by the vectorization factor.  */
>>>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>>expected_iterations / vf);
>>>
>>> IMHO, we should decouple main loop vectorization and epilogue
>>> vectorization as much as possible by sharing as few information as we
>>> can.  The general idea is to handle epilogue loop just like normal
>>> short-trip loop.  For example, we can rename
>>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>>> else), and we don't differentiate its meaning between main and
>>> epilogue(short-trip) loop.  It only indicates the current loop should
>>> be vectorized with masking no matter it's a main loop or epilogue
>>> loop, and it works just like the current implementation.
>>>
>>> After this change, we can refine vectorization and make it more
>>> general for normal loop and epilogue(short trip) loop.  For example,
>>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
>>> loop and use it to control how it should be vectorized:
>>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>>> +{
>>> +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>>> +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>>> +}
>>> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>>> + 

[WIP C++ PATCH] P0217R3 - C++17 structured bindings

2016-11-09 Thread Jakub Jelinek
Hi!

The following patch is a WIP on P0217R3 - decomposition declarations.
It contains various FIXMEs, Jason, do you think you could finish it up?

The most important unfinished part in the patch is that cp_finish_decomp
for classes doesn't try to check if std::tuple_size::value is
well-formed integral constant expression and use decl.get() or
get(decl) as initializers and std::tuple_element::type
as type of the individual vars (that need cp_finish_decl then and
dropping of DEC_HAS_VALUE_EXPR_P/DECL_VALUE_EXPR if they have any)
- template-ids, lookups, instantiations isn't something I'm comfortable
enough with to write that.

Another thing is with the non-reference decompositions - I think
int arr[2];
...
auto [ x, y ] = arr;
works properly as copy-initialization, avoids explicit copy constructors,
as I've tried to test in the decomp6.C, though the templates used during
the VEC_INIT_EXPR gimplification aren't actually instantiated (see the
fixme routine in the test that really shouldn't be necessary).
But
auto [ x, y ] { arr };
doesn't seem to work correctly, see the commented out part of decomp6.C,
the cp-gimplify.c hunk is kind of hackish, but it seems to try to
use A::A conversion ctor while I believe it should use the
A::A(const A &) copy ctor.

Another thing is in find_decomp_class_base - I think the current code
will just reject if the same base class with any non-static data members
appears as virtual base in more than one place in the bases tree,
not really sure what should be done, how to check if the paths to
that base are accessible etc.

Another thing is in find_decomp_class_base caller, while
that function can return some base class if the only non-static data
members are in such a base and not in the derived class(es), the caller
isn't prepared to expand that, not sure if we need build_base_path or what
to actually generate the COMPONENT_REFs for the artificial FIELD_DECLs
or what.

Not sure about how the qualifiers from class fields and from the
cv-qualifiers on the decomposition declaration should be treated, at the
moment I'm oring them in.

The match.pd hunk is needed, otherwise the generic folding happily folds
int arr[2];
...
auto [ x, y ] = arr;
&x == &arr[0]
into 0, because it thinks x and arr are distinct VAR_DECLs.  Though, if
such comparisons are required to be folded in constexpr contexts under
certain conditions, we'd need to handle the DECL_VALUE_EXPRs in constexpr.c
somehow.

Mangling of the decomposition declaration base decls at namespace scope
isn't implemented; I think the DECL_ASSEMBLER_NAME for them (they have
NULL DECL_NAME) could be done in cp_finish_decomp, where we have all the
corresponding identifiers available, but it would probably need to be done
through direct calls to the demangler, as the base decl isn't all that is
needed for that.

Per IRC discussions, the associated VAR_DECLs are chained through
DECL_CHAIN.  It isn't that easy though, because decls are pushed into the
bindings in reverse order and afterwards nreversed.  Plus at least for
the (not yet implemented) std::tuple_size::value stuff where the
VAR_DECLs shouldn't have DECL_VALUE_EXPR, the desired final order is
that the nameless artificial base decl comes first and then the other
decls in the order they appear.  But that means that at cp_finish_decl
time of the base decl they are actually in the order y, x, D.1234
for tha above auto [ x, y ] = arr; - so the finalizing of those is done
by separate cp_finish_decomp that needs to know the start of the chain
(the last named variable of the decomposition) plus the base artificial
decl and count.  For range for it uses the DECL_VALUE_EXPRs to find
those, maybe it would be better to pass around during parsing not just
address of a tree (the range decl), but address of a triplet (range decl, first
decomp decl in the chain (i.e. last one) and count, so that range for
parsing doesn't have to rediscover those.  The base decl has the
DECL_DECOMPOSITION_P flag on it set and NULL_TREE DECL_NAME, the
instantiation code in the patch doesn't parse DECL_VALUE_EXPRs though,
because if the decomp initializer is not type dependent, those
DECL_VALUE_EXPRs aren't actually in a standardized form that could be
easily parseable, so it just assumes those named decls have also
DECL_DECOMPOSITION_P flag on and non-NULL DECL_NAME and follow
the base decl in DECL_CHAIN (the new ones created with tsubst_expr
in reverse order again).  If nothing attempts to fold stuff in templates,
perhaps we could avoid setting DECL_VALUE_EXPRs at all when
processing_template_decl?

--- gcc/match.pd.jj 2016-11-07 18:32:56.0 +0100
+++ gcc/match.pd2016-11-08 14:00:05.391773322 +0100
@@ -2547,8 +2547,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (with
  {
int equal = 2;
-   if (decl_in_symtab_p (base0)
-  && decl_in_symtab_p (base1))
+   /* Punt in GENERIC on variables with value expressions;
+ the value expressions might point to fields/eleme

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Bin.Cheng
On Wed, Nov 9, 2016 at 12:12 PM, Yuri Rumyantsev  wrote:
> I am familiar with SVE extension and understand that implemented
> approach might be not suitable for ARM. The main point is that only
> load/store instructions are masked but all other calculations are not
> (we did special conversion for reduction statements to implement
> merging predication semantic). For SVE peeling for niters is not
> required but it is not true for x86 -  we must determine what
> vectorization scheme is more profitable: loop combining (the only
> essential for SVE) or separate epilogue vectorization using masking or
> less vectorization factor. So I'd like to have the full list of
> required changes to our implementation to try to remove them.
Hmm, sorry that my comment gave impression that I was trying to hold
back the patch, it's not what I meant by any means.  Also it's not
related to SVE, As a matter of fact, I haven't read any document about
SVE yet.  Sorry again for the false impression conveyed by previous
messages.

Thanks,
bin
>
> Thanks.
> Yuri.
>
> 2016-11-09 14:46 GMT+03:00 Bin.Cheng :
>> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev  wrote:
>>> Thanks Richard for your comments.
>>> Your proposed to handle epilogue loop just like normal short-trip loop
>>> but this is not exactly truth since e.g. epilogue must not be peeled
>>> for alignment.
>> Yes there must be some differences, my motivation is to minimize that
>> so we don't need to specially check normal/epilogue loops at too many
>> places.
>> Of course it's just my feeling when going through the patch set, and
>> could be wrong.
>>
>> Thanks,
>> bin
>>>
>>> It is not clear for me what are my next steps? Should I re-design the
>>> patch completely or simply decompose the whole patch to different
>>> parts? But it means that we must start review process from beginning
>>> but release is closed to its end.
>>> Note also that i work for Intel till the end of year and have not idea
>>> who will continue working on this project.
>>>
>>> Any help will be appreciated.
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
 On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  
 wrote:
> Hi All,
>
> I re-send all patches sent by Ilya earlier for review which support
> vectorization of loop epilogues and loops with low trip count. We
> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> approved by Jeff.
>
> I did re-base of all patches and performed bootstrapping and
> regression testing that did not show any new failures. Also all
> changes related to new vect_do_peeling algorithm have been changed
> accordingly.
>
> Is it OK for trunk?

 Hi,
 I can't approve patches, but had some comments after going through the
 implementation.

 One confusing part is cost model change, as well as the way it's used
 to decide how epilogue loop should be vectorized.  Given vect-tail is
 disabled at the moment and the cost change needs further tuning, is it
 reasonable to split this part out and get vectorization part
 reviewed/submitted independently?  For example, let user specified
 parameters make the decision for now.  Cost and target dependent
 changes should go in at last, this could make the patch easier to
 read.

 The implementation computes/shares quite amount information from main
 loop to epilogue loop vectorization.  Furthermore, variables/fields
 for such information are somehow named in a misleading way.  For
 example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
 flag controlling whether epilogue loop should be vectorized with
 masking.  However, it's actually controlled by exactly the same flag
 as whether epilogue loop should be combined into the main loop with
 masking:
 @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)

slpeel_make_loop_iterate_ntimes (loop, niters_vector);

 +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
 +vect_combine_loop_epilogue (loop_vinfo);
 +
/* Reduce loop iterations by the vectorization factor.  */
scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
expected_iterations / vf);

 IMHO, we should decouple main loop vectorization and epilogue
 vectorization as much as possible by sharing as few information as we
 can.  The general idea is to handle epilogue loop just like normal
 short-trip loop.  For example, we can rename
 LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
 else), and we don't differentiate its meaning between main and
 epilogue(short-trip) loop.  It only indicates the current loop should
 be vectorized with masking no matter it's a main loop or epilogue
 loop, and it works just like the current implementation.

 After this change, we can refine vectorization and 

Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-09 Thread Richard Biener
On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse  wrote:
> On Tue, 8 Nov 2016, Dominik Vogt wrote:
>
>> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote:
>>>
>>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt 
>>> wrote:

 On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote:
>
> On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt 
> wrote:
>>
>> Is VRP the right pass to do this optimisation or should a later
>> pass rather attempt to eliminate the new use of b_5 instead?  Uli
>> has brought up the idea a mini "sign extend elimination" pass that
>> checks if the result of a sign extend could be replaced by the
>> original quantity in all places, and if so, eliminate the ssa
>> name.  (I guess that won't help with the above code because l is
>> used also as a function argument.)  How could a sensible approach
>> to deal with the situation look like?
>
>
> We run into this kind of situation regularly and for general foldings
> in match.pd we settled with single_use () even though it is not
> perfect.
> Note the usual complaint is not extra extension instructions but
> the increase of register pressure.
>
> This is because it is hard to do better when you are doing local
> optimization.
>
> As for the question on whether VRP is the right pass to do this the
> answer is two-fold -- VRP has the most precise range information.
> But the folding itself should be moved to generic code and use
> get_range_info ().


 All right, is this a sensible approach then?
>>>
>>>
>>> Yes.
>>>
   1. Using has_single_use as in the experimental patch is good
  enough (provided further testing does not show serious
  regressions).
>>>
>>>
>>> I'd approve that, yes.
>>>
   2. Rip the whole top level if-block from simplify_cond_using_ranges().
   3. Translate that code to match.pd syntax.
>>>
>>>
>>> Might be some work but yes, that's also desired (you'd lose the ability
>>> to emit the warnings though).
>>
>>
>> Could you give me a match-pd-hint please?  We now have something
>> like this:
>>
>> (simplify
>>  (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3)
>>  (if (... many conditions ...)
>>   (cond (gt ... ...) @2 @3))
>>
>> The generated code ends up in gimple_simplify_COND_EXPR, but when
>> gimple_simplify is actually called, it goes through the
>> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and
>> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond
>> ...), i.e. it never tries the generated code.
>
>
> Not sure what you mean here.
>
>> There is another pattern in match.pd that uses a (cond ...) as the
>> first operand, and I do not understand how this works.  Should we
>> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand
>> instead, and wouldn't this pattern be too general that way?
>
>
> IIUC, you are trying to move the second half of simplify_cond_using_ranges
> to match.pd. I don't see any reason to restrict it to the case where the
> comparison result is used directly in a COND_EXPR, so that would look like:
>
> (for cmp (...)
>  (simplify
>   (cmp (convert SSA_NAME@0) INTEGER_CST@1)
>   (if (...)
>(cmp @0 (convert @1)
>
> maybe? I think I may have missed your point.

Yeah, if you'd use (cond (gt ... then it only matches in assignments
with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs.

So you ran into the (cond vs. GIMPLE_COND "mismatch".

You'd essentially implement sth similar to shorten_compare in match.pd.

Btw, moving to match.pd shouldn't be a blocker for adding proper
single_use tests
just in case you get lost ...

Richard.

> (and yes, the first half would give a very general (simplify (cmp SSA_NAME@0
> INTEGER_CST@1) ...), that doesn't seem so bad)
>
> --
> Marc Glisse


[Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-09 Thread Paul Richard Thomas
Dear All,

The title of this PR says what this is all about, except that it
applies uniquely applicable to character function result string
lengths.

Ian Harvey wrote the first patch for this PR for which many thanks.
However, two issues came up that took some little while to understand;
(i) In comment #1, it was found that calls from sibling procedures
could generate streams of undefined references at the link stage; and
(ii) An entity of the same name as the contained procedure entity in
module scope caused similar problems.

The relationship with Ian's patch is still obvious. The fundamental
difference is that the parameter arrays are automatically promoted to
module scope using a unique symtree. This fixes both the issues above.

Dominique, could you please check that the -m32 issue has gone away?

Bootstrapped and regtested on FC21/x86_64 - OK for trunk?

Paul

2016-11-09  Paul Thomas  

PR fortran/44265
* gfortran.h : Add fn_result_spec bitfield to gfc_symbol.
* resolve.c (flag_fn_result_spec): New function.
(resolve_fntype): Call it for character result lengths.
* symbol.c (gfc_new_symbol): Set fn_result_spec to zero.
* trans-decl.c (gfc_sym_mangled_identifier): Include the
procedure name in the mangled name for symbols with the
fn_result_spec bit set.
(gfc_get_symbol_decl): Mangle the name of these symbols.
(gfc_create_module_variable): Allow them through the assert.
(gfc_generate_function_code): Remove the assert before the
initialization of sym->tlink because the frontend no longer
uses this field.
* trans-expr.c (gfc_map_intrinsic_function): Add a case to
treat the LEN_TRIM intrinsic.

2016-11-09  Paul Thomas  

PR fortran/44265
* gfortran.dg/char_result_14.f90: New test.
* gfortran.dg/char_result_15.f90: New test.


-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein
Index: gcc/fortran/gfortran.h
===
*** gcc/fortran/gfortran.h  (revision 241994)
--- gcc/fortran/gfortran.h  (working copy)
*** typedef struct gfc_symbol
*** 1498,1503 
--- 1498,1505 
unsigned equiv_built:1;
/* Set if this variable is used as an index name in a FORALL.  */
unsigned forall_index:1;
+   /* Set if the symbol is used in a function result specification .  */
+   unsigned fn_result_spec:1;
/* Used to avoid multiple resolutions of a single symbol.  */
unsigned resolved:1;
/* Set if this is a module function or subroutine with the
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 241994)
--- gcc/fortran/resolve.c   (working copy)
*** resolve_equivalence (gfc_equiv *eq)
*** 15732,15737 
--- 15732,15776 
  }
  
  
+ /* Function called by resolve_fntype to flag other symbol used in the
+length type parameter specification of function resuls.  */
+ 
+ static bool
+ flag_fn_result_spec (gfc_expr *expr,
+  gfc_symbol *sym ATTRIBUTE_UNUSED,
+  int *f ATTRIBUTE_UNUSED)
+ {
+   gfc_namespace *ns;
+   gfc_symbol *s;
+ 
+   if (expr->expr_type == EXPR_VARIABLE)
+ {
+   s = expr->symtree->n.sym;
+   for (ns = s->ns; ns; ns = ns->parent)
+   if (!ns->parent)
+ break;
+ 
+   if (!s->fn_result_spec && s->ns->parent != NULL
+ && s->attr.flavor == FL_PARAMETER)
+   {
+ if (ns->proc_name && ns->proc_name->attr.flavor == FL_MODULE)
+   {
+ gfc_symtree *st;
+ s->fn_result_spec = 1;
+ /* Make sure that this symbol is translated as a module
+variable.  */
+ st = gfc_get_unique_symtree (ns);
+ st->n.sym = s;
+ s->refs++;
+   }
+ else if (s->attr.use_assoc || s->attr.used_in_submodule)
+   s->fn_result_spec = 1;
+   }
+ }
+   return false;
+ }
+ 
+ 
  /* Resolve function and ENTRY types, issue diagnostics if needed.  */
  
  static void
*** resolve_fntype (gfc_namespace *ns)
*** 15782,15787 
--- 15821,15829 
el->sym->attr.untyped = 1;
  }
}
+ 
+   if (sym->ts.type == BT_CHARACTER)
+ gfc_traverse_expr (sym->ts.u.cl->length, NULL, flag_fn_result_spec, 0);
  }
  
  
Index: gcc/fortran/symbol.c
===
*** gcc/fortran/symbol.c(revision 241994)
--- gcc/fortran/symbol.c(working copy)
*** gfc_new_symbol (const char *name, gfc_na
*** 2933,2938 
--- 2933,2939 
p->common_block = NULL;
p->f2k_derived = NULL;
p->assoc = NULL;
+   p->fn_result_spec = 0;

return p;
  }
Index: gcc/fortran/trans-decl.c
===
*** gcc/fortran/trans-decl.c(revision 241994)
--- gcc/fortran/trans-decl.c  

[PING][PATCH v2][aarch64] Add mcpu flag for Qualcomm falkor core

2016-11-09 Thread Siddhesh Poyarekar
Ping!

Siddhesh

On 4 November 2016 at 21:17, Siddhesh Poyarekar
 wrote:
> This adds an mcpu option for the upcoming Qualcomm Falkor core.  This
> is identical to the qdf24xx part that was added earlier and hence
> retains the same tuning structure and continues to have the a57
> pipeline for now.  The part number has also been changed and this
> patch fixes this for both qdf24xx and falkor options.
>
> Tested with aarch64 and armhf.
>
> Siddhesh
>
> * gcc/config/aarch64/aarch64-cores.def (qdf24xx): Update part
> number.
> (falkor): New core.
> * gcc/config/aarch64/aarch64-tune.md: Regenerated.
> * gcc/config/arm/arm-cores.def (falkor): New core.
> * gcc/config/arm/arm-tables.opt: Regenerated.
> * gcc/config/arm/arm-tune.md: Regenerated.
> * gcc/config/arm/bpabi.h (BE8_LINK_SPEC): Add falkor support.
> * gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add falkor
> support.
> * gcc/doc/invoke.texi (AArch64 Options/-mtune): Add falkor.
> (ARM Options/-mtune): Add falkor.
>
> ---
>  gcc/config/aarch64/aarch64-cores.def | 3 ++-
>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>  gcc/config/arm/arm-cores.def | 1 +
>  gcc/config/arm/arm-tables.opt| 3 +++
>  gcc/config/arm/arm-tune.md   | 2 +-
>  gcc/config/arm/bpabi.h   | 2 ++
>  gcc/config/arm/t-aprofile| 1 +
>  gcc/doc/invoke.texi  | 9 +
>  8 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index f9b7552..4b00f3f 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -54,7 +54,8 @@ AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  
> AARCH64_FL_FOR_ARCH8 | AA
>  AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  0x53, 0x001)
>
>  /* Qualcomm ('Q') cores. */
> -AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0x800)
> +AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00)
> +AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00)
>
>  /* Cavium ('C') cores. */
>  AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a1)
> diff --git a/gcc/config/aarch64/aarch64-tune.md 
> b/gcc/config/aarch64/aarch64-tune.md
> index 022b131..29afcdf 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -   
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> +   
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,falkor,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
> index 2072e1e..1bfec9c 100644
> --- a/gcc/config/arm/arm-cores.def
> +++ b/gcc/config/arm/arm-cores.def
> @@ -173,6 +173,7 @@ ARM_CORE("cortex-a57",  cortexa57, cortexa57,   8A,   
>   ARM_FSET_MAKE_CPU1 (FL_LDSCHED
>  ARM_CORE("cortex-a72", cortexa72, cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
>  ARM_CORE("cortex-a73", cortexa73, cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
>  ARM_CORE("exynos-m1",  exynosm1,  exynosm1,8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1)
> +ARM_CORE("falkor", falkor,cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
>  ARM_CORE("qdf24xx",qdf24xx,   cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
>  ARM_CORE("xgene1",  xgene1,xgene1,  8A,ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_FOR_ARCH8A),xgene1)
>
> diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
> index ee9e3bb..7b15c8c 100644
> --- a/gcc/config/arm/arm-tables.opt
> +++ b/gcc/config/arm/arm-tables.opt
> @@ -328,6 +328,9 @@ EnumValue
>  Enum(processor_type) String(exynos-m1) Value(exynosm1)
>
>  EnumValue
> +Enum(processor_type) String(falkor) Value(falkor)
> +
> +EnumValue
>  Enum(processor_type) String(qdf24xx) Value(qdf24xx)
>
>  EnumValue
> diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-09 Thread Richard Biener
On Wed, 9 Nov 2016, Yuri Rumyantsev wrote:

> Thanks Richard for your comments.
> Your proposed to handle epilogue loop just like normal short-trip loop
> but this is not exactly truth since e.g. epilogue must not be peeled
> for alignment.

But if we know the epilogue data-refs are aligned we should have reflected
that in the code so the vectorizer wouldn't even try to peel for
alignment.  OTOH peeling for alignment already knows that peeling
a short-trip loop is not likely profitable (maybe the condition needs
to be hardened to also work for VF/2).

> It is not clear for me what are my next steps? Should I re-design the
> patch completely or simply decompose the whole patch to different
> parts? But it means that we must start review process from beginning
> but release is closed to its end.

What I disliked about the series from the beginning is that it does
everything at once rather than first introducing vectorizing of
epilogues as an independent patch.  Lumping all in together makes
it hard to decipher the conditions each feature is enabled.

I'm mostly concerned about the predication part and thus if we can
get the other parts separated and committed that would be a much
smaller patch to look at and experiment.

Note that only stage1 is at its end, we usually still accept patches
that were posted before stage1 end during stage3.

> Note also that i work for Intel till the end of year and have not idea
> who will continue working on this project.

Noted.

Richard.

> Any help will be appreciated.
>
> Thanks.
> Yuri.
> 
> 2016-11-09 13:37 GMT+03:00 Bin.Cheng :
> > On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev  wrote:
> >> Hi All,
> >>
> >> I re-send all patches sent by Ilya earlier for review which support
> >> vectorization of loop epilogues and loops with low trip count. We
> >> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> >> approved by Jeff.
> >>
> >> I did re-base of all patches and performed bootstrapping and
> >> regression testing that did not show any new failures. Also all
> >> changes related to new vect_do_peeling algorithm have been changed
> >> accordingly.
> >>
> >> Is it OK for trunk?
> >
> > Hi,
> > I can't approve patches, but had some comments after going through the
> > implementation.
> >
> > One confusing part is cost model change, as well as the way it's used
> > to decide how epilogue loop should be vectorized.  Given vect-tail is
> > disabled at the moment and the cost change needs further tuning, is it
> > reasonable to split this part out and get vectorization part
> > reviewed/submitted independently?  For example, let user specified
> > parameters make the decision for now.  Cost and target dependent
> > changes should go in at last, this could make the patch easier to
> > read.
> >
> > The implementation computes/shares quite amount information from main
> > loop to epilogue loop vectorization.  Furthermore, variables/fields
> > for such information are somehow named in a misleading way.  For
> > example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
> > flag controlling whether epilogue loop should be vectorized with
> > masking.  However, it's actually controlled by exactly the same flag
> > as whether epilogue loop should be combined into the main loop with
> > masking:
> > @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
> >
> >slpeel_make_loop_iterate_ntimes (loop, niters_vector);
> >
> > +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
> > +vect_combine_loop_epilogue (loop_vinfo);
> > +
> >/* Reduce loop iterations by the vectorization factor.  */
> >scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
> >expected_iterations / vf);
> >
> > IMHO, we should decouple main loop vectorization and epilogue
> > vectorization as much as possible by sharing as few information as we
> > can.  The general idea is to handle epilogue loop just like normal
> > short-trip loop.  For example, we can rename
> > LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
> > else), and we don't differentiate its meaning between main and
> > epilogue(short-trip) loop.  It only indicates the current loop should
> > be vectorized with masking no matter it's a main loop or epilogue
> > loop, and it works just like the current implementation.
> >
> > After this change, we can refine vectorization and make it more
> > general for normal loop and epilogue(short trip) loop.  For example,
> > this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
> > loop and use it to control how it should be vectorized:
> > +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > +{
> > +  LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
> > +  LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
> > +}
> > +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +   && min_profitable_combine_iters >= 0)
> > +{
> >
> > This works, but not that good for understanding or mainta

[PATCH, ARM] Enable ldrd/strd peephole rules unconditionally

2016-11-09 Thread Bernd Edlinger
Hi!


This patch enables the ldrd/strd peephole rules unconditionally.

It is meant to fix cases, where the patch to reduce the sha512
stack usage splits ldrd/strd instructions into separate ldr/str insns,
but is technically independent from the other patch:

See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html

It was necessary to change check_effective_target_arm_prefer_ldrd_strd
to retain the true prefer_ldrd_strd tuning flag.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.2016-11-02  Bernd Edlinger  

	PR target/77308
	* config/arm/arm.md (*thumb2_ldrd, *thumb2_ldrd_base,
	*thumb2_ldrd_base_neg, *thumb2_strd, *thumb2_strd_base,
	*thumb2_strd_base_neg): Recognize insn regardless of
	current_tune->prefer_ldrd_strd.
	* config/arm/ldrdstrd.md: Enable all ldrd/strd peephole rules
	whenever possible.

testsuite:
2016-11-02  Bernd Edlinger  

	PR target/77308
	* gcc.target/arm/pr53447-5.c: New test.
	* lib/target-supports.exp
	(check_effective_target_arm_prefer_ldrd_strd): Adjust.

Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md	(revision 241816)
+++ gcc/config/arm/arm.md	(working copy)
@@ -11636,7 +11636,6 @@
 (mem:SI (plus:SI (match_dup 1)
  (match_operand:SI 4 "const_int_operand" ""]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
  && (operands_ok_ldrd_strd (operands[0], operands[3],
   operands[1], INTVAL (operands[2]),
@@ -11653,7 +11652,6 @@
 (mem:SI (plus:SI (match_dup 1)
  (const_int 4]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && (operands_ok_ldrd_strd (operands[0], operands[2],
   operands[1], 0, false, true))"
   "ldrd%?\t%0, %2, [%1]"
@@ -11668,7 +11666,6 @@
(set (match_operand:SI 2 "s_register_operand" "=r")
 (mem:SI (match_dup 1)))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && (operands_ok_ldrd_strd (operands[0], operands[2],
   operands[1], -4, false, true))"
   "ldrd%?\t%0, %2, [%1, #-4]"
@@ -11684,7 +11681,6 @@
  (match_operand:SI 3 "const_int_operand" "")))
 (match_operand:SI 4 "s_register_operand" "r"))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3]))
  && (operands_ok_ldrd_strd (operands[2], operands[4],
   operands[0], INTVAL (operands[1]),
@@ -11701,7 +11697,6 @@
  (const_int 4)))
 (match_operand:SI 2 "s_register_operand" "r"))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && (operands_ok_ldrd_strd (operands[1], operands[2],
   operands[0], 0, false, false))"
   "strd%?\t%1, %2, [%0]"
@@ -11716,7 +11711,6 @@
(set (mem:SI (match_dup 0))
 (match_operand:SI 2 "s_register_operand" "r"))]
   "TARGET_LDRD && TARGET_THUMB2 && reload_completed
- && current_tune->prefer_ldrd_strd
  && (operands_ok_ldrd_strd (operands[1], operands[2],
   operands[0], -4, false, false))"
   "strd%?\t%1, %2, [%0, #-4]"
Index: gcc/config/arm/ldrdstrd.md
===
--- gcc/config/arm/ldrdstrd.md	(revision 241816)
+++ gcc/config/arm/ldrdstrd.md	(working copy)
@@ -29,9 +29,7 @@
 (match_operand:SI 2 "memory_operand" ""))
(set (match_operand:SI 1 "arm_general_register_operand" "")
 (match_operand:SI 3 "memory_operand" ""))]
-  "TARGET_LDRD
- && current_tune->prefer_ldrd_strd
- && !optimize_function_for_size_p (cfun)"
+  "TARGET_LDRD"
   [(const_int 0)]
 {
   if (!gen_operands_ldrd_strd (operands, true, false, false))
@@ -63,9 +61,7 @@
 	(match_operand:SI 0 "arm_general_register_operand" ""))
(set (match_operand:SI 3 "memory_operand" "")
 	(match_operand:SI 1 "arm_general_register_operand" ""))]
-  "TARGET_LDRD
- && current_tune->prefer_ldrd_strd
- && !optimize_function_for_size_p (cfun)"
+  "TARGET_LDRD"
   [(const_int 0)]
 {
   if (!gen_operands_ldrd_strd (operands, false, false, false))
@@ -102,9 +98,7 @@
 (match_operand:SI 5 "const_int_operand" ""))
(set (match_operand:SI 3 "memory_operand" "")
 (match_dup 1))]
- "TARGET_LDRD
-  && current_tune->prefer_ldrd_strd
-  && !optimize_function_for_size_p (cfun)"
+  "TARGET_LDRD"
   [(const_int 0)]
 {
   if (!gen_operands_ldrd_strd (operands, false, true, false))
@@ -147,10 +141,8 @@
 (match_dup 0))
(set (match_operand:SI 3 "memory_operand" "")
 (match_dup 1))]
- "TARGET_LDRD
-  && c

Re: [PATCH, RFC] Improve ivopts group costs

2016-11-09 Thread Bin.Cheng
On Thu, Nov 3, 2016 at 4:00 PM, Bin.Cheng  wrote:
> On Thu, Nov 3, 2016 at 1:35 PM, Evgeny Kudryashov  
> wrote:
>> Hello,
>>
>> I'm facing the following problem related to ivopts. The problem is that GCC
>> generates a lot of induction variables and as a result there is an
>> unnecessary increase of stack usage and register pressure.
>>
>> For instance, for the attached testcase (tc_ivopts.c) GCC generates 26
>> induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for
>> rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm target
>> using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For reference, I'm
>> attaching assembly I get on current trunk.
>>
>> The reason might be in use groups costs, in particular, in the way of
>> estimation. Currently, the cost of a tuple (group, candidate) is a sum of
>> per-use costs in the group. So, the cost of a group grows proportional to
>> the number of uses in the group. This approach has a negative effect on the
>> algorithm for finding the best set of induction variables: the part of a
>> total cost related to adding a new candidate is almost washed out by the
>> cost of the group. In addition, when there is a lot of groups with many uses
>> inside and a target is out of free registers, the cost of spill is washing
>> out too. As a result, GCC prefers to use native candidates (candidate
>> created for a particular group) for each group of uses instead of
>> considering the real cost of introducing a new variable into a set.
>>
>> The summing approach was added as a part of this patch
>> (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the
>> motivation for taking the sum does not seem to be
>> specifically discussed.
>>
>> I propose the following patch that changes a group cost from cost summing to
>> selecting the largest one inside a group. For the given test case I have:
>> necessary size of stack is decreased by almost 3 times and ldr\str amount
>> are decreased by less than 2 times. Also I'm attaching assembly after
>> applying the patch.
>>
>> The essential change in the patch is just:
>>
>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>> index f9211ad..a149418 100644
>> --- a/gcc/tree-ssa-loop-ivopts.c
>> +++ b/gcc/tree-ssa-loop-ivopts.c
>> @@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct ivopts_data
>> *data,
>>  offset and where the iv_use is.  */
>> cost = get_computation_cost (data, next, cand, true,
>>  NULL, &can_autoinc, NULL);
>> -  sum_cost += cost;
>> +  if (sum_cost < cost)
>> +sum_cost = cost;
>>  }
>>set_group_iv_cost (data, group, cand, sum_cost, depends_on,
>>  NULL_TREE, ERROR_MARK, inv_expr);
>>
>> Any suggestions?
> Hi Evgeny,
>
> I tend to be practical here.  Given cost is not model that well in
> IVOPT, any heuristic tuning in cost is quite likely resulting in
> improvement in some cases, while regressions in others.  At least we
> need to run good number of benchmarks for such changes.  Specific to
> this one, the summary of cost in a group is a natural result of the
> original code, in which uses' cost is added up before candidate
> selection.  Take your code as an example, choosing loop's original
> candidate for group results in lots of memory accesses with [base +
> index << scale] addressing mode, which could have higher cost than
> [base + offset] mode wrto u-arch, IMHO, it makes sense to add up this
> cost.  OTOH, I totally agree that IVOPT tends to choose too many
> candidates at the moment, especially for large loops.  Is it possible
> to achieve the same effect by penalizing register pressure cost?
> Meanwhile, I can collect benchmark data for your patch on AArch64 and
> get back to you later.
I collected spec2k6 data on one AArch64 processors, it doesn't give
meaningful improvement or regression.  Looking at the test, it boils
down to how we choose induction variable for loops having below memory
references:
for (biv)
  MEM[base + biv << scale + off1];
  MEM[base + biv << scale + off2];
  // ...
  MEM[base + biv << scale + offn];

On targets support [base + index << scale + offset] addressing mode ,
the biv should be preferred (if cost of the addressing mode is not
blocking) thus register pressure is 2.  While on targets only support
[base + index << scale], it is more complicated.  Choosing biv
actually increases the register pressure by 1 than choosing {base_1 +
biv << scale, step} as the candidate, or an additional computation
outside of address expression is required for each memory referece.
Well, there is one exception when "base" is actually anticipated on
exit edge of this loop.  So this is really target dependent cost model
issue, IMHO, decreasing group cost in target-independent way is not
that good, what do you think?  I will look deeper why choosing biv can
get rid of spilling.

BTW, the case can be further extended:
for (biv)
  MEM[base_1 

Re: [WIP C++ PATCH] P0217R3 - C++17 structured bindings

2016-11-09 Thread Jakub Jelinek
Hi!

On Wed, Nov 09, 2016 at 01:24:22PM +0100, Jakub Jelinek wrote:
> The following patch is a WIP on P0217R3 - decomposition declarations.
> It contains various FIXMEs, Jason, do you think you could finish it up?

Some more comments:

Invalid? code like
int arr[2];
extern int x, y;
auto [ x, y ] = arr;
depends on PR78217 fix, so I haven't added testsuite coverage for that yet.
Nor for decomp at namespace scope.  There is no coverage for bitfields
either.

And the testsuite coverage surely needs to have some verification of the
exact types and cv quals of the individual decls, the tests only cover
their addresses.  As implemented in the patch, those decls with
DECL_VALUE_EXPRs have non-reference type always, not sure if it is ok
or not.

Jakub


[PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-09 Thread Martin Liška
As shown in the attached test-case, the assert cannot always be true.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 11:52:00 +0100
Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

gcc/ChangeLog:

2016-11-09  Martin Liska  

	PR sanitizer/78270
	* gimplify.c (gimplify_switch_expr):

gcc/testsuite/ChangeLog:

2016-11-09  Martin Liska  

	PR sanitizer/78270
	* gcc.dg/asan/pr78269.c: New test.
---
 gcc/gimplify.c  |  1 -
 gcc/testsuite/gcc.dg/asan/pr78269.c | 13 +
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr78269.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e5930e6..15976e2 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2266,7 +2266,6 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 
   labels = gimplify_ctxp->case_labels;
   gimplify_ctxp->case_labels = saved_labels;
-  gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0);
   delete gimplify_ctxp->live_switch_vars;
   gimplify_ctxp->live_switch_vars = saved_live_switch_vars;
 
diff --git a/gcc/testsuite/gcc.dg/asan/pr78269.c b/gcc/testsuite/gcc.dg/asan/pr78269.c
new file mode 100644
index 000..55840b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr78269.c
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-additional-options "-Wno-switch-unreachable" }
+
+typedef struct
+{
+} bdaddr_t;
+
+int a;
+void fn1 ()
+{
+  switch (a)
+&(bdaddr_t){};
+}
-- 
2.10.1



Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-09 Thread Jakub Jelinek
On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote:
> As shown in the attached test-case, the assert cannot always be true.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin

> >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 9 Nov 2016 11:52:00 +0100
> Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)
> 
> gcc/ChangeLog:
> 
> 2016-11-09  Martin Liska  
> 
>   PR sanitizer/78270
>   * gimplify.c (gimplify_switch_expr):

No description on what you've changed.

That said, I'm not 100% sure it is the right fix.
As the testcase shows, for switch without GIMPLE_BIND wrapping the body
we can have variables that are in scope from the switch onwards.
I bet we could also have variables that go out of scope, say if in the
compound literal's initializer there is ({ ... }) that declares variables.
I doubt you can have a valid case/default label in those though, so
perhaps it would be simpler not to create live_switch_vars at all
if SWITCH_BODY is not a BIND_EXPR?

Jakub


Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)

2016-11-09 Thread Martin Liška
On 11/09/2016 02:29 PM, Jakub Jelinek wrote:
> On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote:
>> As shown in the attached test-case, the assert cannot always be true.
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
> 
>> >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Wed, 9 Nov 2016 11:52:00 +0100
>> Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR 
>> sanitizer/78270)
>>
>> gcc/ChangeLog:
>>
>> 2016-11-09  Martin Liska  
>>
>>  PR sanitizer/78270
>>  * gimplify.c (gimplify_switch_expr):
> 
> No description on what you've changed.
> 
> That said, I'm not 100% sure it is the right fix.
> As the testcase shows, for switch without GIMPLE_BIND wrapping the body
> we can have variables that are in scope from the switch onwards.
> I bet we could also have variables that go out of scope, say if in the
> compound literal's initializer there is ({ ... }) that declares variables.
> I doubt you can have a valid case/default label in those though, so
> perhaps it would be simpler not to create live_switch_vars at all
> if SWITCH_BODY is not a BIND_EXPR?

I like the approach you introduced! I'll re-trigger regression tests and
send a newer version of patch.

Martin

> 
>   Jakub
> 



[PATCH] Fix folding of memcmp("a", "a", 2) (PR, tree-optimization/78257)

2016-11-09 Thread Martin Liška
Hello.

Following patch fixes [1] (f0 function), where we have off-by-one error.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0
>From d7c46e5eb4d295d7653eae188cae04a8f7f0719f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 10:33:43 +0100
Subject: [PATCH] Fix folding of memcmp("a", "a", 2) (PR
 tree-optimization/78257)

gcc/ChangeLog:

2016-11-09  Martin Liska  

	* fold-const-call.c (fold_const_call): Fix the folding.

gcc/testsuite/ChangeLog:

2016-11-09  Martin Liska  

	* gcc.dg/tree-ssa/builtins-folding-generic.c (main): Add new
	test-case for memcmp.
	* gcc.dg/tree-ssa/builtins-folding-gimple.c: Likewise.
---
 gcc/fold-const-call.c| 10 +-
 gcc/gimplify.c   |  1 -
 gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c |  5 +
 gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c  |  6 ++
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 1b3a755..38f9717 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1506,7 +1506,7 @@ tree
 fold_const_call (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2)
 {
   const char *p0, *p1;
-  size_t s2 = 0;
+  size_t s0, s1, s2 = 0;
   switch (fn)
 {
 case CFN_BUILT_IN_STRNCMP:
@@ -1538,11 +1538,11 @@ fold_const_call (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2)
   }
 case CFN_BUILT_IN_BCMP:
 case CFN_BUILT_IN_MEMCMP:
-  if ((p0 = c_getstr (arg0))
-	  && (p1 = c_getstr (arg1))
+  if ((p0 = c_getstr (arg0, &s0))
+	  && (p1 = c_getstr (arg1, &s1))
 	  && host_size_t_cst_p (arg2, &s2)
-	  && s2 <= strlen (p0)
-	  && s2 <= strlen (p1))
+	  && s2 <= s0
+	  && s2 <= s1)
 	return build_cmp_result (type, memcmp (p0, p1, s2));
   return NULL_TREE;
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e5930e6..15976e2 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2266,7 +2266,6 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 
   labels = gimplify_ctxp->case_labels;
   gimplify_ctxp->case_labels = saved_labels;
-  gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0);
   delete gimplify_ctxp->live_switch_vars;
   gimplify_ctxp->live_switch_vars = saved_live_switch_vars;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c
index 175feff..3c7e001 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c
@@ -66,6 +66,10 @@ main (void)
   if (__builtin_strncasecmp ("aab", "aac", 2) != 0)
 __builtin_abort ();
 
+  /* MEMCMP.  */
+  if (__builtin_memcmp ("a", "a", 6) != 0)
+__builtin_abort ();
+
   return 0;
 }
 
@@ -74,3 +78,4 @@ main (void)
 /* { dg-final { scan-tree-dump-not "__builtin_strncmp" "original" } } */
 /* { dg-final { scan-tree-dump-not "__builtin_strncasecmp" "original" } } */
 /* { dg-final { scan-tree-dump-not "__builtin_memchr" "original" } } */
+/* { dg-final { scan-tree-dump-not "__builtin_memcmp" "original" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
index 283bd1c..d08ced5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
@@ -63,6 +63,7 @@ main (void)
 
   const char *a = "a";
   const char *hello = "hello";
+  const char *hello2 = "hello";
   const char *empty = "";
   const char *ab = "ab";
   const char *ba = "ba";
@@ -151,6 +152,10 @@ main (void)
   if (__builtin_strncasecmp (++s2, ++s3+2, 0) != 0 || s2 != s1+1 || s3 != s1+5)
 __builtin_abort();
 
+  /* MEMCMP.  */
+  if (__builtin_memcmp (hello, hello2, 6) != 0)
+__builtin_abort ();
+
   return 0;
 }
 
@@ -158,4 +163,5 @@ main (void)
 /* { dg-final { scan-tree-dump-not "__builtin_strcasecmp" "optimized" } } */
 /* { dg-final { scan-tree-dump-not "__builtin_strncmp" "optimized" } } */
 /* { dg-final { scan-tree-dump-not "__builtin_memchr" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__builtin_memcmp" "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__builtin_strncasecmp" 3 "optimized" } } */
-- 
2.10.1



Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR, tree-optimization/78257)

2016-11-09 Thread Richard Biener
On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška  wrote:
> Hello.
>
> Following patch fixes [1] (f0 function), where we have off-by-one error.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Ok (w/o the unrelated gimplify.c hunk).

Richard.

> Martin
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0


[PATCH] Remove unused parameter from tree_swap_operands_p

2016-11-09 Thread Richard Biener

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-11-09  Richard Biener  

* fold-const.c (tree_swap_operands_p): Remove unused arg.
* fold-const.c (tree_swap_operands_p): Likewise.
(fold_binary_loc): Adjust.
(fold_ternary_loc): Likewise.
* genmatch.c (dt_operand::gen_gimple_exp): Likewise.
* gimple-fold.c (fold_stmt_1): Likewise.
* gimple-match-head.c (gimple_resimplify2): Likewise.
(gimple_resimplify3): Likewise.
(gimple_simplify): Likewise.
* tree-ssa-dom.c (record_equality): Likewise.
* tree-ssa-reassoc.c (optimize_range_tests_var_bound): Likewise.
* tree-ssa-sccvn.c (vn_nary_op_compute_hash): Likewise.
* tree-ssa-threadedge.c (simplify_control_stmt_condition_1): Likewise.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 603aff0..a774848 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9159,13 +9159,13 @@ fold_binary_loc (location_t loc,
   /* If this is a commutative operation, and ARG0 is a constant, move it
  to ARG1 to reduce the number of tests below.  */
   if (commutative_tree_code (code)
-  && tree_swap_operands_p (arg0, arg1, true))
+  && tree_swap_operands_p (arg0, arg1))
 return fold_build2_loc (loc, code, type, op1, op0);
 
   /* Likewise if this is a comparison, and ARG0 is a constant, move it
  to ARG1 to reduce the number of tests below.  */
   if (kind == tcc_comparison
-  && tree_swap_operands_p (arg0, arg1, true))
+  && tree_swap_operands_p (arg0, arg1))
 return fold_build2_loc (loc, swap_tree_comparison (code), type, op1, op0);
 
   tem = generic_simplify (loc, code, type, op0, op1);
@@ -11271,7 +11271,7 @@ fold_ternary_loc (location_t loc, enum tree_code code, 
tree type,
   /* If this is a commutative operation, and OP0 is a constant, move it
  to OP1 to reduce the number of tests below.  */
   if (commutative_ternary_tree_code (code)
-  && tree_swap_operands_p (op0, op1, true))
+  && tree_swap_operands_p (op0, op1))
 return fold_build3_loc (loc, code, type, op1, op0, op2);
 
   tem = generic_simplify (loc, code, type, op0, op1, op2);
@@ -11400,7 +11400,7 @@ fold_ternary_loc (location_t loc, enum tree_code code, 
tree type,
   /* If the second operand is simpler than the third, swap them
 since that produces better jump optimization results.  */
   if (truth_value_p (TREE_CODE (arg0))
- && tree_swap_operands_p (op1, op2, false))
+ && tree_swap_operands_p (op1, op2))
{
  location_t loc0 = expr_location_or (arg0, loc);
  /* See if this can be inverted.  If it can't, possibly because
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index ae37142..46dcd28 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -124,7 +124,7 @@ extern tree build_invariant_address (tree, tree, 
HOST_WIDE_INT);
 extern tree constant_boolean_node (bool, tree);
 extern tree div_if_zero_remainder (const_tree, const_tree);
 
-extern bool tree_swap_operands_p (const_tree, const_tree, bool);
+extern bool tree_swap_operands_p (const_tree, const_tree);
 extern enum tree_code swap_tree_comparison (enum tree_code);
 
 extern bool ptr_difference_const (tree, tree, HOST_WIDE_INT *);
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index b14034d..41951c5 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -2701,7 +2701,7 @@ dt_operand::gen_gimple_expr (FILE *f, int indent)
  gen_opname (child_opname0, 0);
  gen_opname (child_opname1, 1);
  fprintf_indent (f, indent,
- "if (tree_swap_operands_p (%s, %s, false))\n",
+ "if (tree_swap_operands_p (%s, %s))\n",
  child_opname0, child_opname1);
  fprintf_indent (f, indent,
  "  std::swap (%s, %s);\n",
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 5d46405..aabc8ff 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -4166,7 +4166,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, 
tree (*valueize) (tree))
{
  tree rhs1 = gimple_assign_rhs1 (stmt);
  tree rhs2 = gimple_assign_rhs2 (stmt);
- if (tree_swap_operands_p (rhs1, rhs2, false))
+ if (tree_swap_operands_p (rhs1, rhs2))
{
  gimple_assign_set_rhs1 (stmt, rhs2);
  gimple_assign_set_rhs2 (stmt, rhs1);
@@ -4232,7 +4232,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, 
tree (*valueize) (tree))
/* Canonicalize operand order.  */
tree lhs = gimple_cond_lhs (stmt);
tree rhs = gimple_cond_rhs (stmt);
-   if (tree_swap_operands_p (lhs, rhs, false))
+   if (tree_swap_operands_p (lhs, rhs))
  {
gcond *gc = as_a  (stmt);
gimple_cond_set_lhs (gc, rhs);
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 4eab90d..09f729f 100644
--- a/gcc/gimple-ma

Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR, tree-optimization/78257)

2016-11-09 Thread Martin Liška
On 11/09/2016 02:58 PM, Richard Biener wrote:
> On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška  wrote:
>> Hello.
>>
>> Following patch fixes [1] (f0 function), where we have off-by-one error.
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> Ok (w/o the unrelated gimplify.c hunk).

Thanks, installed as r242000 and sorry for the wrong hunk.
As the patch fixed just of the half of issues in the PR, should I close
it or should I keep it for tracking (thus maybe changing to enhancement?)?

Thanks,
Martin


> 
> Richard.
> 
>> Martin
>>
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0



[arm-embedded][PATCH, gcc/ARM, ping] Add support for Cortex-M23

2016-11-09 Thread Thomas Preudhomme

Hi,

We have decided to backport this patch to add support for ARM Cortex-M23 to our 
embedded-6-branch.


2016-11-09  Thomas Preud'homme  

Backport from mainline
2016-11-04  Thomas Preud'homme  

gcc/
* config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m23): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores
this tuning parameters apply to in the comment.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M23 processor.

Best regards,

Thomas
--- Begin Message ---

On 02/11/16 10:13, Kyrill Tkachov wrote:


On 02/11/16 10:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M23 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Baseline
architecture and arm_v6m_tune tuning parameters for the time being. It also
updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme 

* config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m23): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores
this tuning parameters apply to in the comment.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M23 processor.


Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world
compiled for it.

Is this ok for master?



Ok.
Thanks,
Kyrill



Committed, thanks.

Best regards,

Thomas
--- End Message ---


[arm-embedded][PATCH, gcc/ARM, ping] Add support for Cortex-M33

2016-11-09 Thread Thomas Preudhomme

Hi,

We have decided to backport this patch to add support for ARM Cortex-M33 to our 
embedded-6-branch.


2016-11-09  Thomas Preud'homme  

Backport from mainline
2016-11-04  Thomas Preud'homme  

gcc/
* config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m33): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M33 processor.

Best regards,

Thomas
--- Begin Message ---



On 02/11/16 10:14, Kyrill Tkachov wrote:


On 02/11/16 10:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M33 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Mainline with DSP
extensions architecture and arm_v7m_tune tuning parameters for the time being.
It also updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m33-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme 

* config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m33): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M33 processor.


Tested by building libgcc and libstdc++ for Cortex-M33 and running a hello world
compiled for it.

Is this ok for master?



Ok.
Thanks,
Kyrill


Committed, thanks.

Best regards,

Thomas
--- End Message ---


Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-09 Thread Dominik Vogt
On Wed, Nov 09, 2016 at 01:43:58PM +0100, Richard Biener wrote:
> On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse  wrote:
> > On Tue, 8 Nov 2016, Dominik Vogt wrote:
> >
> >> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote:
> >>>
> >>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt 
> >>> wrote:
> 
>  On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote:
> >
> > On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt 
> > wrote:
> >>
> >> Is VRP the right pass to do this optimisation or should a later
> >> pass rather attempt to eliminate the new use of b_5 instead?  Uli
> >> has brought up the idea a mini "sign extend elimination" pass that
> >> checks if the result of a sign extend could be replaced by the
> >> original quantity in all places, and if so, eliminate the ssa
> >> name.  (I guess that won't help with the above code because l is
> >> used also as a function argument.)  How could a sensible approach
> >> to deal with the situation look like?
> >
> >
> > We run into this kind of situation regularly and for general foldings
> > in match.pd we settled with single_use () even though it is not
> > perfect.
> > Note the usual complaint is not extra extension instructions but
> > the increase of register pressure.
> >
> > This is because it is hard to do better when you are doing local
> > optimization.
> >
> > As for the question on whether VRP is the right pass to do this the
> > answer is two-fold -- VRP has the most precise range information.
> > But the folding itself should be moved to generic code and use
> > get_range_info ().
> 
> 
>  All right, is this a sensible approach then?
> >>>
> >>>
> >>> Yes.
> >>>
>    1. Using has_single_use as in the experimental patch is good
>   enough (provided further testing does not show serious
>   regressions).
> >>>
> >>>
> >>> I'd approve that, yes.
> >>>
>    2. Rip the whole top level if-block from simplify_cond_using_ranges().
>    3. Translate that code to match.pd syntax.
> >>>
> >>>
> >>> Might be some work but yes, that's also desired (you'd lose the ability
> >>> to emit the warnings though).
> >>
> >>
> >> Could you give me a match-pd-hint please?  We now have something
> >> like this:
> >>
> >> (simplify
> >>  (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3)
> >>  (if (... many conditions ...)
> >>   (cond (gt ... ...) @2 @3))
> >>
> >> The generated code ends up in gimple_simplify_COND_EXPR, but when
> >> gimple_simplify is actually called, it goes through the
> >> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and
> >> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond
> >> ...), i.e. it never tries the generated code.
> >
> >
> > Not sure what you mean here.
> >
> >> There is another pattern in match.pd that uses a (cond ...) as the
> >> first operand, and I do not understand how this works.  Should we
> >> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand
> >> instead, and wouldn't this pattern be too general that way?
> >
> >
> > IIUC, you are trying to move the second half of simplify_cond_using_ranges
> > to match.pd. I don't see any reason to restrict it to the case where the
> > comparison result is used directly in a COND_EXPR, so that would look like:
> >
> > (for cmp (...)
> >  (simplify
> >   (cmp (convert SSA_NAME@0) INTEGER_CST@1)
> >   (if (...)
> >(cmp @0 (convert @1)
> >
> > maybe? I think I may have missed your point.
> 
> Yeah, if you'd use (cond (gt ... then it only matches in assignments
> with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs.
> 
> So you ran into the (cond vs. GIMPLE_COND "mismatch".
> 
> You'd essentially implement sth similar to shorten_compare in match.pd.

Something like the attached patch?  Robin and me have spent quite
some time to figure out the new pattern.  Two questions:

1) In the match expression you cannot just use SSA_NAME@0 because
   then the "case SSA_NAME:" is added to a switch for another
   pattern that already has that label.  Thus we made that "proxy"
   predicate "ssa_name_p" that forces the code for the new pattern
   out of the old switch and into a separate code block.  We
   couldn't figure out whether this joining of case labels is a
   feature in the matching language.  So, is this the right way to
   deal with the conflicting labels?

2) There's an awful lot of if-clauses in the replacement
   expression.  Is it the right place to do all this checking?

> Btw, moving to match.pd shouldn't be a blocker for adding proper
> single_use tests
> just in case you get lost ...

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
>From 26f75d777254c7e29d721c6813eca33539ac6574 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 2 Nov 2016 14:01:46 +0100
Subject: [PATCH] Convert folding in simplify_cond_using_ranges() to
 match.pd.

In cases like this

  ac_4 = *p_3(D);
  bc_5 = ac_4

Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR, tree-optimization/78257)

2016-11-09 Thread Richard Biener
On Wed, Nov 9, 2016 at 3:08 PM, Martin Liška  wrote:
> On 11/09/2016 02:58 PM, Richard Biener wrote:
>> On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška  wrote:
>>> Hello.
>>>
>>> Following patch fixes [1] (f0 function), where we have off-by-one error.
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>
>> Ok (w/o the unrelated gimplify.c hunk).
>
> Thanks, installed as r242000 and sorry for the wrong hunk.
> As the patch fixed just of the half of issues in the PR, should I close
> it or should I keep it for tracking (thus maybe changing to enhancement?)?

Keep it open for the other issues - this one was minor and easy to fix.

Richard.

> Thanks,
> Martin
>
>
>>
>> Richard.
>>
>>> Martin
>>>
>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0
>


Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-09 Thread Georg-Johann Lay

On 09.11.2016 10:14, Pitchumani Sivanupandi wrote:

On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:

On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:

I have updated patch to include the flash size as well. Took that info from
device headers (it was fed into crt's device information note section also).


The new option would render -mn-flash superfluous, but we should keep it for
backward compatibility.

Ok.

Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
LINK_RELAX_SPEC?  And what happens if relaxation is off?

Yes. Removed link_pmem_wrap from link_relax.
Disabling relaxation doesn't change -mpmem-wrap-around behavior.

Now, wrap around behavior is changed as follows:

For 8K flash devices:
Device specs adds --pmem-wrap-around=8k linker option if -mno-pmem-wrap-around
is NOT enabled.
It makes the --pmem-wrap-around=8k linker option default for 8k flash devices.

For 16/32/64K flash devices:
Spec string 'link_pmem_wrap' added to all 16/32/64k flash devices specs.
Other wise no changes i.e. It adds --pmem-wrap-around=16/32/64k option if
-mpmem-wrap-around option is enabled.

For other devices, no changes in device specs.

Reg tested with default and -mrelax options enabled. No issues.

Regards,
Pitchumani

gcc/ChangeLog

2016-11-08  Pitchumani Sivanupandi 

* config/avr/avr-arch.h (avr_mcu_t): Add flash_size member.
* config/avr/avr-devices.c(avr_mcu_types): Add flash size info.
* config/avr/avr-mcu.def: Likewise.
* config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded prefix
check to find wrap-around value, instead use MCU flash size. For 8k flash
devices, update link_pmem_wrap spec string to add --pmem-wrap-around=8k.
* config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and
add to linker specs (LINK_SPEC) directly.

flashsize-and-wrap-around.patch



diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def
index 6bcc6ff..9d4aa1a 100644



 /* Classic, == 128K.  */
-AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP,
NULL,0x0060, 0x0, 2)
-AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
"__AVR_ATmega103__", 0x0060, 0x0, 2)
-AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
"__AVR_AT43USB320__",   0x0060, 0x0, 2)
+AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP,
NULL,0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
"__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
+AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
"__AVR_AT43USB320__",   0x0060, 0x0, 2, 0x1)


This looks incorrect: either .flash_size should be 0x2 or n_flash be 1.
As you draw the information from internal hardware descriptions, I'd guess
that the new information is more reliable?

...as this also determines whether AT43USB320supports ELPM this also means
that the device is in the wrong multilib set?

I couldn't find the data sheet at atmel.com, and the one I found on the net
only mentions 64KiB program memory.  It mentions LPM on pp. 9 but has no
reference to the supported instruction set.  From the manual I would conclude
that this device should be avr3, not avr31?

Yes. I couldn't find any other source other than the datasheet in net.
This device do not have internal program memory. Flash size is set to 64K
as the device could address only 64K. No RAMPZ register, so no ELPM. Moved
device to AVR3 architecture.

 /* Classic, > 8K, <= 64K.  */
-AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
0x0060, 0x0, 1)
-AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB355__",
0x0060, 0x0, 1)
-AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT76C711__",  
0x0060, 0x0, 1)
+AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB355__",
0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT76C711__",  
0x0060, 0x0, 1, 0x4000)
+AVR_MCU ("at43usb320",   ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB320__",
0x0060, 0x0, 1, 0x1)
 /* Classic, == 128K.  */
-AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
0x0060, 0x0, 2)
-AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 
0x0060, 0x0, 2)
-AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__",   
0x0060, 0x0, 2)
+AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 
0x0060, 0x0, 2, 0x2)
 /* Classic + MOVW + JMP/CALL.  */


If at43usb320 is in the wrong multilib, then this should be handled as separate 
issue / patch together with its own PR.  Sorry for the confusion.  I just 
noticed that so

Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-09 Thread Richard Biener
On Wed, Nov 9, 2016 at 3:30 PM, Dominik Vogt  wrote:
> On Wed, Nov 09, 2016 at 01:43:58PM +0100, Richard Biener wrote:
>> On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse  wrote:
>> > On Tue, 8 Nov 2016, Dominik Vogt wrote:
>> >
>> >> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote:
>> >>>
>> >>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt 
>> >>> wrote:
>> 
>>  On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote:
>> >
>> > On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt 
>> > wrote:
>> >>
>> >> Is VRP the right pass to do this optimisation or should a later
>> >> pass rather attempt to eliminate the new use of b_5 instead?  Uli
>> >> has brought up the idea a mini "sign extend elimination" pass that
>> >> checks if the result of a sign extend could be replaced by the
>> >> original quantity in all places, and if so, eliminate the ssa
>> >> name.  (I guess that won't help with the above code because l is
>> >> used also as a function argument.)  How could a sensible approach
>> >> to deal with the situation look like?
>> >
>> >
>> > We run into this kind of situation regularly and for general foldings
>> > in match.pd we settled with single_use () even though it is not
>> > perfect.
>> > Note the usual complaint is not extra extension instructions but
>> > the increase of register pressure.
>> >
>> > This is because it is hard to do better when you are doing local
>> > optimization.
>> >
>> > As for the question on whether VRP is the right pass to do this the
>> > answer is two-fold -- VRP has the most precise range information.
>> > But the folding itself should be moved to generic code and use
>> > get_range_info ().
>> 
>> 
>>  All right, is this a sensible approach then?
>> >>>
>> >>>
>> >>> Yes.
>> >>>
>>    1. Using has_single_use as in the experimental patch is good
>>   enough (provided further testing does not show serious
>>   regressions).
>> >>>
>> >>>
>> >>> I'd approve that, yes.
>> >>>
>>    2. Rip the whole top level if-block from simplify_cond_using_ranges().
>>    3. Translate that code to match.pd syntax.
>> >>>
>> >>>
>> >>> Might be some work but yes, that's also desired (you'd lose the ability
>> >>> to emit the warnings though).
>> >>
>> >>
>> >> Could you give me a match-pd-hint please?  We now have something
>> >> like this:
>> >>
>> >> (simplify
>> >>  (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3)
>> >>  (if (... many conditions ...)
>> >>   (cond (gt ... ...) @2 @3))
>> >>
>> >> The generated code ends up in gimple_simplify_COND_EXPR, but when
>> >> gimple_simplify is actually called, it goes through the
>> >> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and
>> >> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond
>> >> ...), i.e. it never tries the generated code.
>> >
>> >
>> > Not sure what you mean here.
>> >
>> >> There is another pattern in match.pd that uses a (cond ...) as the
>> >> first operand, and I do not understand how this works.  Should we
>> >> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand
>> >> instead, and wouldn't this pattern be too general that way?
>> >
>> >
>> > IIUC, you are trying to move the second half of simplify_cond_using_ranges
>> > to match.pd. I don't see any reason to restrict it to the case where the
>> > comparison result is used directly in a COND_EXPR, so that would look like:
>> >
>> > (for cmp (...)
>> >  (simplify
>> >   (cmp (convert SSA_NAME@0) INTEGER_CST@1)
>> >   (if (...)
>> >(cmp @0 (convert @1)
>> >
>> > maybe? I think I may have missed your point.
>>
>> Yeah, if you'd use (cond (gt ... then it only matches in assignments
>> with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs.
>>
>> So you ran into the (cond vs. GIMPLE_COND "mismatch".
>>
>> You'd essentially implement sth similar to shorten_compare in match.pd.
>
> Something like the attached patch?  Robin and me have spent quite
> some time to figure out the new pattern.  Two questions:
>
> 1) In the match expression you cannot just use SSA_NAME@0 because
>then the "case SSA_NAME:" is added to a switch for another
>pattern that already has that label.  Thus we made that "proxy"
>predicate "ssa_name_p" that forces the code for the new pattern
>out of the old switch and into a separate code block.  We
>couldn't figure out whether this joining of case labels is a
>feature in the matching language.  So, is this the right way to
>deal with the conflicting labels?

No, just do not match SSA_NAME.  And instead of

+  (with { gimple *def_stmt = SSA_NAME_DEF_STMT (@0); }
+   (if (is_gimple_assign (def_stmt)
+   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))

you instead want to change the pattern to

(simpify
  (cmp (convert @0) INTEGER_CST@1)

@0 will then be your innerop

note that you can't use get_value_range but you have

Re: [PATCH 7/7, GCC, ARM, V8M] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic

2016-11-09 Thread Andre Vieira (lists)
On 09/11/16 10:26, Kyrill Tkachov wrote:
> 
> @@ -1832,6 +1834,17 @@ arm_init_builtins (void)
>  = add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
>  ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE);
>  }
> +
> +  if (arm_arch_cmse)
> +{
> +  tree ftype_cmse_nonsecure_caller
> += build_function_ty
> 
> Should this be use_cmse ?
> This looks ok to me otherwise.
> I believe patch [6/7] is the only one needing approval after this...
> 
> Kyrill

Hi,

Yeah it should indeed be 'use_cmse'. Here is the reworked version, no
changes to ChangeLog.

Cheers,
Andre
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
e73043db6db69fa64bb1e72cf71a36d7169062db..232d2de52106324e9f22b535a1b02cb95a2294b7
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -528,6 +528,8 @@ enum arm_builtins
   ARM_BUILTIN_GET_FPSCR,
   ARM_BUILTIN_SET_FPSCR,
 
+  ARM_BUILTIN_CMSE_NONSECURE_CALLER,
+
 #undef CRYPTO1
 #undef CRYPTO2
 #undef CRYPTO3
@@ -1832,6 +1834,17 @@ arm_init_builtins (void)
= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, 
NULL_TREE);
 }
+
+  if (use_cmse)
+{
+  tree ftype_cmse_nonsecure_caller
+   = build_function_type_list (unsigned_type_node, NULL);
+  arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER]
+   = add_builtin_function ("__builtin_arm_cmse_nonsecure_caller",
+   ftype_cmse_nonsecure_caller,
+   ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD,
+   NULL, NULL_TREE);
+}
 }
 
 /* Return the ARM builtin for CODE.  */
@@ -2452,6 +2465,12 @@ arm_expand_builtin (tree exp,
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_CMSE_NONSECURE_CALLER:
+  target = gen_reg_rtx (SImode);
+  op0 = arm_return_addr (0, NULL_RTX);
+  emit_insn (gen_addsi3 (target, op0, const1_rtx));
+  return target;
+
 case ARM_BUILTIN_TEXTRMSB:
 case ARM_BUILTIN_TEXTRMUB:
 case ARM_BUILTIN_TEXTRMSH:
diff --git a/gcc/config/arm/arm_cmse.h b/gcc/config/arm/arm_cmse.h
index 
894343bb835b61e09c14668d45aa43a8693fd011..82b58b1c4f4a12ba6062e2cc2632653788d0eeb7
 100644
--- a/gcc/config/arm/arm_cmse.h
+++ b/gcc/config/arm/arm_cmse.h
@@ -163,6 +163,13 @@ __attribute__ ((__always_inline__))
 cmse_TTAT (void *__p)
 __CMSE_TT_ASM (at)
 
+/* FIXME: diagnose use outside cmse_nonsecure_entry functions.  */
+__extension__ static __inline int __attribute__ ((__always_inline__))
+cmse_nonsecure_caller (void)
+{
+  return __builtin_arm_cmse_nonsecure_caller ();
+}
+
 #define CMSE_AU_NONSECURE  2
 #define CMSE_MPU_NONSECURE 16
 #define CMSE_NONSECURE 18
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 
462e6c71e20791b35f02adabfc97b9b013fd296a..88f6e014c3def8e6a2d2452df5d4937a4f0dd1ef
 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12625,6 +12625,7 @@ cmse_address_info_t cmse_TTAT_fptr (FPTR)
 void * cmse_check_address_range (void *, size_t, int)
 typeof(p) cmse_nsfptr_create (FPTR p)
 intptr_t cmse_is_nsfptr (FPTR)
+int cmse_nonsecure_caller (void)
 @end smallexample
 
 @node AVR Built-in Functions
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
index 
d5b9a2d9d59569de170da814ae660e9fb2b943e7..ddcf12a30a6c1806969d239c448da81ccf49532e
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
@@ -65,3 +65,32 @@ int foo (char * p)
 /* { dg-final { scan-assembler-times "ttat " 2 } } */
 /* { dg-final { scan-assembler-times "bl.cmse_check_address_range" 7 } } */
 /* { dg-final { scan-assembler-not "cmse_check_pointed_object" } } */
+
+typedef int (*int_ret_funcptr_t) (void);
+typedef int __attribute__ ((cmse_nonsecure_call)) (*int_ret_nsfuncptr_t) 
(void);
+
+int __attribute__ ((cmse_nonsecure_entry))
+baz (void)
+{
+  return cmse_nonsecure_caller ();
+}
+
+int __attribute__ ((cmse_nonsecure_entry))
+qux (int_ret_funcptr_t int_ret_funcptr)
+{
+  int_ret_nsfuncptr_t int_ret_nsfunc_ptr;
+
+  if (cmse_is_nsfptr (int_ret_funcptr))
+{
+  int_ret_nsfunc_ptr = cmse_nsfptr_create (int_ret_funcptr);
+  return int_ret_nsfunc_ptr ();
+}
+  return 0;
+}
+/* { dg-final { scan-assembler "baz:" } } */
+/* { dg-final { scan-assembler "__acle_se_baz:" } } */
+/* { dg-final { scan-assembler-not "\tcmse_nonsecure_caller" } } */
+/* { dg-final { scan-rtl-dump "and.*reg.*const_int 1" expand } } */
+/* { dg-final { scan-assembler "bic" } } */
+/* { dg-final { scan-assembler "push\t\{r4, r5, r6" } } */
+/* { dg-final { scan-assembler "msr\tAPSR_nzcvq" } } */


Re: [PATCHv2 6/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call

2016-11-09 Thread Andre Vieira (lists)
On 27/10/16 11:01, Andre Vieira (lists) wrote:
> On 25/10/16 17:30, Andre Vieira (lists) wrote:
>> On 24/08/16 12:01, Andre Vieira (lists) wrote:
>>> On 25/07/16 14:26, Andre Vieira (lists) wrote:
 This patch extends support for the ARMv8-M Security Extensions
 'cmse_nonsecure_call' to use a new library function
 '__gnu_cmse_nonsecure_call'. This library function is responsible for
 (without using r0-r3 or d0-d7):
 1) saving and clearing all callee-saved registers using the secure stack
 2) clearing the LSB of the address passed in r4 and using blxns to
 'jump' to it
 3) clearing ASPR, including the 'ge bits' if DSP is enabled
 4) clearing FPSCR if using non-soft float-abi
 5) restoring callee-saved registers.

 The decisions whether to include DSP 'ge bits' clearing and floating
 point registers (single/double precision) all depends on the multilib used.

 See Section 5.5 of ARM®v8-M Security Extensions
 (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

 *** gcc/ChangeLog ***
 2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * config/arm/arm.c (detect_cmse_nonsecure_call): New.
 (cmse_nonsecure_call_clear_caller_saved): New.
 (arm_reorg): Use cmse_nonsecure_call_clear_caller_saved.
 * config/arm/arm-protos.h (detect_cmse_nonsecure_call): New.
 * config/arm/arm.md (call): Handle cmse_nonsecure_entry.
 (call_value): Likewise.
 (nonsecure_call_internal): New.
 (nonsecure_call_value_internal): New.
 * config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New.
 (*nonsecure_call_value_reg_thumb1_v5): New.
 * config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New.
 (*nonsecure_call_value_reg_thumb2): New.
 * config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New.

 *** libgcc/ChangeLog ***
 2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * config/arm/cmse_nonsecure_call.S: New.
* config/arm/t-arm: Compile cmse_nonsecure_call.S


 *** gcc/testsuite/ChangeLog ***
 2016-07-25  Andre Vieira
 Thomas Preud'homme  

 * gcc.target/arm/cmse/cmse.exp: Run tests in mainline dir.
 * gcc.target/arm/cmse/cmse-9.c: Added some extra tests.
 * gcc.target/arm/cmse/baseline/bitfield-4.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-5.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-6.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-7.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-8.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-9.c: New.
 * gcc.target/arm/cmse/baseline/bitfield-and-union-1.c: New.
 * gcc.target/arm/cmse/baseline/cmse-11.c: New.
* gcc.target/arm/cmse/baseline/cmse-13.c: New.
* gcc.target/arm/cmse/baseline/cmse-6.c: New.
 * gcc/testsuite/gcc.target/arm/cmse/baseline/union-1.c: New.
 * gcc/testsuite/gcc.target/arm/cmse/baseline/union-2.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New.

>>>
>>> Updated this patch to correctly clear only the cumulative
>>> exception-status (0-4,7) and the condition code bits (28-31) of the FPSCR.
>>>
>>> 
>>>
>>> This patch extends support for the ARMv8-M Security Extensions
>>> 'cmse_nonsecure_call' to use a new library function
>>> '__gnu_cmse_nonsecure_call'. This library function is responsible for
>>> (without using r0-r3 or d0-d7):
>>> 1) saving and clearing all callee-saved registers using the secure stack
>>> 2) clearing the LSB of the address passed in r4 and using blxns to
>>> 'jump' to it
>>> 3) clearing ASPR, including the 'ge bits' if DSP is enabled
>>> 4) clearing the cumulative exception-status (0-4, 7) and the condition
>>> bits (28-31) of the FPSCR if using non-soft float-abi
>>> 5) restoring callee-saved registers.
>>>
>>> The decisions whether to include DSP 'ge bits' clearing and floating
>>> point registers (single/doubl

Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-09 Thread Dominique d'Humières
Sorry for the bad news, but while gfortran regtests with regression with the 
patch I still get a link error with the original test:

% gfc pr44265.f90
Undefined symbols for architecture x86_64:
  "___fruits_MOD_names", referenced from:
  _MAIN__ in ccyeNqa1.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

and its variants.

Thanks for working on this issue

Dominique

> Le 9 nov. 2016 à 13:47, Paul Richard Thomas  a 
> écrit :
> 
> Dear All,
> 
> The title of this PR says what this is all about, except that it
> applies uniquely applicable to character function result string
> lengths.
> 
> Ian Harvey wrote the first patch for this PR for which many thanks.
> However, two issues came up that took some little while to understand;
> (i) In comment #1, it was found that calls from sibling procedures
> could generate streams of undefined references at the link stage; and
> (ii) An entity of the same name as the contained procedure entity in
> module scope caused similar problems.
> 
> The relationship with Ian's patch is still obvious. The fundamental
> difference is that the parameter arrays are automatically promoted to
> module scope using a unique symtree. This fixes both the issues above.
> 
> Dominique, could you please check that the -m32 issue has gone away?
> 
> Bootstrapped and regtested on FC21/x86_64 - OK for trunk?
> 
> Paul
> 
> 2016-11-09  Paul Thomas  
> 
>PR fortran/44265
>* gfortran.h : Add fn_result_spec bitfield to gfc_symbol.
>* resolve.c (flag_fn_result_spec): New function.
>(resolve_fntype): Call it for character result lengths.
>* symbol.c (gfc_new_symbol): Set fn_result_spec to zero.
>* trans-decl.c (gfc_sym_mangled_identifier): Include the
>procedure name in the mangled name for symbols with the
>fn_result_spec bit set.
>(gfc_get_symbol_decl): Mangle the name of these symbols.
>(gfc_create_module_variable): Allow them through the assert.
>(gfc_generate_function_code): Remove the assert before the
>initialization of sym->tlink because the frontend no longer
>uses this field.
>* trans-expr.c (gfc_map_intrinsic_function): Add a case to
>treat the LEN_TRIM intrinsic.
> 
> 2016-11-09  Paul Thomas  
> 
>PR fortran/44265
>* gfortran.dg/char_result_14.f90: New test.
>* gfortran.dg/char_result_15.f90: New test.
> 
> 
> -- 
> The difference between genius and stupidity is; genius has its limits.
> 
> Albert Einstein
> 



Re: [PATCH] fix PR68468

2016-11-09 Thread Bernd Schmidt

On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote:

Hi,

the following patch fixes PR68468.
Patch is used for a while in Buildroot without issues.

2016-11-05  Waldemar Brodkorb 

   PR gcc/68468
   * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets.


This is ok.


Bernd



[PATCH] Add testcase for PR middle-end/77718

2016-11-09 Thread Jakub Jelinek
Hi!

I've noticed this PR is still open, but has been fixed, though
apparently Bernd's patch from the PR has been applied as is with
the needed tab eaten (instead 7 spaces) and no testcase has been added.

Tested with cross-compiler and the r240625 change reverted and as is
and running the resulting assembler on ppc64le.

Ok for trunk?

2016-11-09  Jakub Jelinek  

PR target/77718
* builtins.c (expand_builtin_memcmp): Formatting fix.

* gcc.c-torture/execute/pr77718.c: New test.

--- gcc/ChangeLog.jj2016-11-09 15:22:28.0 +0100
+++ gcc/ChangeLog   2016-11-09 16:29:35.152056326 +0100
@@ -5027,7 +5027,8 @@
 
 2016-09-29  Bernd Schmidt  
 
-   * builtins.c (expand_builtin_memcmp): don't swap args unless
+   PR target/77718
+   * builtins.c (expand_builtin_memcmp): Don't swap args unless
result is only being compared with zero.
 
 2016-09-29  Marek Polacek  
--- gcc/builtins.c.jj   2016-10-31 13:28:06.0 +0100
+++ gcc/builtins.c  2016-11-09 16:19:08.886909150 +0100
@@ -3754,7 +3754,7 @@ expand_builtin_memcmp (tree exp, rtx tar
 {
   src_str = c_getstr (arg1);
   if (src_str != NULL)
-   std::swap (arg1_rtx, arg2_rtx);
+   std::swap (arg1_rtx, arg2_rtx);
 }
 
   /* If SRC is a string constant and block move would be done
--- gcc/testsuite/gcc.c-torture/execute/pr77718.c.jj2016-11-09 
16:28:30.414868074 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr77718.c   2016-11-09 
16:22:06.0 +0100
@@ -0,0 +1,25 @@
+/* PR middle-end/77718 */
+
+char a[64] __attribute__((aligned (8)));
+
+__attribute__((noinline, noclone)) int
+foo (void)
+{
+  return __builtin_memcmp ("bb", a, 6);
+}
+
+__attribute__((noinline, noclone)) int
+bar (void)
+{
+  return __builtin_memcmp (a, "bb", 6);
+}
+
+int
+main ()
+{
+  __builtin_memset (a, 'a', sizeof (a));
+  if (((foo () < 0) ^ ('a' > 'b'))
+  || ((bar () < 0) ^ ('a' < 'b')))
+__builtin_abort ();
+  return 0;
+}

Jakub


[PING][PATCH][aarch64] Improve Logical And Immediate Expressions

2016-11-09 Thread Michael Collison
Ping. Link to original post:

https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02305.html


Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR, tree-optimization/78257)

2016-11-09 Thread David Edelsohn
This patch breaks bootstrap on AIX and probably on any 32 bit host.

Thanks, David

/nasfarm/edelsohn/src/src/gcc/fold-const-call.c: In function
'tree_node* fold_const_call(combined_fn, tree, tree, tree, tree)':
/nasfarm/edelsohn/src/src/gcc/fold-const-call.c:1541:36: error: cannot
convert 'size_t* {aka long unsigned int*}' to 'long long unsigned
int*' for argument '2' to 'const char* c_getstr(tree, long long
unsigned int*)'
   if ((p0 = c_getstr (arg0, &s0))
^
/nasfarm/edelsohn/src/src/gcc/fold-const-call.c:1542:32: error: cannot
convert 'size_t* {aka long unsigned int*}' to 'long long unsigned
int*' for argument '2' to 'const char* c_getstr(tree, long long
unsigned int*)'
&& (p1 = c_getstr (arg1, &s1))
^
make[3]: *** [fold-const-call.o] Error 1


[PATCH, Fortran] PR78259: ICE in gfc_trans_subcomponent_assign, at fortran/trans-expr.c:7330

2016-11-09 Thread Fritz Reese
All,

Will commit the below to trunk as an obvious fix for PR78259. (The
regression was introduced in r241626 from
https://gcc.gnu.org/ml/fortran/2016-10/msg00206.html). The patch
clears regtests.

---
Fritz Reese

From: Fritz O. Reese 
Date: Wed, 9 Nov 2016 10:59:17 -0500
Subject: [PATCH] Fix ICE in gfc_trans_subcomponent_assign due to NULL
components.

PR fortran/78259
* gcc/fortran/trans-expr.c (gfc_trans_subcomponent_assign): Guard
against NULL values.

PR fortran/78259
* gcc/testsuite/gfortran.dg/pr78259.f90: New test.
---

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 7159b17..0352916 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -7321,7 +7321,8 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component *
   gfc_constructor *c = gfc_constructor_first (expr->value.constructor);
   /* We mark that the entire union should be initialized with a contrived
  EXPR_NULL expression at the beginning.  */
-  if (c->n.component == NULL && c->expr->expr_type == EXPR_NULL)
+  if (c != NULL && c->n.component == NULL
+ && c->expr != NULL && c->expr->expr_type == EXPR_NULL)
 {
   tmp = build2_loc (input_location, MODIFY_EXPR, void_type_node,
dest, build_constructor (TREE_TYPE (dest), NULL));
diff --git a/gcc/testsuite/gfortran.dg/pr78259.f90 b/gcc/testsuite/gfortran.dg/p
new file mode 100644
index 000..82f48ea
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr78259.f90
@@ -0,0 +1,22 @@
+! { dg-do "compile" }
+! { dg-options "-fdec-structure" }
+!
+! PR fortran/78259
+!
+! ICE in gfc_trans_subcomponent_assign
+!
+
+subroutine sub
+  structure /s/
+union
+  map
+integer n(2)
+  end map
+  map
+integer(8) m /2/
+  end map
+end union
+  end structure
+  record /s/ r
+  r.n(1) = 1
+end


Re: [PATCH] Add testcase for PR middle-end/77718

2016-11-09 Thread Richard Biener
On November 9, 2016 4:34:27 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>I've noticed this PR is still open, but has been fixed, though
>apparently Bernd's patch from the PR has been applied as is with
>the needed tab eaten (instead 7 spaces) and no testcase has been added.
>
>Tested with cross-compiler and the r240625 change reverted and as is
>and running the resulting assembler on ppc64le.
>
>Ok for trunk?

OK.

Richard.

>2016-11-09  Jakub Jelinek  
>
>   PR target/77718
>   * builtins.c (expand_builtin_memcmp): Formatting fix.
>
>   * gcc.c-torture/execute/pr77718.c: New test.
>
>--- gcc/ChangeLog.jj   2016-11-09 15:22:28.0 +0100
>+++ gcc/ChangeLog  2016-11-09 16:29:35.152056326 +0100
>@@ -5027,7 +5027,8 @@
> 
> 2016-09-29  Bernd Schmidt  
> 
>-  * builtins.c (expand_builtin_memcmp): don't swap args unless
>+  PR target/77718
>+  * builtins.c (expand_builtin_memcmp): Don't swap args unless
>   result is only being compared with zero.
> 
> 2016-09-29  Marek Polacek  
>--- gcc/builtins.c.jj  2016-10-31 13:28:06.0 +0100
>+++ gcc/builtins.c 2016-11-09 16:19:08.886909150 +0100
>@@ -3754,7 +3754,7 @@ expand_builtin_memcmp (tree exp, rtx tar
> {
>   src_str = c_getstr (arg1);
>   if (src_str != NULL)
>-   std::swap (arg1_rtx, arg2_rtx);
>+  std::swap (arg1_rtx, arg2_rtx);
> }
> 
>   /* If SRC is a string constant and block move would be done
>--- gcc/testsuite/gcc.c-torture/execute/pr77718.c.jj   2016-11-09
>16:28:30.414868074 +0100
>+++ gcc/testsuite/gcc.c-torture/execute/pr77718.c  2016-11-09
>16:22:06.0 +0100
>@@ -0,0 +1,25 @@
>+/* PR middle-end/77718 */
>+
>+char a[64] __attribute__((aligned (8)));
>+
>+__attribute__((noinline, noclone)) int
>+foo (void)
>+{
>+  return __builtin_memcmp ("bb", a, 6);
>+}
>+
>+__attribute__((noinline, noclone)) int
>+bar (void)
>+{
>+  return __builtin_memcmp (a, "bb", 6);
>+}
>+
>+int
>+main ()
>+{
>+  __builtin_memset (a, 'a', sizeof (a));
>+  if (((foo () < 0) ^ ('a' > 'b'))
>+  || ((bar () < 0) ^ ('a' < 'b')))
>+__builtin_abort ();
>+  return 0;
>+}
>
>   Jakub




[PATCH] (v2) print-rtl-function.c: add (param) directive to dump

2016-11-09 Thread David Malcolm
On Wed, 2016-11-09 at 12:59 +0100, Bernd Schmidt wrote:
> On 11/08/2016 07:03 PM, David Malcolm wrote:
> > int __RTL("rtl-combine") f1 (int n)
> > {
> > (function "f1"
> >   (param "n"
> > (DECL_RTL
> >   (reg/v:SI %1 [ n ])
> > ) ;; DECL_RTL
>
> The ;; DECL_RTL etc. comments seem somewhat redundant and add
> clutter.
> Please remove those.

Done.

> Also, why is the closing paren on its own line? That doesn't seem
> right.

It's because of the m_sawclose when calling print_rtx, which leads to
a leading newline.

I fixed this by introducing a rtx_writer::finish_directive method to
unset m_sawclose.  (That field seems misnamed; perhaps it should be
renamed to m_force_newline or somesuch?  That feels like a separate
patch though).

> Later (not for this patch) I'd really like to see some logic not to
> add
> linebreaks before simple expressions, so that we'd have (DECL_RTL
> (reg:SI xyz)) on a single line.

I've been thinking about a patch that would track indentation, which
would only insert newlines when close to some column limit, and would
line up the open parentheses for siblings.
That way you'd get things like:

(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
 (const_int -4)) [1 i+0 S4 A32]))

where the newline is auto-inserted to fit things within e.g. an
80 char column, and where "const_int" gets indented to be underneath
the "reg", since they're both operands of the same "plus".

Anyway, the following updated version of the patch eliminates
the newlines and comments.

Updated examples: aarch64 function taking one argument:

(function "times_two"
  (param "i"
(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -4)) [1 i+0 S4 A32]))
(DECL_RTL_INCOMING (reg:SI x0 [ i ])))
  (insn-chain
 ;; etc

x86_64 function taking three ints:

(function "test_1"
  (param "i"
(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -4)) [1 i+0 S4 A32]))
(DECL_RTL_INCOMING (reg:SI di [ i ])))
  (param "j"
(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -8)) [1 j+0 S4 A32]))
(DECL_RTL_INCOMING (reg:SI si [ j ])))
  (param "k"
(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
(const_int -12)) [1 k+0 S4 A32]))
(DECL_RTL_INCOMING (reg:SI dx [ k ])))
  (insn-chain
 ;; etc

As before, only lightly tested so far.

OK for trunk if it passes bootstrap and regrtest?

gcc/ChangeLog:
* print-rtl-function.c (print_any_param_name): New function.
(print_param): New function.
(print_rtx_function): Call print_param for each argument.
* print-rtl.c (rtx_writer::finish_directive): New function.
* print-rtl.h (rtx_writer::finish_directive): New decl.
---
 gcc/print-rtl-function.c | 36 
 gcc/print-rtl.c  |  9 +
 gcc/print-rtl.h  |  2 ++
 3 files changed, 47 insertions(+)

diff --git a/gcc/print-rtl-function.c b/gcc/print-rtl-function.c
index b62f1b3..f18491a 100644
--- a/gcc/print-rtl-function.c
+++ b/gcc/print-rtl-function.c
@@ -127,6 +127,38 @@ can_have_basic_block_p (const rtx_insn *insn)
   return true;
 }
 
+/* Subroutine of print_param.  Write the name of ARG, if any, to OUTFILE.  */
+
+static void
+print_any_param_name (FILE *outfile, tree arg)
+{
+  if (DECL_NAME (arg))
+fprintf (outfile, " \"%s\"", IDENTIFIER_POINTER (DECL_NAME (arg)));
+}
+
+/* Print a "(param)" directive for ARG to OUTFILE.  */
+
+static void
+print_param (FILE *outfile, rtx_writer &w, tree arg)
+{
+  fprintf (outfile, "  (param");
+  print_any_param_name (outfile, arg);
+  fprintf (outfile, "\n");
+
+  /* Print the value of DECL_RTL (without lazy-evaluation).  */
+  fprintf (outfile, "(DECL_RTL ");
+  rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl;
+  w.print_rtx (decl_rtl);
+  w.finish_directive ();
+
+  /* Print DECL_INCOMING_RTL.  */
+  fprintf (outfile, "(DECL_RTL_INCOMING ");
+  w.print_rtx (DECL_INCOMING_RTL (arg));
+  fprintf (outfile, ")");
+
+  w.finish_directive ();
+}
+
 /* Write FN to OUTFILE in a form suitable for parsing, with indentation
and comments to make the structure easy for a human to grok.  Track
the basic blocks of insns in the chain, wrapping those that are within
@@ -202,6 +234,10 @@ print_rtx_function (FILE *outfile, function *fn, bool 
compact)
 
   fprintf (outfile, "(function \"%s\"\n", dname);
 
+  /* Params.  */
+  for (tree arg = DECL_ARGUMENTS (fdecl); arg; arg = DECL_CHAIN (arg))
+print_param (outfile, w, arg);
+
   /* The instruction chain.  */
   fprintf (outfile, "  (insn-chain\n");
   basic_block curr_bb = NULL;
diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 30ff8fa..3bbd395 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -921,6 +921,15 @@ rtx_writer::print_rtx (const_rtx in_rtx)
   m_sawclose = 1;
 }
 
+/* Emit a closing parenthesis and newline.  */
+
+void
+rtx_writer::finish_directive ()

[PATCH, GCC/ARM] Fix ICE when compiling empty FIQ interrupt handler in ARM mode

2016-11-09 Thread Thomas Preudhomme

Hi,

This patch fixes the following ICE when building when compiling an empty FIQ 
interrupt handler in ARM mode:


empty_fiq_handler.c:5:1: error: insn does not satisfy its constraints:
 }
 ^

(insn/f 13 12 14 (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 11 fp)
(const_int 4 [0x4]))) irq.c:5 4 {*arm_addsi3}
 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 11 fp)
(const_int 4 [0x4])))
(nil)))

The ICE was provoked by missing an alternative to reflect that ARM mode can do 
an add of general register into sp which is unpredictable in Thumb mode add 
immediate.


ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2016-11-04  Thomas Preud'homme  

* config/arm/arm.md (arm_addsi3): Add alternative for addition of
general register with general register or ARM constant into SP
register.


*** gcc/testsuite/ChangeLog ***

2016-11-04  Thomas Preud'homme  

* gcc.target/arm/empty_fiq_handler.c: New test.


Testing: bootstrapped on ARMv7-A ARM mode & testsuite shows no regression.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8393f65bcf4c9c3e61b91e5adcd5f59ff7c6ec3f..70cd31f6cb176fe29efc1fbbf692bfc270ef5a1b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -609,9 +609,9 @@
 ;;  (plus (reg rN) (reg sp)) into (reg rN).  In this case reload will
 ;; put the duplicated register first, and not try the commutative version.
 (define_insn_and_split "*arm_addsi3"
-  [(set (match_operand:SI  0 "s_register_operand" "=rk,l,l ,l ,r ,k ,r,r ,k ,r ,k,k,r ,k ,r")
-(plus:SI (match_operand:SI 1 "s_register_operand" "%0 ,l,0 ,l ,rk,k ,r,rk,k ,rk,k,r,rk,k ,rk")
- (match_operand:SI 2 "reg_or_int_operand" "rk ,l,Py,Pd,rI,rI,k,Pj,Pj,L ,L,L,PJ,PJ,?n")))]
+  [(set (match_operand:SI  0 "s_register_operand" "=rk,l,l ,l ,r ,k ,r,k ,r ,k ,r ,k,k,r ,k ,r")
+	(plus:SI (match_operand:SI 1 "s_register_operand" "%0 ,l,0 ,l ,rk,k ,r,r ,rk,k ,rk,k,r,rk,k ,rk")
+		 (match_operand:SI 2 "reg_or_int_operand" "rk ,l,Py,Pd,rI,rI,k,rI,Pj,Pj,L ,L,L,PJ,PJ,?n")))]
   "TARGET_32BIT"
   "@
add%?\\t%0, %0, %2
@@ -621,6 +621,7 @@
add%?\\t%0, %1, %2
add%?\\t%0, %1, %2
add%?\\t%0, %2, %1
+   add%?\\t%0, %1, %2
addw%?\\t%0, %1, %2
addw%?\\t%0, %1, %2
sub%?\\t%0, %1, #%n2
@@ -640,10 +641,10 @@
 		  operands[1], 0);
   DONE;
   "
-  [(set_attr "length" "2,4,4,4,4,4,4,4,4,4,4,4,4,4,16")
+  [(set_attr "length" "2,4,4,4,4,4,4,4,4,4,4,4,4,4,4,16")
(set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,yes,yes,yes,no,no,no,no,no,no,no,no,no,no,no")
-   (set_attr "arch" "t2,t2,t2,t2,*,*,*,t2,t2,*,*,a,t2,t2,*")
+   (set_attr "predicable_short_it" "yes,yes,yes,yes,no,no,no,no,no,no,no,no,no,no,no,no")
+   (set_attr "arch" "t2,t2,t2,t2,*,*,*,a,t2,t2,*,*,a,t2,t2,*")
(set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "")
 		  (const_string "alu_imm")
 		  (const_string "alu_sreg")))
diff --git a/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c b/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c
new file mode 100644
index ..bbcfd0e32f9d0cc60c8a013fd1bb584b21aaad16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+/* Below code used to trigger an ICE due to missing constraints for
+   sp = fp + cst pattern.  */
+
+void fiq_handler (void) __attribute__((interrupt ("FIQ")));
+
+void
+fiq_handler (void)
+{
+}


Re: [PATCH] fix PR68468

2016-11-09 Thread Jakub Jelinek
On Wed, Nov 09, 2016 at 04:08:39PM +0100, Bernd Schmidt wrote:
> On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote:
> >Hi,
> >
> >the following patch fixes PR68468.
> >Patch is used for a while in Buildroot without issues.
> >
> >2016-11-05  Waldemar Brodkorb 

Two spaces before < instead of just one.
> >
> >   PR gcc/68468

PR libgcc/68468
instead.

> >   * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets.

Capital F in Fix.
No libgcc/ prefix for files in libgcc/ChangeLog.

> This is ok.

I think Waldemar does not have SVN write access, are you going to check it
in or who will do that?

Jakub


Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-09 Thread James Greenhalgh
On Sat, Oct 29, 2016 at 05:47:54AM +, Bernd Edlinger wrote:
> On 10/28/16 16:05, Bernd Edlinger wrote:
> > On 10/27/16 22:23, Joseph Myers wrote:
> >> On Thu, 27 Oct 2016, Bernd Edlinger wrote:
> >>
> >>> Hi,
> >>>
> >>> by code reading I became aware that libgcc can call count_leading_zeros
> >>> in certain cases which can give undefined results.  This happens on
> >>> signed int128 -> float or double conversions, when the int128 is in
> >>> the range
> >>> INT64_MAX+1 to UINT64_MAX.
> >>
> >> I'd expect testcases added to the testsuite that exercise this case at
> >> runtime, if not already present.
> >>
> >
> > Yes, thanks.  I somehow expected there were already test cases,
> > somewhere, but now when you ask that, I begin to doubt as well...
> >
> > I will try to add an asm("int 3") and see if that gets hit at all.
> >
> 
> The breakpoint got hit only once, in the libgo testsuite: runtime/pprof.
> 
> I see there are some int to float conversion tests at
> gcc.dg/torture/fp-int-convert*.c, where it is easy to add a
> test case that hits the breakpoint too.
> 
> However the test case does not fail before the patch,
> it is just slightly undefined behavior, that is not
> causing problems (at least for x86_64).
> 
> Find attached a new patch with test case.

These new test cases look like they are going to be out of exponent range
for _Float16 - so the testcases will fail for a target which tests either
of:

  gcc.dg/torture/fp-int-convert-float16.c
  gcc.dg/torture/fp-int-convert-float16-timode.c

Should they have an M_OK1 check?

Thanks,
James


> 
> 
> Boot-strapped on x86_64-pc-linux-gnu.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.

> 2016-10-27  Bernd Edlinger  
> 
>   PR libgcc/78067
>   * libgcc2.c (__floatdisf, __floatdidf): Avoid undefined results from
>   count_leading_zeros.
> 
> testsuite:
> 2016-10-27  Bernd Edlinger  
> 
>   PR libgcc/78067
>   * gcc.dg/torture/fp-int-convert.h: Add more conversion tests.
>   
> 
> Index: libgcc2.c
> ===
> --- libgcc2.c (revision 241400)
> +++ libgcc2.c (working copy)
> @@ -1643,6 +1643,11 @@
>  hi = -(UWtype) hi;
>  
>UWtype count, shift;
> +#if !defined (COUNT_LEADING_ZEROS_0) || COUNT_LEADING_ZEROS_0 != W_TYPE_SIZE
> +  if (hi == 0)
> +count = W_TYPE_SIZE;
> +  else
> +#endif
>count_leading_zeros (count, hi);
>  
>/* No leading bits means u == minimum.  */
> Index: gcc/testsuite/gcc.dg/torture/fp-int-convert.h
> ===
> --- gcc/testsuite/gcc.dg/torture/fp-int-convert.h (revision 241647)
> +++ gcc/testsuite/gcc.dg/torture/fp-int-convert.h (working copy)
> @@ -53,6 +53,8 @@ do {
> \
>TEST_I_F_VAL (U, F, HVAL1U (P, U), P_OK (P, U));   \
>TEST_I_F_VAL (U, F, HVAL1U (P, U) + 1, P_OK (P, U));   \
>TEST_I_F_VAL (U, F, HVAL1U (P, U) - 1, P_OK (P, U));   \
> +  TEST_I_F_VAL (I, F, WVAL0S (I), 1);\
> +  TEST_I_F_VAL (I, F, -WVAL0S (I), 1);   \
>  } while (0)
>  
>  #define P_OK(P, T) ((P) >= sizeof(T) * CHAR_BIT)
> @@ -74,6 +76,7 @@ do {
> \
>? (S)1  \
>: (((S)1 << (sizeof(S) * CHAR_BIT - 2)) \
>   + ((S)3 << (sizeof(S) * CHAR_BIT - 2 - P
> +#define WVAL0S(S) (S)((S)1 << (sizeof(S) * CHAR_BIT / 2 - 1))
>  
>  #define TEST_I_F_VAL(IT, FT, VAL, PREC_OK)   \
>  do { \



Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr

2016-11-09 Thread Thomas Preudhomme
I've reworked the patch following comments from Wilco [1] (sorry could not find 
it in my MUA for some reason).


[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00317.html


== Context ==

When saving registers, function thumb1_expand_prologue () aims at minimizing the 
number of push instructions. One of the optimization it does is to push LR 
alongside high register(s) (after having moved them to low register(s)) when 
there is no low register to save. The way this is implemented is to add LR to 
the pushable_regs mask if it is live just before pushing the registers in that 
mask. The mask of live pushable registers which is used to detect whether LR 
needs to be saved is then clear to ensure LR is only saved once.



== Problem ==

However beyond deciding what register to push pushable_regs is used to track 
what pushable register can be used to move a high register before being pushed, 
hence the name. That mask is cleared when all high registers have been assigned 
a low register but the clearing assumes the high registers were assigned to the 
registers with the biggest number in that mask. This is not the case because LR 
is not considered when looking for a register in that mask. Furthermore, LR 
might have been saved in the TARGET_BACKTRACE path above yet the mask of live 
pushable registers is not cleared in that case.



== Solution ==

This patch changes the loop to iterate over register LR to r0 so as to both fix 
the stack corruption reported in PR77933 and reuse lr to push some high register 
when possible. This patch also introduce a new variable lr_needs_saving to 
record whether LR (still) needs to be saved at a given point in code and sets 
the variable accordingly throughout the code, thus fixing the second issue. 
Finally, this patch create a new push_mask variable to distinguish between the 
mask of registers to push and the mask of live pushable registers.



== Note ==

Other bits could have been improved but have been left out to allow the patch to 
be backported to stable branch:


(1) using argument registers that are not holding an argument
(2) using push_mask consistently instead of l_mask (in TARGET_BACKTRACE), mask 
(low register push) and push_mask

(3) the !l_mask case improved in TARGET_BACKTRACE since offset == 0
(4) rename l_mask to a more appropriate name (live_pushable_regs_mask?)

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2016-11-08  Thomas Preud'homme  

PR target/77933
* config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
being live in the function and lr needing to be saved.  Distinguish
between already saved pushable registers and registers to push.
Check for LR being an available pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-08  Thomas Preud'homme  

PR target/77933
* gcc.target/arm/pr77933-1.c: New test.
* gcc.target/arm/pr77933-2.c: Likewise.


Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0

Is this ok for trunk?

Best regards,

Thomas

On 02/11/16 17:08, Thomas Preudhomme wrote:

Hi,

When saving registers, function thumb1_expand_prologue () aims at minimizing the
number of push instructions. One of the optimization it does is to push lr
alongside high register(s) (after having moved them to low register(s)) when
there is no low register to save. The way this is implemented is to add lr to
the list of registers that can be pushed just before the push happens. This
would then push lr and allows it to be used for further push if there was not
enough registers to push all high registers to be pushed.

However, the logic that decides what register to move high registers to before
being pushed only looks at low registers (see for loop initialization). This
means not only that lr is not used for pushing high registers but also that lr
is not removed from the list of registers to be pushed when it's not used. This
extra lr push is not poped in epilogue leading in stack corruption.

This patch changes the loop to iterate over register r0 to lr so as to both fix
the stack corruption and reuse lr to push some high register when possible.

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2016-11-01  Thomas Preud'homme  

PR target/77933
* config/arm/arm.c (thumb1_expand_prologue): Also check for lr being a
pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-01  Thomas Preud'homme  

PR target/77933
* gcc.target/arm/pr77933.c: New test.


Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd8d5e5db8ca50daab648e58df290969aa794862..ddbda3e46dbcabb6c5775f847bc338c37705e122 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24806,6 +24806,7 @@ thumb1_expand_prologue (void)
   unsigned long live_regs_mask;
   unsigned long l_mask;
   

Re: [PATCH] (v2) print-rtl-function.c: add (param) directive to dump

2016-11-09 Thread Bernd Schmidt

On 11/09/2016 05:46 PM, David Malcolm wrote:


OK for trunk if it passes bootstrap and regrtest?

gcc/ChangeLog:
* print-rtl-function.c (print_any_param_name): New function.
(print_param): New function.
(print_rtx_function): Call print_param for each argument.
* print-rtl.c (rtx_writer::finish_directive): New function.
* print-rtl.h (rtx_writer::finish_directive): New decl.
+
+  rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl;


Isn't this DECL_RTL_IF_SET? If so, please use that macro. Otherwise ok.


Bernd


Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-09 Thread Joseph Myers
On Wed, 9 Nov 2016, Richard Biener wrote:

> I'll push back c_parser to the header and put inlines I need to export
> there as well.
> 
> Joseph, is this (with regard to the inlines) your preference as well?

I'm not clear what the proposal is.  If some patch is now proposed 
different from what was previously posted to gcc-patches, please post it.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-09 Thread Joseph Myers
On Wed, 9 Nov 2016, James Greenhalgh wrote:

> These new test cases look like they are going to be out of exponent range
> for _Float16 - so the testcases will fail for a target which tests either
> of:
> 
>   gcc.dg/torture/fp-int-convert-float16.c
>   gcc.dg/torture/fp-int-convert-float16-timode.c
> 
> Should they have an M_OK1 check?

Yes.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [patch,avr] Add new option -mabsdata.

2016-11-09 Thread Sandra Loosemore

On 11/07/2016 05:54 AM, Georg-Johann Lay wrote:

@@ -15261,6 +15262,13 @@ GCC supports the following AVR devices a

 @include avr-mmcu.texi

+@item -mabsdata
+@opindex mabsdata
+
+Assume that all data in static stocage can be accessed by LDS / STS


s/stocage/storage/


+inctructions.  This option has only an effect on reduced Tiny devices like


s/inctructions/instructions/


+ATtiny40.
+
 @item -maccumulate-args
 @opindex maccumulate-args


The documentation changes are OK with those typos fixed.

-Sandra



Re: [PATCH] enable -fprintf-return-value by default

2016-11-09 Thread Sandra Loosemore

On 11/08/2016 08:13 PM, Martin Sebor wrote:

The -fprintf-return-value optimization has been disabled since
the last time it caused a bootstrap failure on powerpc64le.  With
the underlying problems fixed GCC has bootstrapped fine on all of
powerpc64, powerpc64le and x86_64 and tested with no regressions.
I'd like to re-enable the option.  The attached patch does that.




diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 17c5c22..adebeff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8301,7 +8301,7 @@ if (snprintf (buf, "%08x", i) >= sizeof buf)
 The @option{-fprintf-return-value} option relies on other optimizations
 and yields best results with @option{-O2}.  It works in tandem with the
 @option{-Wformat-length} option.  The @option{-fprintf-return-value}
-option is disabled by default.
+option is enabled by default.

 @item -fno-peephole
 @itemx -fno-peephole2


Near the beginning of this chapter, in @node Invoking GCC, it says:

Many options have long names starting with @samp{-f} or with
@samp{-W}---for example,
@option{-fmove-loop-invariants}, @option{-Wformat} and so on.  Most of
these have both positive and negative forms; the negative form of
@option{-ffoo} is @option{-fno-foo}.  This manual documents
only one of these two forms, whichever one is not the default.

So you should be documenting the non-default negative form 
-fno-printf-return-value instead of the default positive form.  The 
corresponding entry in the list in @node Option Summary needs to be 
adjusted, too.


-Sandra



RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III.

2016-11-09 Thread Toma Tabacu
> From: Moore, Catherine [mailto:catherine_mo...@mentor.com]
> Sent: 08 November 2016 20:47
> To: Toma Tabacu; gcc-patches@gcc.gnu.org
> Cc: Matthew Fortune
> Subject: RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN)
> with MIPS III.
> 
> 
> 
> > -Original Message-
> > From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> > Sent: Monday, November 7, 2016 11:21 AM
> > gcc/testsuite/ChangeLog:
> >
> > 2016-11-07  Toma Tabacu  
> >
> > * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if
> using
> > (HAS_MOVN) with MIPS III.
> >
> > diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> > b/gcc/testsuite/gcc.target/mips/mips.exp
> > index 39f44ff..e22d782 100644
> > --- a/gcc/testsuite/gcc.target/mips/mips.exp
> > +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> > @@ -1129,7 +1129,7 @@ proc mips-dg-options { args } {
> >  # We need MIPS IV or higher for:
> > #
> > #
> > -   } elseif { $isa < 3
> > +   } elseif { $isa < 4
> >&& [mips_have_test_option_p options "HAS_MOVN"] }
> > {
> > mips_make_test_option options "-mips4"
> >  # We need MIPS III or higher for:
> 
> Hi Toma,
> 
> The patch itself is OK, but the ChangeLog entry line length is greater than 
> 80.
> 

Yes, sorry. The version below fixes this.

> Do you have write access to the repository?  Please let me know if you would
> like me to commit this for you?
> 
> Thanks,
> Catherine

No, I don't have write access.
I would be grateful if you could commit the patch for me, if you think it
would be OK with Matthew.
To be clear, I am not in a rush to get this committed.

Thanks,
Toma Tabacu

gcc/testsuite/ChangeLog:

2016-11-09  Toma Tabacu  

* gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if
using (HAS_MOVN) with MIPS III.

diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 39f44ff..e22d782 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -1129,7 +1129,7 @@ proc mips-dg-options { args } {
 # We need MIPS IV or higher for:
#
#
-   } elseif { $isa < 3
+   } elseif { $isa < 4
   && [mips_have_test_option_p options "HAS_MOVN"] } {
mips_make_test_option options "-mips4"
 # We need MIPS III or higher for:



Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-09 Thread Richard Biener
On November 9, 2016 5:45:32 PM GMT+01:00, Joseph Myers 
 wrote:
>On Wed, 9 Nov 2016, Richard Biener wrote:
>
>> I'll push back c_parser to the header and put inlines I need to
>export
>> there as well.
>> 
>> Joseph, is this (with regard to the inlines) your preference as well?
>
>I'm not clear what the proposal is.  If some patch is now proposed 
>different from what was previously posted to gcc-patches, please post
>it.

I prefer the posted patch with c_parser not exported.  But Jakub raised 
concerns about the inlines thus I digged again and found a possible solution to 
the gengtype issue.  Thus my question if you prefer the inlines to to stay as 
inlines and thus move to the header or if you do not request this change (that 
will save me work).

Thanks,
Richard.




Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)

2016-11-09 Thread Bernd Edlinger
On 11/09/16 17:52, Joseph Myers wrote:
> On Wed, 9 Nov 2016, James Greenhalgh wrote:
>
>> These new test cases look like they are going to be out of exponent range
>> for _Float16 - so the testcases will fail for a target which tests either
>> of:
>>
>>   gcc.dg/torture/fp-int-convert-float16.c
>>   gcc.dg/torture/fp-int-convert-float16-timode.c
>>
>> Should they have an M_OK1 check?
>
> Yes.
>

Yes, but maybe introduce a test if the half-wide value fits?

like:

#define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1)


Bernd.


[Patch, Fortran, committed] PR 60777: [F03] RECURSIVE function rejected in specification expression

2016-11-09 Thread Janus Weil
Hi all,

I just committed a close-to-obvious fix for PR60777 (ok'd by Steve in
the PR), see attachment:

https://gcc.gnu.org/viewcvs?rev=242009&root=gcc&view=rev

Cheers,
Janus
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c  (Revision 241993)
+++ gcc/fortran/expr.c  (Arbeitskopie)
@@ -2794,12 +2794,12 @@ external_spec_function (gfc_expr *e)
   return false;
 }
 
-  if (f->attr.recursive)
-{
-  gfc_error ("Specification function %qs at %L cannot be RECURSIVE",
-f->name, &e->where);
+  /* F08:7.1.11.6. */
+  if (f->attr.recursive
+  && !gfc_notify_std (GFC_STD_F2003,
+ "Specification function '%s' "
+ "at %L cannot be RECURSIVE",  f->name, &e->where))
   return false;
-}
 
 function_allowed:
   return restricted_args (e->value.function.actual);
! { dg-do run }
!
! PR 60777: [F03] RECURSIVE function rejected in specification expression
!
! Contributed by Vladimir Fuka 

module recur
  implicit none
contains

  pure recursive function f(n) result(answer)
integer, intent(in) :: n
integer :: answer
if (n<2) then
  answer = 1
else
  answer = f(n-1)*n
end if
  end function

  pure function usef(n)
integer,intent(in) :: n
character(f(n)):: usef
usef = repeat('*',f(n))
  end function
end module

program testspecexpr
  use recur
  implicit none
  if (usef(1) /= '*')  call abort()
  if (usef(2) /= '**') call abort()
  if (usef(3) /= '**') call abort()
end


Re: [PATCH] Fix PR78189

2016-11-09 Thread Christophe Lyon
On 9 November 2016 at 09:36, Bin.Cheng  wrote:
> On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener  wrote:
>> On Mon, 7 Nov 2016, Christophe Lyon wrote:
>>
>>> Hi Richard,
>>>
>>>
>>> On 7 November 2016 at 09:01, Richard Biener  wrote:
>>> >
>>> > The following fixes an oversight when computing alignment in the
>>> > vectorizer.
>>> >
>>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>>> >
>>> > Richard.
>>> >
>>> > 2016-11-07  Richard Biener  
>>> >
>>> > PR tree-optimization/78189
>>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
>>> > alignment computation.
>>> >
>>> > * g++.dg/torture/pr78189.C: New testcase.
>>> >
>>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C
>>> > ===
>>> > --- gcc/testsuite/g++.dg/torture/pr78189.C  (revision 0)
>>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C  (working copy)
>>> > @@ -0,0 +1,41 @@
>>> > +/* { dg-do run } */
>>> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } 
>>> > */
>>> > +
>>> > +#include 
>>> > +
>>> > +struct A
>>> > +{
>>> > +  void * a;
>>> > +  void * b;
>>> > +};
>>> > +
>>> > +struct alignas(16) B
>>> > +{
>>> > +  void * pad;
>>> > +  void * misaligned;
>>> > +  void * pad2;
>>> > +
>>> > +  A a;
>>> > +
>>> > +  void Null();
>>> > +};
>>> > +
>>> > +void B::Null()
>>> > +{
>>> > +  a.a = nullptr;
>>> > +  a.b = nullptr;
>>> > +}
>>> > +
>>> > +void __attribute__((noinline,noclone))
>>> > +NullB(void * misalignedPtr)
>>> > +{
>>> > +  B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) - 
>>> > offsetof(B, misaligned));
>>> > +  b->Null();
>>> > +}
>>> > +
>>> > +int main()
>>> > +{
>>> > +  B b;
>>> > +  NullB(&b.misaligned);
>>> > +  return 0;
>>> > +}
>>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>>> > index 9346cfe..b03cb1e 100644
>>> > --- gcc/tree-vect-data-refs.c
>>> > +++ gcc/tree-vect-data-refs.c
>>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct 
>>> > data_reference *dr)
>>> >base = ref;
>>> >while (handled_component_p (base))
>>> >  base = TREE_OPERAND (base, 0);
>>> > +  unsigned int base_alignment;
>>> > +  unsigned HOST_WIDE_INT base_bitpos;
>>> > +  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
>>> > +  /* As data-ref analysis strips the MEM_REF down to its base operand
>>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
>>> > + adjust things to make base_alignment valid as the alignment of
>>> > + DR_BASE_ADDRESS.  */
>>> >if (TREE_CODE (base) == MEM_REF)
>>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
>>> > -  build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0));
>>> > -  unsigned int base_alignment = get_object_alignment (base);
>>> > +{
>>> > +  base_bitpos -= mem_ref_offset (base).to_short_addr () * 
>>> > BITS_PER_UNIT;
>>> > +  base_bitpos &= (base_alignment - 1);
>>> > +}
>>> > +  if (base_bitpos != 0)
>>> > +base_alignment = base_bitpos & -base_bitpos;
>>> > +  /* Also look at the alignment of the base address DR analysis
>>> > + computed.  */
>>> > +  unsigned int base_addr_alignment = get_pointer_alignment (base_addr);
>>> > +  if (base_addr_alignment > base_alignment)
>>> > +base_alignment = base_addr_alignment;
>>> >
>>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
>>> >  DR_VECT_AUX (dr)->base_element_aligned = true;
>>>
>>> Since you committed this patch (r241892), I'm seeing execution failures:
>>>   gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test
>>>   gcc.dg/vect/pr40074.c execution test
>>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
>>> --with-fpu=neon-fp16
>>> (using qemu as simulator)
>>
>> The difference is that we now vectorize the testcase with versioning
>> for alignment (but it should never execute the vectorized variant).
>> I need arm peoples help to understand what is wrong.
> Hi All,
> I will look at it.
>

Hi,

This is causing new regressions on armeb:
  gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 2 loops" 1
  gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect
"vectorized 2 loops" 1

pr470074 passes now,

Thanks,

Christophe

> Thanks,
> bin
>>
>> At least the testcase shows there is (kind-of) a missed optimization
>> that we no longer figure out versioning for alignment is useless.
>> I'll look into that.
>>
>> Richard.


Re: [PATCH 2/4] BRIG (HSAIL) frontend: The FE itself.

2016-11-09 Thread Pekka Jääskeläinen
Hi Martin,

On Fri, Nov 4, 2016 at 4:58 PM, Martin Jambor  wrote:
> I personally primarily want this for debugging purposes, and we should
> try to eventually report errors in a more comprehensible way than
> HSAILasm.  But more generally, and more importantly, if the input,
> whether human readable or not, is incorrect, the compiler should not
> produce an "internal error."  If that happens, users will file bugs
> against gcc when in fact it is the generating tool that is broken.
> One you maintain the FE, you personally will not want this :-)

Yes, I will add better error messages as soon asissues are reported.

> I was wondering why in case of VECTOR_TYPE, instead of
> return build1 (VIEW_CONVERT_EXPR, type, expr);
> you do not do
> return fold (convert_to_vector (type, expr));
>
> it seemed to me natural given how you handle the other cases and that
> the function internally also builds a V_C_E, after two checks.  So I
> thought the checks (matching overall size and only converting from an
> integer or another vector-type) were the problem and was wondering
> which one.  Especially the size one is something I believe you should
> never violate too.

I see. I cannot recall the origins of this, but my guess is that the code is
copied from an older Go frontend version. I tested with your
suggestion and it seems
to work fine.

> copy_move_inst_handler operator() starts by type-casting whatever it
> gets to BrigInstSourceType* and immediately dereferencing it, loading
> field sourceType, even though, in case of an LDA, what it is actually
> looking at is not a BrigInstSourceType but a BrigInstAddr, which is an
> entirely different structure that does not have that field.  So you
> read the 8-bit segment and the first reserved field and happily pass
> that to gccbrig_tree_type_for_hsa_type, hoping it does not explode on
> random data.  That is wrong.
>
> Later on in the operator() you actually figure out that you are
> looking at BrigInstSourceType but that is too late.

Ah, I see the issue now. It worked as with LDA the input type was
forced and handled explicitly. I now made the code more explicit
regarding this.

> Indeed it does, thanks.  However, I think that at the point when you
> do:
>
>   if (offs > 0)
> const_offset = build_int_cst (size_type_node, offs);
>
> const_offset can already be non-NULL you may lose whatever value it
> had before.  You might want to keep the constant offset as integer
> until the very moment when you may want to use it.

Great catch! There indeed was a major memory corruption with
group and private variable which was not caught by the PRM conformance
suite I use for testing. Should be now fixed.

> (Also, single statements should not have braces around them ;-)

...also fixed these, once again ;)

> I see.  If you think that MULT_HIGHPART_EXPR is broken, it may be
> worth filing a bug report (probably during next stage1).

It can be considered broken or not just fully implemented for all targets
and various vector types (especially those not directly supported by the
target).  Yes, I rather look closer and report this after the patch has been
merged to be able to provide BRIG test cases that work with
upstream code base.

> This week I have checked out the updated tree from github and looked
> at only a few changed functions there.  Hopefully the steering
> committee will decide soon and we'll get attention of a global
> reviewer during the next few weeks too.
>
> Thanks for addressing so many of my comments,

Thanks again, an updated FE patch attached.

BR,
Pekka


002-brig-fe-gcc.patch.gz
Description: GNU Zip compressed data


[PATCH, i386]: Fix PR78626, wrong code with -fschedule-insns

2016-11-09 Thread Uros Bizjak
Hello!

We need earlyclobber on output operand of doubleword shift insns,
since we have to prevent (partial) output matching %ecx as count
argument.

2016-11-09  Uros Bizjak  

PR target/78262
* config/i386/i386.md (*3_doubleword): Mark
operand 0 as earlyclobber.

testsuite/ChangeLog:

2016-11-09  Uros Bizjak  

PR target/78262
* gcc.target/i386/pr78262.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to release branches.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 242004)
+++ config/i386/i386.md (working copy)
@@ -10339,7 +10339,7 @@
   "operands[2] = gen_lowpart (QImode, operands[2]);")
 
 (define_insn_and_split "*3_doubleword"
-  [(set (match_operand:DWI 0 "register_operand" "=r")
+  [(set (match_operand:DWI 0 "register_operand" "=&r")
(any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
 (match_operand:QI 2 "nonmemory_operand" "c")))
(clobber (reg:CC FLAGS_REG))]
Index: testsuite/gcc.target/i386/pr78262.c
===
--- testsuite/gcc.target/i386/pr78262.c (nonexistent)
+++ testsuite/gcc.target/i386/pr78262.c (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O -fschedule-insns" } */
+
+typedef unsigned char u8;
+typedef unsigned __int128 u128;
+
+static u128 u128_0;
+static u128 *p128;
+
+u128 __attribute__ ((noinline, noclone))
+foo(u8 u8_0)
+{
+  p128 = &u128_0;
+  u128_0 = u8_0;
+  u128_0 = u128_0 << 127 | u128_0 >> 1;
+  u128_0 >>= (u8)u128_0;
+  return 2 + u128_0;
+}
+
+int
+main()
+{
+  u128 x = foo(5);
+  if (p128 != &u128_0)
+__builtin_abort();
+  if (u128_0 != ((u128)2 << 124))
+__builtin_abort();
+  if (x != ((u128)2 << 124) + 2)
+__builtin_abort();
+  return 0;
+}


C++ PATCH for C++17 auto non-type template parameters

2016-11-09 Thread Jason Merrill
This was pretty straightforward; auto template parameters look more or
less the same as any other parameter with dependent type, so
substitution works without any changes, we just need to do auto
deduction in a couple of places.  The most involved bit was handling
deduction of a type parameter from the type of an array bound, which
only happens if deduction otherwise doesn't find a binding.  At first
I tried to work this into unify et al, but eventually decided to do it
in a (much simpler) separate function.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 04ade0b5013698fd1e458ef1425f7afd023feaf0
Author: Jason Merrill 
Date:   Mon Nov 7 16:45:06 2016 -0800

Implement P0127R2, Declaring non-type parameters with auto.

gcc/cp/
* cp-tree.h (enum auto_deduction_context): Add adc_unify.
* decl.c (grokdeclarator): Allow 'auto' in C++17 template non-type
parameter types.
* pt.c (do_auto_deduction): Add outer_targs parameter.
(convert_template_argument): Call do_auto_deduction.  If adc_unify,
don't give up on dependent init.
(unify): Likewise.  In C++17, walk into the type of a
TEMPLATE_PARM_INDEX.
(for_each_template_parm): Add any_fn parameter.
(struct pair_fn_data): Likewise.
(for_each_template_parm_r): Call it for any tree.  In C++17, walk
into the type of a TEMPLATE_PARM_INDEX.
(zero_r, array_deduction_r, try_array_deduction): New.
(type_unification_real): Call try_array_deduction.
(get_partial_spec_bindings): Likewise.
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Define __cpp_template_auto.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 55dbf44..70eade1 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -942,6 +942,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_aggregate_bases=201603");
  cpp_define (pfile, "__cpp_deduction_guides=201606");
  cpp_define (pfile, "__cpp_noexcept_function_type=201510");
+ cpp_define (pfile, "__cpp_template_auto=201606");
}
   if (flag_concepts)
cpp_define (pfile, "__cpp_concepts=201507");
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 20b52ad..9b5b5bc 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5163,6 +5163,7 @@ enum auto_deduction_context
   adc_unspecified,   /* Not given */
   adc_variable_type, /* Variable initializer deduction */
   adc_return_type,   /* Return type deduction */
+  adc_unify, /* Template argument deduction */
   adc_requirement/* Argument dedution constraint */
 };
 
@@ -6088,7 +6089,8 @@ extern tree make_template_placeholder (tree);
 extern tree do_auto_deduction   (tree, tree, tree);
 extern tree do_auto_deduction   (tree, tree, tree,
  tsubst_flags_t,
- auto_deduction_context);
+ auto_deduction_context,
+tree = NULL_TREE);
 extern tree type_uses_auto (tree);
 extern tree type_uses_auto_or_concept  (tree);
 extern void append_type_to_template_for_access_check (tree, tree, tree,
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index c0321f9..bd37faa 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -11135,7 +11135,8 @@ grokdeclarator (const cp_declarator *declarator,
   if (ctype || in_namespace)
error ("cannot use %<::%> in parameter declaration");
 
-  if (type_uses_auto (type))
+  if (type_uses_auto (type)
+ && !(cxx_dialect >= cxx1z && template_parm_flag))
{
  if (cxx_dialect >= cxx14)
error ("% parameter not permitted in this context");
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3df71dd..64e566e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -161,7 +161,7 @@ static tree convert_nontype_argument (tree, tree, 
tsubst_flags_t);
 static tree convert_template_argument (tree, tree, tree,
   tsubst_flags_t, int, tree);
 static tree for_each_template_parm (tree, tree_fn_t, void*,
-   hash_set *, bool);
+   hash_set *, bool, tree_fn_t = NULL);
 static tree expand_template_argument_pack (tree);
 static tree build_template_parm_index (int, int, int, tree, tree);
 static bool inline_needs_template_parms (tree, bool);
@@ -7299,6 +7299,13 @@ convert_template_argument (tree parm,
 {
   tree t = tsubst (TREE_TYPE (parm), args, complain, in_decl);
 
+  if (tree a = type_uses_auto (t))
+   {
+ t = do_auto_deduction (t, arg, a, complain, adc_unspecified);
+ if (t == error_mark_node)
+   return error_mark_node;
+   }
+
   if (invalid_

C++ PATCH for some C++17 class deduction issues

2016-11-09 Thread Jason Merrill
Discussion at the meeting led me to notice that class template
deduction wasn't working with template template parameters.  This
patch also improves a few diagnostic issues.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 66d2621abd8ed6bb47d0fda747a572e00aece25e
Author: Jason Merrill 
Date:   Tue Nov 8 10:14:24 2016 -0800

Fix C++17 template placeholder for template template parm.

* parser.c (cp_parser_simple_type_specifier): Allow placeholder
for template template parameter.
(cp_parser_type_id_1): Improve diagnostic.
* decl.c (grokdeclarator): Handle class deduction diagnostics here.
* pt.c (splice_late_return_type): Not here.
(tsubst) [TEMPLATE_TYPE_PARM]: Substitute into placeholder template.
(do_class_deduction): Handle non-class templates.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bd37faa..4b18d4e 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -9490,6 +9490,11 @@ grokdeclarator (const cp_declarator *declarator,
   if (initialized > 1)
 funcdef_flag = true;
 
+  location_t typespec_loc = smallest_type_quals_location (type_quals,
+ declspecs->locations);
+  if (typespec_loc == UNKNOWN_LOCATION)
+typespec_loc = declspecs->locations[ds_type_spec];
+
   /* Look inside a declarator for the name being declared
  and get it as a string, for an error message.  */
   for (id_declarator = declarator;
@@ -10011,6 +10016,16 @@ grokdeclarator (const cp_declarator *declarator,
   /* We might have ignored or rejected some of the qualifiers.  */
   type_quals = cp_type_quals (type);
 
+  if (cxx_dialect >= cxx1z && type && is_auto (type)
+  && innermost_code != cdk_function
+  && id_declarator && declarator != id_declarator)
+if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (type))
+{
+  error_at (typespec_loc, "template placeholder type %qT must be followed "
+   "by a simple declarator-id", type);
+  inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here", tmpl);
+}
+
   staticp = 0;
   inlinep = decl_spec_seq_has_spec_p (declspecs, ds_inline);
   virtualp =  decl_spec_seq_has_spec_p (declspecs, ds_virtual);
@@ -10247,12 +10262,7 @@ grokdeclarator (const cp_declarator *declarator,
  {
if (SCALAR_TYPE_P (type) || VOID_TYPE_P (type))
  {
-   location_t loc;
-   loc = smallest_type_quals_location (type_quals,
-   declspecs->locations);
-   if (loc == UNKNOWN_LOCATION)
- loc = declspecs->locations[ds_type_spec];
-   warning_at (loc, OPT_Wignored_qualifiers, "type "
+   warning_at (typespec_loc, OPT_Wignored_qualifiers, "type "
"qualifiers ignored on function return type");
  }
/* We now know that the TYPE_QUALS don't apply to the
@@ -10301,11 +10311,12 @@ grokdeclarator (const cp_declarator *declarator,
funcdecl_p = inner_declarator && inner_declarator->kind == cdk_id;
 
/* Handle a late-specified return type.  */
+   tree late_return_type = declarator->u.function.late_return_type;
if (funcdecl_p)
  {
-   if (type_uses_auto (type))
+   if (tree auto_node = type_uses_auto (type))
  {
-   if (!declarator->u.function.late_return_type)
+   if (!late_return_type)
  {
if (current_class_type
&& LAMBDA_TYPE_P (current_class_type))
@@ -10333,8 +10344,32 @@ grokdeclarator (const cp_declarator *declarator,
   name, type);
return error_mark_node;
  }
+   if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node))
+ {
+   if (!late_return_type)
+ {
+   if (dguide_name_p (unqualified_id))
+ error_at (typespec_loc, "deduction guide for "
+   "%qT must have trailing return type",
+   TREE_TYPE (tmpl));
+   else
+ error_at (typespec_loc, "deduced class type %qT "
+   "in function return type", type);
+   inform (DECL_SOURCE_LOCATION (tmpl),
+   "%qD declared here", tmpl);
+ }
+   else if (CLASS_TYPE_P (late_return_type)
+&& CLASSTYPE_TEMPLATE_INFO (late_return_type)
+&& (CLASSTYPE_TI_TEMPLATE (late_return_type)
+== tmpl))

RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III.

2016-11-09 Thread Moore, Catherine


> -Original Message-
> From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> Sent: Wednesday, November 9, 2016 12:19 PM
> Subject: RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) 
> with MIPS III.
> 
> No, I don't have write access.
> I would be grateful if you could commit the patch for me, if you think it
> would be OK with Matthew.
> To be clear, I am not in a rush to get this committed.

You're all set now.  I committed this:  revision 242021.

Catherine

> gcc/testsuite/ChangeLog:
> 
> 2016-11-09  Toma Tabacu  
> 
>   * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if
>   using (HAS_MOVN) with MIPS III.
> 



Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)

2016-11-09 Thread Jason Merrill
On Mon, Nov 7, 2016 at 2:24 AM, Richard Biener  wrote:
> The issue with moving is that I failed to export the definition of
> c_parser in c-parser.h due to gengtype putting vec 
> handlers into gtype-c.h but not gtype-objc.h and thus objc bootstrap
> fails :/
>
> I believe (well, I hope) that code generation for the C parser
> should be mostly unaffected (inlining is still done as determined
> useful) and the performance of the GIMPLE parser shouldn't be
> too important.
>
> If anybody feels like digging into the gengtype issue, I gave up
> after trying for half a day to trick it to do what I want
> (like for example also putting it in gtype-objc.h).

I have a fix for that issue waiting for review:

https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02265.html

Jason


[Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]

2016-11-09 Thread Janus Weil
Hi all,

I have committed yet another obvious ice-on-invalid fix:

https://gcc.gnu.org/viewcvs?rev=242020&root=gcc&view=rev

Cheers,
Janus
Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c (Revision 241993)
+++ gcc/fortran/interface.c (Arbeitskopie)
@@ -3190,6 +3190,7 @@ compare_actual_formal (gfc_actual_arglist **ap, gf
 shape array, if the dummy argument has the VOLATILE attribute.  */
 
   if (f->sym->attr.volatile_
+ && a->expr->expr_type == EXPR_VARIABLE
  && a->expr->symtree->n.sym->as
  && a->expr->symtree->n.sym->as->type == AS_ASSUMED_SHAPE
  && !(f->sym->as && f->sym->as->type == AS_ASSUMED_SHAPE))
@@ -3219,6 +3220,7 @@ compare_actual_formal (gfc_actual_arglist **ap, gf
 dummy argument has the VOLATILE attribute.  */
 
   if (f->sym->attr.volatile_
+ && a->expr->expr_type == EXPR_VARIABLE
  && a->expr->symtree->n.sym->attr.pointer
  && a->expr->symtree->n.sym->as
  && !(f->sym->as


[PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers

2016-11-09 Thread Michael Meissner
The PowerPC ISA 3.0 (power9) has new instructions that make it feasible to
allow QImode and HImode to be allocated to vector registers.  The new
instructions are:

* load byte with zero extend
* load half word with zero extend
* store byte
* store half word
* extract byte from vector and zero extend
* extract half word from vector and zero extend
* sign extend byte to word/double word
* sign extend half word to word/double word

I have bootstrapped a previous version of the changes on a little endian Power8
system, and I'm now repeating the bootstrap on both a big endian Power8 and a
little endian Power8.  Assuming there are no regressions with the patches, can
I check these patches into the trunk?

I have built the spec 2006 CPU benchmark suite with these changes, and the
power8 (ISA 2.07) code generation does not change.

I have also built the spec 2006 CPU benchmark for power9.  The following
15 (out of 30) benchmarks had code changes:

* perlbench (char <-> floating point)
* gcc   (one extra ld/std)
* gamess(int <-> floating point)
* gromacs   (one fmr instead of li/mtvsrd)
* cactusADM (char/int <-> floating point)
* namd  (floating point -> int)
* gobmk (floating point -> int)
* dealII(int/long in vector regs. vs. gprs)
* povray(char/int <-> floating point)
* calculix  (int -> zero extend to long)
* hmmer (floating point -> int)
* h264ref   (zero extend short)
* tonto (floating point -> int)
* omnetpp   (floating point -> int)
* wrf   (floating point -> unsigned/int)

[gcc]
2016-11-09  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): If ISA 3.0,
enable HImode and QImode to go in vector registers by default if
the -mvsx-small-integer option is enabled.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Don't force integer constants to
be loaded into vector registers that we can easily make into
memory (or being created in the GPRs and moved over with direct
move).
* config/rs6000/vsx.md (UNSPEC_P9_MEMORY): Delete, no longer
used.
(vsx_extract_): Rework V4SImode, V8HImode, and V16QImode
vector extraction on ISA 3.0 when the scalar integer can be
allocated in vector registers.  Generate the VEC_SELECT directy,
and don't use UNSPEC's to avoid having the scalar type in a vector
register.  Make the expander target registers, and let the
combiner fold in results storing to memory, if the machine
supports stores.
(vsx_extract__di): Likewise.
(vsx_extract__p9): Likewise.
(vsx_extract__di_p9): Likewise.
(vsx_extract__store_p9): Likewise.
(vsx_extract_si): Likewise.
(vsx_extract__p8): Likewise.
(p9_lxsizx): Delete, no longer used.
(p9_stxsix): Likewise.
* config/rs6000/rs6000.md (INT_ISA3): New mode iterator for
integers in vector registers for ISA 3.0.
(QHI): Update comment.
(zero_extendqi2): Add support for ISA 3.0 scalar load or
vector extract instructions in sign/zero extend.
(zero_extendhi): Likewise.
(extendqi): Likewise.
(extendhi2): Likewise.
(HImode splitter for load/sign extend in vector register):
Likewise.
(float2): Eliminate old method of
optimizing floating point conversions to/from small data types and
rewrite it to support QImode/HImode being allowed in vector
registers on ISA 3.0.
(float2_internal): Likewise.
(floatuns2): Likewise.
(floatuns2_internal): Likewise.
(fix_trunc2): Likewise.
(fix_trunc2_internal): Likewise.
(fixuns_trunc2): Likewise.
(fixuns_trunc2_internal): Likewise.
VSPLITISW on ISA 2.07.
(movhi_internal): Combine movhi_internal and movqi_internal into
one mov_internal with an iterator.  Add support for QImode
and HImode being allowed in vector registers.  Make large number
of attributes and constraints easier to read.
(movqi_internal): Likewise.
(mov_internal): Likewise.
(movdi_internal64): Fix constraint to allow loading -16..15 with
VSPLITISW on ISA 2.07.
(integer XXSPLTIB splitter): Add support for QI, HI, and SImode as
well as DImode.

[gcc/testsuite]
2016-11-09  Michael Meissner  

* gcc.target/powerpc/vsx-qimode.c: New test for QImode, HImode
being allowed in vector registers.
* gcc.target/powerpc/vsx-qimode2.c: Likewise.
* gcc.target/powerpc/vsx-qimode3.c: Likewise.
* gcc.target/powerpc/vsx-himode.c: Likewise.
* gcc.target/powerpc/vsx-himode2.c: Likewise.
* gcc.target/powe

Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression

2016-11-09 Thread Dominique d'Humières

> Le 9 nov. 2016 à 20:09, Paul Richard Thomas  a 
> écrit :
> 
> Dear Dominique,
> 
> I am deeply embarrassed. This is the consequence of an additional
> condition added at the last minute.

No reason to be embarrassed;-)

> The attached removes it and makes sure that the original bug is tested
> in char_result_14.f90. The ChangeLogs are the same.
> 
> OK for trunk?

IMO yes

> 
> Paul
> 

I have a last glitch (which can be deferred if needed):

  FUNCTION Get(i) RESULT(s)
CHARACTER(*), PARAMETER :: names(3) = [  &
'Apple  ',  &
'Orange ',  &
'Mango  ' ];  
INTEGER, INTENT(IN) :: i
CHARACTER(LEN_TRIM(names(i))) :: s
!
s = names(i)
print *, len(s)
  END FUNCTION Get

PROGRAM WheresThatbLinkingConstantGone
  IMPLICIT NONE
  interface
FUNCTION Get(i) RESULT(s)
  CHARACTER(*), PARAMETER :: names(3) = [  &
  'Apple  ',  &
  'Orange ',  &
  'Mango  ' ];
  INTEGER, INTENT(IN) :: i
  CHARACTER(LEN_TRIM(names(i))) :: s
  END FUNCTION Get
  end interface

  integer :: i
  i = len(Get(1))
  print *, i
END PROGRAM WheresThatbLinkingConstantGone

does not link.

Dominique



Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers

2016-11-09 Thread Michael Meissner
Of course it would help, if I actually attached the patches:

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 241924)
+++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy)
@@ -325,6 +325,9 @@ (define_mode_iterator INT [QI HI SI DI T
 ; Any supported integer mode that fits in one register.
 (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")])
 
+; Integer modes supported in VSX registers with ISA 3.0 instructions
+(define_mode_iterator INT_ISA3 [QI HI SI DI])
+
 ; Everything we can extend QImode to.
 (define_mode_iterator EXTQI [SI (DI "TARGET_POWERPC64")])
 
@@ -334,7 +337,7 @@ (define_mode_iterator EXTHI [SI (DI "TAR
 ; Everything we can extend SImode to.
 (define_mode_iterator EXTSI [(DI "TARGET_POWERPC64")])
 
-; QImode or HImode for small atomic ops
+; QImode or HImode for small integer moves and small atomic ops
 (define_mode_iterator QHI [QI HI])
 
 ; QImode, HImode, SImode for fused ops only for GPR loads
@@ -735,13 +738,15 @@ (define_code_attr SMINMAX [(smin "SM
 ;; complex forms.  Basic data transfer is done later.
 
 (define_insn "zero_extendqi2"
-  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,r")
-   (zero_extend:EXTQI (match_operand:QI 1 "reg_or_mem_operand" "m,r")))]
+  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,r,?*wJwK,?*wK")
+   (zero_extend:EXTQI (match_operand:QI 1 "reg_or_mem_operand" 
"m,r,Z,*wK")))]
   ""
   "@
lbz%U1%X1 %0,%1
-   rlwinm %0,%1,0,0xff"
-  [(set_attr "type" "load,shift")])
+   rlwinm %0,%1,0,0xff
+   lxsibzx %x0,%y1
+   vextractub %0,%1,7"
+  [(set_attr "type" "load,shift,fpload,vecperm")])
 
 (define_insn_and_split "*zero_extendqi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
@@ -786,13 +791,15 @@ (define_insn_and_split "*zero_extendqi2"
-  [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r")
-   (zero_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" "m,r")))]
+  [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r,?*wJwK,?*wK")
+   (zero_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" 
"m,r,Z,wK")))]
   ""
   "@
lhz%U1%X1 %0,%1
-   rlwinm %0,%1,0,0x"
-  [(set_attr "type" "load,shift")])
+   rlwinm %0,%1,0,0x
+   lxsihzx %x0,%y1
+   vextractuh %0,%1,6"
+  [(set_attr "type" "load,shift,fpload,vecperm")])
 
 (define_insn_and_split "*zero_extendhi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
@@ -893,11 +900,13 @@ (define_insn_and_split "*zero_extendsi2"
-  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r")
-   (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r")))]
+  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,?*wK")
+   (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,?*wK")))]
   ""
-  "extsb %0,%1"
-  [(set_attr "type" "exts")])
+  "@
+   extsb %0,%1
+   vextsb2d %0,%1"
+  [(set_attr "type" "exts,vecperm")])
 
 (define_insn_and_split "*extendqi2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
@@ -948,14 +957,30 @@ (define_expand "extendhi2"
   "")
 
 (define_insn "*extendhi2"
-  [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r")
-   (sign_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" "m,r")))]
+  [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r,?*wK,?*wK")
+   (sign_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" 
"m,r,Z,wK")))]
   "rs6000_gen_cell_microcode"
   "@
lha%U1%X1 %0,%1
-   extsh %0,%1"
-  [(set_attr "type" "load,exts")
-   (set_attr "sign_extend" "yes")])
+   extsh %0,%1
+   #
+   vextsh2d %0,%1"
+  [(set_attr "type" "load,exts,fpload,vecperm")
+   (set_attr "sign_extend" "yes")
+   (set_attr "length" "4,4,8,4")])
+
+(define_split
+  [(set (match_operand:EXTHI 0 "altivec_register_operand" "")
+   (sign_extend:EXTHI
+(match_operand:HI 1 "indexed_or_indirect_operand" "")))]
+  "TARGET_P9_VECTOR && reload_completed"
+  [(set (match_dup 2)
+   (match_dup 1))
+   (set (match_dup 0)
+   (sign_extend:EXTHI (match_dup 2)))]
+{
+  operands[2] = gen_rtx_REG (HImode, REGNO (operands[1]));
+})
 
 (define_insn "*extendhi2_noload"
   [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r")
@@ -5299,30 +5324,33 @@ (define_insn_and_split "*floatunssidf2_i
(set_attr "type" "fp")])
 
 ;; ISA 3.0 adds instructions lxsi[bh]zx to directly load QImode and HImode to
-;; vector registers.  At the moment, QI/HImode are not allowed in floating
-;; point or vector registers, so we use UNSPEC's to use the load byte and
-;; half-word instructions.
+;; vector registers.  These insns favor doing the sign/zero extension in
+;; the vector registers, rather then loading up a GPR, doing a sign/zero
+;; extension and then a direct move.
 
 (define_expand

Re: [PATCH] (v2) print-rtl-function.c: add (param) directive to dump

2016-11-09 Thread David Malcolm
On Wed, 2016-11-09 at 17:46 +0100, Bernd Schmidt wrote:
> On 11/09/2016 05:46 PM, David Malcolm wrote:
> 
> > OK for trunk if it passes bootstrap and regrtest?
> > 
> > gcc/ChangeLog:
> > * print-rtl-function.c (print_any_param_name): New function.
> > (print_param): New function.
> > (print_rtx_function): Call print_param for each argument.
> > * print-rtl.c (rtx_writer::finish_directive): New function.
> > * print-rtl.h (rtx_writer::finish_directive): New decl.
> > +
> > +  rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl;
> 
> Isn't this DECL_RTL_IF_SET? If so, please use that macro. Otherwise 
> ok.

Yes, it is; thanks.  Using it required including varasm.h.

For reference, here's what I committed, as r242023 (having verified
 bootstrap and regrtest).Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 242022)
+++ gcc/ChangeLog	(revision 242023)
@@ -1,3 +1,12 @@
+2016-11-09  David Malcolm  
+
+	* print-rtl-function.c: Include varasm.h.
+	(print_any_param_name): New function.
+	(print_param): New function.
+	(print_rtx_function): Call print_param for each argument.
+	* print-rtl.c (rtx_writer::finish_directive): New function.
+	* print-rtl.h (rtx_writer::finish_directive): New decl.
+
 2016-11-09  Uros Bizjak  
 
 	PR target/78262
Index: gcc/print-rtl.c
===
--- gcc/print-rtl.c	(revision 242022)
+++ gcc/print-rtl.c	(revision 242023)
@@ -804,6 +804,15 @@
   m_sawclose = 1;
 }
 
+/* Emit a closing parenthesis and newline.  */
+
+void
+rtx_writer::finish_directive ()
+{
+  fprintf (m_outfile, ")\n");
+  m_sawclose = 0;
+}
+
 /* Print an rtx on the current line of FILE.  Initially indent IND
characters.  */
 
Index: gcc/print-rtl.h
===
--- gcc/print-rtl.h	(revision 242022)
+++ gcc/print-rtl.h	(revision 242023)
@@ -31,6 +31,8 @@
   void print_rtl (const_rtx rtx_first);
   int print_rtl_single_with_indent (const_rtx x, int ind);
 
+  void finish_directive ();
+
  private:
   void print_rtx_operand_code_0 (const_rtx in_rtx, int idx);
   void print_rtx_operand_code_e (const_rtx in_rtx, int idx);
Index: gcc/print-rtl-function.c
===
--- gcc/print-rtl-function.c	(revision 242022)
+++ gcc/print-rtl-function.c	(revision 242023)
@@ -33,6 +33,7 @@
 #include "langhooks.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
+#include "varasm.h"
 
 /* Print an "(edge-from)" or "(edge-to)" directive describing E
to OUTFILE.  */
@@ -127,6 +128,37 @@
   return true;
 }
 
+/* Subroutine of print_param.  Write the name of ARG, if any, to OUTFILE.  */
+
+static void
+print_any_param_name (FILE *outfile, tree arg)
+{
+  if (DECL_NAME (arg))
+fprintf (outfile, " \"%s\"", IDENTIFIER_POINTER (DECL_NAME (arg)));
+}
+
+/* Print a "(param)" directive for ARG to OUTFILE.  */
+
+static void
+print_param (FILE *outfile, rtx_writer &w, tree arg)
+{
+  fprintf (outfile, "  (param");
+  print_any_param_name (outfile, arg);
+  fprintf (outfile, "\n");
+
+  /* Print the value of DECL_RTL (without lazy-evaluation).  */
+  fprintf (outfile, "(DECL_RTL ");
+  w.print_rtx (DECL_RTL_IF_SET (arg));
+  w.finish_directive ();
+
+  /* Print DECL_INCOMING_RTL.  */
+  fprintf (outfile, "(DECL_RTL_INCOMING ");
+  w.print_rtx (DECL_INCOMING_RTL (arg));
+  fprintf (outfile, ")");
+
+  w.finish_directive ();
+}
+
 /* Write FN to OUTFILE in a form suitable for parsing, with indentation
and comments to make the structure easy for a human to grok.  Track
the basic blocks of insns in the chain, wrapping those that are within
@@ -197,6 +229,10 @@
 
   fprintf (outfile, "(function \"%s\"\n", dname);
 
+  /* Params.  */
+  for (tree arg = DECL_ARGUMENTS (fdecl); arg; arg = DECL_CHAIN (arg))
+print_param (outfile, w, arg);
+
   /* The instruction chain.  */
   fprintf (outfile, "  (insn-chain\n");
   basic_block curr_bb = NULL;


[PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Segher Boessenkool
match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
operations if C is not const (and it is not on simple tests at least,
this transform is done very early already).

Various processors have "insert" instructions that can do this, but
combine cannot build those from the xor-and-xor, especially it has no
chance at all to do that if A or B or multiple instructions as well
(on PowerPC, the rl[ws]imi instructions that can do this with a rotate,
or a simple shift with appropriate C; other ISAs have similar insns).

This patch makes RTL simplify transform (xor (and (xor A B) C) B) back
to (ior (and A C) (and B ~C)) for constant C (and similar with A instead
of B for that last term).

Tested on powerpc64-linux {-m32,-m64}; no regressions.  Is this okay
for trunk?


Segher


2016-11-09  Segher Boessenkool  

* simplify-rtx.c (simplify_binary_operation_1): Simplify
(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
(xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
is a const_int.

---
 gcc/simplify-rtx.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 5c3dea1..62d60f3 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2886,6 +2886,37 @@ simplify_binary_operation_1 (enum rtx_code code, 
machine_mode mode,
}
}
 
+/* If we have (xor (and (xor A B) C) A) with C a constant we can instead
+   do (ior (and A ~C) (and B C)) which is a machine instruction on some
+   machines, and also has shorter instruction path length.  */
+  if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == XOR
+ && CONST_INT_P (XEXP (op0, 1))
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
+   {
+ rtx a = trueop1;
+ rtx b = XEXP (XEXP (op0, 0), 1);
+ rtx c = XEXP (op0, 1);
+ rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+ rtx a_nc = simplify_gen_binary (AND, mode, a, nc);
+ rtx bc = simplify_gen_binary (AND, mode, b, c);
+ return simplify_gen_binary (IOR, mode, a_nc, bc);
+   }
+/* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))  */
+  else if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == XOR
+ && CONST_INT_P (XEXP (op0, 1))
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
+   {
+ rtx a = XEXP (XEXP (op0, 0), 0);
+ rtx b = trueop1;
+ rtx c = XEXP (op0, 1);
+ rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+ rtx b_nc = simplify_gen_binary (AND, mode, b, nc);
+ rtx ac = simplify_gen_binary (AND, mode, a, c);
+ return simplify_gen_binary (IOR, mode, ac, b_nc);
+   }
+
   /* (xor (comparison foo bar) (const_int 1)) can become the reversed
 comparison if STORE_FLAG_VALUE is 1.  */
   if (STORE_FLAG_VALUE == 1
-- 
1.9.3



[PATCH] libstdc++: Allow using without lock free atomic int

2016-11-09 Thread Pauli
Compiling programs using std::future for old arm processors fails. The
problem is caused by preprocessor check for atomic lock free int.

Future can be changed to work correctly without lock free atomics with
minor changes to exception_ptr implementation.

Without lock free atomics there is question if deadlock can happen. But
atomic operations can't call outside code preventing any ABBA or
recursive mutex acquiring deadlocks.
Deadlock could happen if throwing an exception or access
is_lock_free() == false atomic from asynchronous signal handler.
Throwing from signal handler is undefined behavior. I don't know about
accessing atomics from asynchronous signal handler but that feels like
undefined behavior if is_lock_free returns false.

Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64735

differences to current if atomic builtins available:
* Race detector annotations that are empty by default
* Check for __gthread_active_p
* Generate x86 code uses xadd instead of xsub
This makes code a bit worse. But problem is duplicated to any other user
of __exchange_and_add. The internal API implementation should be fixed
to generate better code for all cases. But that is a follow up patch.

 ** libstdc++ test results
x86_64 build from trunk@241870 resulted to exactly same results with and
without patch. That is two localetest cases failing:
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
  of expected passes11941
  of unexpected failures2
  of expected failures  65
  of unsupported tests  244

armv5tel from gcc-6-branch@241866 in qemu-arm-static -cpu arm926 resulted
to following differences.
Before patch:
WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-2.cc execution test
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
FAIL: 27_io/fpos/14775.cc execution test
  of expected passes10628
  of unexpected failures4
  of expected failures  66
  of unsupported tests  390

After patch:
WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-1.cc execution test
WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-2.cc execution test
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
FAIL: 27_io/fpos/14775.cc execution test
  of expected passes10831
  of unexpected failures5
  of expected failures  66
  of unsupported tests  281

My cpu isn't fast enough to run 12658 thread tests in qemu.
---

2016-11-09  Pauli Nieminen  

* include/std/future: Remove check for ATOMIC_INT_LOCK_FREE
* libsupc++/eh_ptr.cc (exception_ptr::_M_addref)
(exception_ptr::_M_release) (__gxx_dependent_exception_cleanup)
(rethrow_exception): Use atomicity.h reference counting helpers.
* libsupc++/eh_throw.cc (__gxx_exception_cleanup): Likewise.
* libsupc++/eh_tm.cc (free_any_cxa_exception): Likewise.
* libsupc++/exception: Remove check for ATOMIC_INT_LOCK_FREE
* libsupc++/exception_ptr.h: Likewise
* libsupc++/nested_exception.cc: Likewise
* libsupc++/nested_exception.h: Likewise
* src/c++11/future.cc: Likewise
* testsuite: Remove atomic builtins check from exception_ptr,
nested_exception, future, promise, packaged_task, async and
shared_future.


---
 libstdc++-v3/include/std/future|  4 +---
 libstdc++-v3/libsupc++/eh_ptr.cc   | 26 +-
 libstdc++-v3/libsupc++/eh_throw.cc |  9 
 libstdc++-v3/libsupc++/eh_tm.cc| 11 +
 libstdc++-v3/libsupc++/exception   |  3 +--
 libstdc++-v3/libsupc++/exception_ptr.h |  4 
 libstdc++-v3/libsupc++/nested_exception.cc |  2 --
 libstdc++-v3/libsupc++/nested_exception.h  |  4 
 libstdc++-v3/src/c++11/future.cc   |  4 ++--
 .../testsuite/18_support/exception_ptr/40296.cc|  1 -
 .../18_support/exception_ptr/60612-terminate.cc|  1 -
 .../18_support/exception_ptr/60612-unexpected.cc   |  1 -
 .../testsuite/18_support/exception_ptr/62258.cc|  1 -
 .../testsuite/18_support/exception_ptr/64241.cc|  1 -
 .../18_support/exception_ptr/current_exception.cc  |  1 -
 .../testsuite/18_support/exception_ptr/lifespan.cc |  1 -
 .../18_support/exception_ptr/make_exception_ptr.cc |  1 -
 .../exception_ptr/make_exception_ptr_2.cc  |  1 -
 .../testsuite/18_support/exception_ptr/move.cc |  1 -
 .../18_support/exception_ptr/requirements.cc   |  1 -
 .../18_support/exception_ptr/requirements_neg.cc   |  1 -
 .../18_support/exception_ptr/rethrow_exception.cc  |  1 -
 .../testsuite/18_support/nested_exception/51438.cc |  1 -
 .../testsuite/18_support/nested_exception/62154

[PATCH] libstdc++: Improve code generation for atomic reference counting

2016-11-09 Thread Pauli
Atomic reference counting generates pessimistic code in platforms where
builtin atomics could optimize code for following branch with subtract
instruction.

To allow better code generation with compile time constant addition can
be checked for negative value. Those cases can then be better optimized
with __atomic_fetch_sub builtin.

Extra branch checking presence of threads in application generates a
lot extra instructions when builtin atomics often generate very few
instructions.

Can I assume that __builtin_constant_p is available in all compilers
with _GLIBCXX_ATOMIC_BUILTINS is available?
bits/stl_list.h assumes __builtin_constant_p is available inside
_GLIBCXX_USE_CXX11_ABI.

x86_64 test results:
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
  of expected passes11941
  of unexpected failures2
  of expected failures  65
  of unsupported tests  244
---

2016-11-09  Pauli Nieminen  

* include/ext/atomicity.h (__exchange_and_add): Optimize negative add
with builtin __atomic_fetch_sub.
(__atomic_add): Call __exchange_and_add to reuse same code
(__exchange_adn_add_dispatch) (__atomic_add_dispatch): Optimize
threading check if builtins generate inline instructions. 

---
 libstdc++-v3/include/ext/atomicity.h | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/ext/atomicity.h 
b/libstdc++-v3/include/ext/atomicity.h
index 0fcb390..ecf5a9d 100644
--- a/libstdc++-v3/include/ext/atomicity.h
+++ b/libstdc++-v3/include/ext/atomicity.h
@@ -35,6 +35,14 @@
 #include 
 #include 
 
+#if defined(__GCC_ATOMIC_INT_LOCK_FREE)
+#define _GLIBCXX_INT_LOCK_FREE __GCC_ATOMIC_INT_LOCK_FREE
+#else
+#define _GLIBCXX_INT_LOCK_FREE 2
+#endif
+#define _GLIBCXX_ATOMIC_INLINE \
+  (_GLIBCXX_ATOMIC_BUILTINS && _GLIBCXX_INT_LOCK_FREE == 2)
+
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -46,11 +54,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_ATOMIC_BUILTINS
   static inline _Atomic_word 
   __exchange_and_add(volatile _Atomic_word* __mem, int __val)
-  { return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
+  {
+if (__builtin_constant_p(__val) && __val < 0)
+  return __atomic_fetch_sub(__mem, 0 - __val, __ATOMIC_ACQ_REL);
+else
+  return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL);
+  }
 
   static inline void
   __atomic_add(volatile _Atomic_word* __mem, int __val)
-  { __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
+  { __exchange_and_add(__mem, __val); }
 #else
   _Atomic_word
   __attribute__ ((__unused__))
@@ -78,7 +91,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __exchange_and_add_dispatch(_Atomic_word* __mem, int __val)
   {
 #ifdef __GTHREADS
-if (__gthread_active_p())
+if (_GLIBCXX_ATOMIC_INLINE || __gthread_active_p())
   return __exchange_and_add(__mem, __val);
 else
   return __exchange_and_add_single(__mem, __val);
@@ -92,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __atomic_add_dispatch(_Atomic_word* __mem, int __val)
   {
 #ifdef __GTHREADS
-if (__gthread_active_p())
+if (_GLIBCXX_ATOMIC_INLINE || __gthread_active_p())
   __atomic_add(__mem, __val);
 else
   __atomic_add_single(__mem, __val);
@@ -104,6 +117,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
+#undef _GLIBCXX_INT_LOCK_FREE
+#undef _GLIBCXX_ATOMIC_INLINE
+
 // Even if the CPU doesn't need a memory barrier, we need to ensure
 // that the compiler doesn't reorder memory accesses across the
 // barriers.
-- 
2.9.3



Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const

2016-11-09 Thread Bernd Schmidt

On 11/09/2016 10:13 PM, Segher Boessenkool wrote:

* simplify-rtx.c (simplify_binary_operation_1): Simplify
(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
(xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
is a const_int.


I think one of the xors should have A as the second operand.



+/* If we have (xor (and (xor A B) C) A) with C a constant we can instead
+   do (ior (and A ~C) (and B C)) which is a machine instruction on some
+   machines, and also has shorter instruction path length.  */
+  if (GET_CODE (op0) == AND


Comments doesn't line up with the if/else on my monitor; could be email 
damage but please check.


Other than that, I think it does qualify as a simplification (or at 
least an improvement), so OK. Would be nice to check for it with a testcase.



Bernd


[ARM] PR 78253 do not resolve weak ref locally

2016-11-09 Thread Christophe Lyon
Hi,

PR 78253 shows that the handling of weak references has changed for
ARM with gcc-5.

When r220674 was committed, default_binds_local_p_2 gained a new
parameter (weak_dominate), which, when true, implies that a reference
to a weak symbol defined locally will be resolved locally, even though
it could be overridden by a strong definition in another object file.

With r220674, default_binds_local_p forces weak_dominate=true,
effectively changing the previous behavior.

The attached patch introduces default_binds_local_p_4 which is a copy
of default_binds_local_p_2, but using weak_dominate=false, and updates
the ARM target to call default_binds_local_p_4 instead of
default_binds_local_p_2.

I ran cross-tests on various arm* configurations with no regression,
and checked that the test attached to the original bugzilla now works
as expected.

I am not sure why weak_dominate defaults to true, and I couldn't
really understand why by reading the threads related to r220674 and
following updates to default_binds_local_p_* which all deal with other
corner cases and do not discuss the weak_dominate parameter.

Or should this patch be made more generic?

Thanks,

Christophe
2016-11-09  Christophe Lyon  

PR target/78253
* output.h (default_binds_local_p_4): New.
* varasm.c (default_binds_local_p_4): New, use
weak_dominate=false.
* config/arm/linux-elf.h (TARGET_BINDS_LOCAL_P): Define to
default_binds_local_p_4.


diff --git a/gcc/config/arm/linux-elf.h b/gcc/config/arm/linux-elf.h
index cc17b51..4f32ce8 100644
--- a/gcc/config/arm/linux-elf.h
+++ b/gcc/config/arm/linux-elf.h
@@ -110,7 +110,7 @@
strong definitions in dependent shared libraries, will resolve
to COPY relocated symbol in the executable.  See PR65780.  */
 #undef TARGET_BINDS_LOCAL_P
-#define TARGET_BINDS_LOCAL_P default_binds_local_p_2
+#define TARGET_BINDS_LOCAL_P default_binds_local_p_4
 
 /* Define this to be nonzero if static stack checking is supported.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
diff --git a/gcc/output.h b/gcc/output.h
index 0924499..11b5ce5 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -585,6 +585,7 @@ extern bool default_binds_local_p (const_tree);
 extern bool default_binds_local_p_1 (const_tree, int);
 extern bool default_binds_local_p_2 (const_tree);
 extern bool default_binds_local_p_3 (const_tree, bool, bool, bool, bool);
+extern bool default_binds_local_p_4 (const_tree);
 extern void default_globalize_label (FILE *, const char *);
 extern void default_globalize_decl_name (FILE *, tree);
 extern void default_emit_unwind_label (FILE *, tree, int, int);
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 6a7ffc2..7a3cf99 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6952,6 +6952,16 @@ default_binds_local_p_2 (const_tree exp)
  !flag_pic);
 }
 
+/* Similar to default_binds_local_p_2, but local weak definition does
+   not imply local resolution (weak_dominate is false).  */
+
+bool
+default_binds_local_p_4 (const_tree exp)
+{
+  return default_binds_local_p_3 (exp, flag_shlib != 0, false, true,
+ !flag_pic);
+}
+
 bool
 default_binds_local_p_1 (const_tree exp, int shlib)
 {


Review debug message generation

2016-11-09 Thread François Dumont

Hi

Here is a proposal to review how we generate the debug output in 
case of assertion failure. It removes usage of format_word which, as a 
side effect will fix PR 77459. Should I reference this PR in the ChangeLog ?


I introduced a print_literal function to avoid using strlen on 
them. I know that gcc optimizes calls to strlen on literals but in our 
case we were not directly calling it on the literals.


Tested under Linux x86_64, ok to commit ?

* src/c++11/debug.cc (format_word): Delete.
(print_literal): New. Replace call to print_word for literals.

François


On 08/11/2016 22:35, fdumont at gcc dot gnu.org wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77459

--- Comment #8 from François Dumont  ---
Ok, at least it confirms what I thought about builtins. So the problem is
rather a buggy target.

Even if so I'll try to find an alternative approach to avoid snprintf usage.



diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index d79e43b..2b9c00b 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -546,11 +546,6 @@ namespace
   using _Error_formatter = __gnu_debug::_Error_formatter;
   using _Parameter = __gnu_debug::_Error_formatter::_Parameter;
 
-  template
-int
-format_word(char* buf, int n, const char* fmt, _Tp s)
-{ return std::min(__builtin_snprintf(buf, n, fmt, s), n - 1); }
-
   void
   get_max_length(std::size_t& max_length)
   {
@@ -577,6 +572,11 @@ namespace
 bool	_M_wordwrap;
   };
 
+  template
+void
+print_literal(PrintContext& ctx, const char(&word)[Length])
+{ print_word(ctx, word, Length - 1); }
+
   void
   print_word(PrintContext& ctx, const char* word,
 	 std::ptrdiff_t count = -1)
@@ -627,18 +627,19 @@ namespace
   }
 else
   {
-	print_word(ctx, "\n", 1);
+	print_literal(ctx, "\n");
 	print_word(ctx, word, count);
   }
   }
 
+  template
 void
 print_type(PrintContext& ctx,
 	   const type_info* info,
-	 const char* unknown_name)
+	   const char(&unknown_name)[Length])
 {
   if (!info)
-  print_word(ctx, unknown_name);
+	print_literal(ctx, unknown_name);
   else
 	{
 	  int status;
@@ -784,20 +785,18 @@ namespace
   {
 if (type._M_name)
   {
-	const int bufsize = 64;
-	char buf[bufsize];
-	int written
-	  = format_word(buf, bufsize, "\"%s\"", type._M_name);
-	print_word(ctx, buf, written);
+	print_literal(ctx, "\"");
+	print_word(ctx, type._M_name);
+	print_literal(ctx, "\"");
   }
 
-print_word(ctx, " {\n");
+print_literal(ctx, " {\n");
 
 if (type._M_type)
   {
-	print_word(ctx, "  type = ");
+	print_literal(ctx, "  type = ");
 	print_type(ctx, type._M_type, "");
-	print_word(ctx, ";\n");
+	print_literal(ctx, ";\n");
   }
   }
 
@@ -809,9 +808,9 @@ namespace
 
 if (inst._M_name)
   {
-	int written
-	  = format_word(buf, bufsize, "\"%s\" ", inst._M_name);
-	print_word(ctx, buf, written);
+	print_literal(ctx, "\"");
+	print_word(ctx, inst._M_name);
+	print_literal(ctx, "\" ");
   }
 
 int written
@@ -820,7 +819,7 @@ namespace
 
 if (inst._M_type)
   {
-	print_word(ctx, "  type = ");
+	print_literal(ctx, "  type = ");
 	print_type(ctx, inst._M_type, "");
   }
   }
@@ -838,36 +837,36 @@ namespace
 	{
 	  const auto& ite = variant._M_iterator;
 
-	  print_word(ctx, "iterator ");
+	  print_literal(ctx, "iterator ");
 	  print_description(ctx, ite);
 
 	  if (ite._M_type)
 	{
 	  if (ite._M_constness != _Error_formatter::__unknown_constness)
 		{
-		  print_word(ctx, " (");
+		  print_literal(ctx, " (");
 		  print_field(ctx, param, "constness");
-		  print_word(ctx, " iterator)");
+		  print_literal(ctx, " iterator)");
 		}
 
-	  print_word(ctx, ";\n");
+	  print_literal(ctx, ";\n");
 	}
 
 	  if (ite._M_state != _Error_formatter::__unknown_state)
 	{
-	  print_word(ctx, "  state = ");
+	  print_literal(ctx, "  state = ");
 	  print_field(ctx, param, "state");
-	  print_word(ctx, ";\n");
+	  print_literal(ctx, ";\n");
 	}
 
 	  if (ite._M_sequence)
 	{
-	  print_word(ctx, "  references sequence ");
+	  print_literal(ctx, "  references sequence ");
 	  if (ite._M_seq_type)
 		{
-		  print_word(ctx, "with type '");
+		  print_literal(ctx, "with type '");
 		  print_field(ctx, param, "seq_type");
-		  print_word(ctx, "' ");
+		  print_literal(ctx, "' ");
 		}
 
 	  int written
@@ -875,34 +874,34 @@ namespace
 	  print_word(ctx, buf, written);
 	}
 
-	  print_word(ctx, "}\n", 2);
+	  print_literal(ctx, "}\n");
 	}
 	break;
 
   case _Parameter::__sequence:
-	print_word(ctx, "sequence ");
+	print_literal(ctx, "sequence ");
 	print_description(ctx, variant._M_sequence);
 
 	if (variant._M_sequence._M_type)
-	  print_word(ctx, ";\n", 2);
+	  print_literal(ctx, ";\n");
 
-	print_word(ctx, "}\n", 2);
+	print_literal(ctx, "}\n");
 	break;
 
   case _Parameter::__instance:
-	prin

  1   2   >