[v3 PATCH] Implement LWG 2758.

2016-08-10 Thread Ville Voutilainen
Tested on Linux-x64.

2016-08-10  Ville Voutilainen  

Implement LWG 2758.
* include/bits/basic_string.h
(append(__sv_type, size_type, size_type)): Turn into a template,
change parameter type, constrain, add a conversion to __sv_type
from the dependent parameter type.
(assign(__sv_type, size_type, size_type)): Likewise.
(insert(size_type, __sv_type, size_type, size_type)): Likewise.
(replace(size_type, size_type, __sv_type, size_type, size_type)):
Likewise.
(compare(size_type, size_type,__sv_type, size_type, size_type)):
Likewise.
* testsuite/21_strings/basic_string/lwg2758.cc: New.
diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 59f1c64..89e2100 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -1227,9 +1227,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @param __n   The number of characters to append from the string_view.
*  @return  Reference to this string.
*/
-  basic_string& append(__sv_type __sv,
+  template ,
+   bool> = true>
+  basic_string& append(const _Tp& __svt,
   size_type __pos, size_type __n = npos)
   {
+   __sv_type __sv = __svt;
return _M_append(__sv.data()
 + __sv._M_check(__pos, "basic_string::append"),
 __sv._M_limit(__pos, __n));
@@ -1392,10 +1396,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @param __n  The number of characters to assign.
*  @return  Reference to this string.
*/
+  template ,
+   bool> = true>
   basic_string&
-  assign(__sv_type __sv,
+  assign(const _Tp& __svt,
 size_type __pos, size_type __n = npos)
   {
+   __sv_type __sv = __svt;
return _M_replace(size_type(0), this->size(), __sv.data()
  + __sv._M_check(__pos, "basic_string::assign"),
  __sv._M_limit(__pos, __n));
@@ -1652,9 +1660,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @param __nThe number of characters to insert.
*  @return  Reference to this string.
   */
-  basic_string& insert(size_type __pos1, __sv_type __sv,
+  template ,
+   bool> = true>
+  basic_string& insert(size_type __pos1, const _Tp& __svt,
   size_type __pos2, size_type __n = npos)
   {
+   __sv_type __sv = __svt;
return this->replace(__pos1, size_type(0), __sv.data()
 + __sv._M_check(__pos2, "basic_string::insert"),
 __sv._M_limit(__pos2, __n));
@@ -2071,10 +2083,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @param __n2The number of characters to insert.
*  @return  Reference to this string.
   */
+  template ,
+   bool> = true>
   basic_string& replace(size_type __pos1, size_type __n1,
-   __sv_type __sv,
+   const _Tp& __svt,
size_type __pos2, size_type __n2 = npos)
   {
+   __sv_type __sv = __svt;
return this->replace(__pos1, __n1, __sv.data()
 + __sv._M_check(__pos2, "basic_string::replace"),
 __sv._M_limit(__pos2, __n2));
@@ -2720,10 +2736,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @param __n2  The number of characters to compare.
*  @return  Integer < 0, 0, or > 0.
*/
+  template ,
+   bool> = true>
   int compare(size_type __pos1, size_type __n1,
- __sv_type __sv,
+ const _Tp& __svt,
  size_type __pos2, size_type __n2 = npos) const
   {
+   __sv_type __sv = __svt;
return __sv_type(*this)
  .substr(__pos1, __n1).compare(__sv.substr(__pos2, __n2));
   }
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/lwg2758.cc 
b/libstdc++-v3/testsuite/21_strings/basic_string/lwg2758.cc
new file mode 100644
index 000..1d29248
--- /dev/null
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/lwg2758.cc
@@ -0,0 +1,46 @@
+// { dg-options "-std=gnu++17" }
+// { dg-do compile }
+
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License a

Re: [PATCH 0/7, GCC, V8M] ARMv8-M Security Extensions

2016-08-10 Thread Andre Vieira (lists)
On 09/08/16 17:47, Sandra Loosemore wrote:
> On 08/09/2016 06:01 AM, Andre Vieira (lists) wrote:
>> [snip]
>>
>> The documentation is in the ARMV8-M Security Extensions in: ARM®v8-M
>> Security Extensions: Requirements on Development Tools document I linked
>> in the email above and subsequent emails
>> (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).
>>
>> Also per patch I refer to the relevant sections. So for instance in
>> PATCH 3/7 refers to Section 5.4, which describes Entry functions and
>> mentions the cmse_nonsecure_entry attribute. Whereas PATCH 7/7 refers to
>> Section 5.4.3 of the same document which describes the
>> cmse_nonsecure_caller intrinsic which that patch implements.
>>
>> Is there a specific intrinsic/attribute you are missing?
> 
> You need to at least add entries to the relevant sections in extend.texi
> for the new target-specific intrinsic and attributes.  The documentation
> there doesn't need to be terribly detailed (one sentence and a link to
> the external document is probably all you need), but it's important that
> these things be listed in GCC's supported extensions so that users know
> they can use them and so that people who see them in code written by
> other people can find out what they mean.
> 
> -Sandra
> 


I see, I did add a new entry to extend.texi for ARMv8-M Security
Extensions. I will also mention all intrinsics and attributes there.

Thank you.

Andre


Re: [PATCH] Fix early debug regression with DW_AT_string_length (PR debug/71906)

2016-08-10 Thread Jakub Jelinek
On Tue, Aug 09, 2016 at 07:10:34PM +0200, Richard Biener wrote:
> >In some cases like location of file scope vars, or this
> >DW_AT_string_length,
> >you really need to adjust late the debug info created early, there is
> >no
> >workaround for that.
> 
> I suppose the workaround is to fix/extend DWARF... (DW_AT_GNU_string_length 
> that allows us to refer to another DIE?)

Introducing another attribute that does the same thing as existing attribute
would be way too ugly.  In theory the reference class could be added to
DW_AT_string_length, I can ask on DWARF workgroup (but it might be too late
for DWARF 5), but that still doesn't solve the issue of the indirect params.

How do you want to handle the debug info without ammending the early-generated
DWARF though?  Just by using it as abstract origins of everything and
ammending in copies?  That would be a total mess for all the consumers...
Parsing DWARF and rewriting it isn't all that hard.

Jakub


Re: [RFC] ipa bitwise constant propagation

2016-08-10 Thread Prathamesh Kulkarni
On 9 August 2016 at 23:43, Martin Jambor  wrote:
> Hi,
>
> On Tue, Aug 09, 2016 at 05:17:31PM +0530, Prathamesh Kulkarni wrote:
>> On 9 August 2016 at 16:39, Martin Jambor  wrote:
>>
>> ...
>>
>> >> Instead of storing arg's precision and sign, we should store
>> >> parameter's precision and sign in ipa_compute_jump_functions_for_edge ().
>> >> Diff with respect to previous patch:
>> >>
>> >> @@ -1688,9 +1690,9 @@ ipa_compute_jump_functions_for_edge (struct
>> >> ipa_func_body_info *fbi,
>> >>&& (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
>> >>   {
>> >>jfunc->bits.known = true;
>> >> -  jfunc->bits.sgn = TYPE_SIGN (TREE_TYPE (arg));
>> >> -  jfunc->bits.precision = TYPE_PRECISION (TREE_TYPE (arg));
>> >> -
>> >> +  jfunc->bits.sgn = TYPE_SIGN (param_type);
>> >> +  jfunc->bits.precision = TYPE_PRECISION (param_type);
>> >> +
>> >
>> > If you want to use the precision of the formal parameter then you do
>> > not need to store it to jump functions.  Parameter DECLs along with
>> > their types are readily accessible in IPA (even with LTO).  It would
>> > also be much clearer what is going on, IMHO.
>> Could you please point out how to access parameter decl in wpa ?
>> The only reason I ended up putting this in jump function was because
>> I couldn't figure out how to access param decl during WPA.
>> I see there's ipa_get_param() in ipa-prop.h however it's gated on
>> gcc_checking_assert (!flag_wpa), so I suppose I can't use this
>> during WPA ?
>>
>> Alternatively I think I could access cs->callee->decl and get to the param 
>> decl
>> by walking DECL_ARGUMENTS ?
>
> Actually, we no longer have DECL_ARGUMENTS during LTO WPA.  But in
> most cases, you can still get at the type with something like the
> following (only very lightly tested) patch, if Honza does not think it
> is too crazy.
>
> Note that= for old K&R C sources we do not have TYPE_ARG_TYPES and so
> ipa_get_type can return NULL(!) ...however I wonder whether for such
> programs the type assumptions made in callers when constructing jump
> functions can be trusted either.
>
> I have to run, we will continue the discussion later.
Thanks for the patch.
In this version, I updated the patch to use ipa_get_type, remove
precision and sgn
from ipcp_bits_lattice and ipa_bits, and renamed member variables to
add m_ prefix.
Does it look OK ?
I am looking for test-case that affects precision and hopefully add
that along with other
test-cases in follow-up patch.
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Martin
>
>
> 2016-08-09  Martin Jambor  
>
> * ipa-prop.h (ipa_param_descriptor): Renamed decl to decl_or_type.
> Update comment.
> (ipa_get_param): Updated comment, added assert that we have a
> PARM_DECL.
> (ipa_get_type): New function.
> * ipa-cp.c (ipcp_propagate_stage): Fill in argument types in LTO mode.
> * ipa-prop.c (ipa_get_param_decl_index_1): Use decl_or_type
> instead of decl;
> (ipa_populate_param_decls): Likewise.
> (ipa_dump_param): Likewise.
>
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 5b6cb9a..3465da5 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -1952,11 +1952,21 @@ propagate_constants_accross_call (struct cgraph_edge 
> *cs)
>else
>  i = 0;
>
> +  /* !!! The following dump is of course only a demonstration that it works: 
> */
> +  debug_generic_expr (callee->decl);
> +  fprintf (stderr, "\n");
> +
>for (; (i < args_count) && (i < parms_count); i++)
>  {
>struct ipa_jump_func *jump_func = ipa_get_ith_jump_func (args, i);
>struct ipcp_param_lattices *dest_plats;
>
> +  /* !!! The following dump is of course only a demonstration that it
> + works: */
> +  fprintf (stderr, "  The type of parameter %i is: ", i);
> +  debug_generic_expr (ipa_get_type (callee_info, i));
> +  fprintf (stderr, "\n");
> +
>dest_plats = ipa_get_parm_lattices (callee_info, i);
>if (availability == AVAIL_INTERPOSABLE)
> ret |= set_all_contains_variable (dest_plats);
> @@ -2936,6 +2946,19 @@ ipcp_propagate_stage (struct ipa_topo_info *topo)
>{
>  struct ipa_node_params *info = IPA_NODE_REF (node);
>
> +/* In LTO we do not have PARM_DECLs but we would still like to be able to
> +   look at types of parameters.  */
> +if (in_lto_p)
> +  {
> +   tree t = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
> +   for (int k = 0; k < ipa_get_param_count (info); k++)
> + {
> +   gcc_assert (t != void_list_node);
> +   info->descriptors[k].decl_or_type = TREE_VALUE (t);
> +   t = t ? TREE_CHAIN (t) : NULL;
> + }
> +  }
> +
>  determine_versionability (node, info);
>  if (node->has_gimple_body_p ())
>{
> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index 132b622..1eaccdf 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -103,9 +103,10 @@ ipa_get_

Re: [PATCH] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Paolo Bonzini


On 10/08/2016 02:48, Manuel López-Ibáñez wrote:
> TL;DR If the warning is enabled by -Wpedantic, it should be an error
> with -Werror=pedantic and it should use cpp_pedwarning. Whether it
> should be enabled by -Wpedantic is more difficult to say.
> 
> -pedantic is equivalent to -Wpedantic. If -Wx is enabled by -Wy, then
> -Werror=y implies -Werror=x. Every warning enabled by -Wpedantic works
> in the same way and I don't see why this one should be different.
> 
> Moreover, I think that this was a latent bug in libcpp: the code
> should have used cpp_pedwarning instead of cpp_warning. This is
> https://gcc.gnu.org/PR66505
> Quoting https://gcc.gnu.org/wiki/DiagnosticsGuidelines#Types_of_diagnostics
> : "The same rules apply to libcpp, which uses cpp_pedwarning (instead
> of pedwarn), CPP_PEDANTIC (pfile) (instead of pedantic) and
> CPP_W_PEDANTIC (instead of OPT_Wpedantic). In particular, you may use
> cpp_pedwarning without CPP_W_PEDANTIC, but you may not use
> CPP_W_PEDANTIC without cpp_pedwarning and you may not use CPP_PEDANTIC
> (pfile) without CPP_W_PEDANTIC."

But pedwarns are for errors mandated by the standard; this one is not.
Let's just move it from -Wpedantic to -Wextra.

Paolo


AW: [PATCH] Fix unaligned access when predictive commoning (PR 71083)

2016-08-10 Thread Bernd Edlinger
On 08/10/16, Bernd Edlinger wrote:
>On 08/09/16 22:48, Eric Botcazou wrote:
>>> I think from Eric's comment in get_inner_ref it can be possible
>>> in Ada that the outer object is accidentally byte-aligned
>>> but the inner reference is not.  In that case I think it is
>>> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.
>>

I actually meant to say get_bit_range.

I tried to translate the c-test case to an equivalent ada test case with
my not-so-fluent Ada speak...

So here is a test case that *actually* hits the if ((boff % BITS_PER_UNIT) != 0)
code path.

Before my patch there was an unaligned SImode access, and with
the patch we have 3 QImode accesses here.

cat pr71083_pkg.ads 
package Pr71083_pkg is
  type Nibble is mod 2**4;
  type Int24  is mod 2**24;
  type StructA is record
a : Nibble;
b : Int24;
  end record;
  pragma Pack(StructA);
  type StructB is record
a : Nibble;
b : StructA;
  end record;
  pragma Pack(StructB);
  type ArrayOfStructB is array(0..100) of StructB;
  procedure Foo (X : in out ArrayOfStructB);
end Pr71083_pkg;

cat pr71083_pkg.adb
-- { dg-do compile }
-- { dg-options "-O3" }
package body Pr71083_pkg is
  procedure Foo (X : in out ArrayOfStructB) is
  begin
for K in 0..99 loop
  X (K+1).b.b := X (K).b.b;
end loop;
  end Foo;
end Pr71083_pkg;

cat pr71083.adb 
-- { dg-do run }
-- { dg-options "-O3" }
with Pr71083_pkg;
use Pr71083_pkg;
procedure Pr71083 is
  Test : ArrayOfStructB;
begin
  Test (0).b.b := ;
  Foo (Test);
  if Test (100).b.b /=  then
raise Program_Error;
  end if;
end;


I was not sure which name to choose,
so I used the same convention as in the c-torture.
How do you like that?

I would like to add that to the gnat.dg because
it seems that there is zero testing for the predictive
commoning in the gnat.dg test suite.


Bernd.

>>> The patch says get_bit_range instead...  The comment therein means that
>>> bitfields can be nested in Ada: you can have a bitfield in a record which is
>>> itself a bitfield in another record.
>>>
 Eric do you agree?  Are there any Ada test cases where the
 pcom optimization jumps in?
>>>
>>> According to the comment, data-ref analysis punts on bit offsets so I'm not
>>> sure how boff can be not byte-aligned.
>>>
>
> I mean in dr_analyze_innermost, we have:
>
>   base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode,
>   &punsignedp, &preversep, &pvolatilep);
>   gcc_assert (base != NULL_TREE);
>
>   if (pbitpos % BITS_PER_UNIT != 0)
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "failed: bit offset alignment.\n");
>   return false;
> }
>
>   if (preversep)
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "failed: reverse storage order.\n");
>   return false;
> }
>
>
> and that means that get_inner_reference on the outermost ref
> returns a byte-aligned bit-offset, and from there I would
> think it has the knowledge, when to punt on reversep and
> when the offset is not byte-aligned.
>
> But in get_bit_range we have a bit-field with arbitrary
> bit-offset, but surprisingly the
> get_inner_reference (TREE_OPERAND (exp, 0)) did not return
> byte-aligned bit-offset.  I doubt that data-ref ananlysis
> ever cares about the byte-alignment of intermediate
> refs.
>
> We know that the difference between the innermost ref
> and the outer ref is byte-aligned, but we do not know
> that the same is true for offset between the COMPONENT_REF
> and the outer ref.
>
> I mean boff is essentially the difference between
> bitpos of get_inner_reference(exp) and
> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
>
> This would be exposed by my patch, because previously
> we always generated BIT_FIELD_REFS, with bit-offset 0,
> but the MEM_REF at the byte-offset there is in all likelihood
> pretty much unaligned, the MEM_REF at the COMPONENT_REF
> is likely more aligned and allows better code for ARM processors,
> but only if the MEM_REF is at a byte-aligned offset at all,
> otherwise the whole transformation would be wrong.
>
>
>
> Thanks
> Bernd.


Re: [PR72835] Incorrect arithmetic optimization involving bitfield arguments

2016-08-10 Thread Jakub Jelinek
On Wed, Aug 10, 2016 at 08:51:32AM +1000, kugan wrote:
> I see it now. The problem is we are just looking at (-1) being in the ops
> list for passing changed to rewrite_expr_tree in the case of multiplication
> by negate.  If we have combined (-1), as in the testcase, we will not have
> the (-1) and will pass changed=false to rewrite_expr_tree.
> 
> We should set changed based on what happens in try_special_add_to_ops.
> Attached patch does this. Bootstrap and regression testing are ongoing. Is
> this OK for trunk if there is no regression.

I think the bug is elsewhere.  In particular in
undistribute_ops_list/zero_one_operation/decrement_power.
All those look problematic in this regard, they change RHS of statements
to something that holds a different value, while keeping the LHS.
So, generally you should instead just add a new stmt next to the old one,
and adjust data structures (replace the old SSA_NAME in some ->op with
the new one).  decrement_power might be a problem here, dunno if all the
builtins are const in all cases that DSE would kill the old one,
Richard, any preferences for that?  reset flow sensitive info + reset debug
stmt uses, or something different?  Though, replacing the LHS with a new
anonymous SSA_NAME might be needed too, in case it is before SSA_NAME of a
user var that doesn't yet have any debug stmts.

Jakub


[PATCH] objc: update documetation and add test-case of constructor/destructor attr.

2016-08-10 Thread Martin Liška
Hi.

Following patch clarifies usage of ctor and dtor attributes for Objective C.
Patch survives (on x86_64-linux-gnu):

make -k check-objc RUNTESTFLAGS="execute.exp"

Ready for trunk?
Thanks,
Martin
>From 0fcb15e50905a95d8e824aed45d961b5ddc46479 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 10 Aug 2016 11:01:49 +0200
Subject: [PATCH] objc: update documetation and add test-case of
 constructor/destructor attr.

gcc/ChangeLog:

2016-08-10  Martin Liska  

	* doc/extend.texi: Remove limitation of Objective C for
	__attribute__((constructor)) and __attribute__((destructor)).

gcc/testsuite/ChangeLog:

2016-08-10  Martin Liska  

	* objc/execute/construct1.m: New test.
---
 gcc/doc/extend.texi |  2 --
 gcc/testsuite/objc/execute/construct1.m | 11 +++
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/objc/execute/construct1.m

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5697f3a..cce17aa 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2485,8 +2485,6 @@ same priority.  The priorities for constructor and destructor
 functions are the same as those specified for namespace-scope C++
 objects (@pxref{C++ Attributes}).
 
-These attributes are not currently implemented for Objective-C@.
-
 @item deprecated
 @itemx deprecated (@var{msg})
 @cindex @code{deprecated} function attribute
diff --git a/gcc/testsuite/objc/execute/construct1.m b/gcc/testsuite/objc/execute/construct1.m
new file mode 100644
index 000..3de036d
--- /dev/null
+++ b/gcc/testsuite/objc/execute/construct1.m
@@ -0,0 +1,11 @@
+int i;
+
+void hello (void) __attribute__ ((constructor));
+void hello (void) { i = 1; }
+
+int main (void) {
+  if (i != 1)
+return 1;
+
+  return 0;
+}
-- 
2.9.2



Re: [PR72835] Incorrect arithmetic optimization involving bitfield arguments

2016-08-10 Thread kugan



On 10/08/16 18:57, Jakub Jelinek wrote:

On Wed, Aug 10, 2016 at 08:51:32AM +1000, kugan wrote:

I see it now. The problem is we are just looking at (-1) being in the ops
list for passing changed to rewrite_expr_tree in the case of multiplication
by negate.  If we have combined (-1), as in the testcase, we will not have
the (-1) and will pass changed=false to rewrite_expr_tree.

We should set changed based on what happens in try_special_add_to_ops.
Attached patch does this. Bootstrap and regression testing are ongoing. Is
this OK for trunk if there is no regression.


I think the bug is elsewhere.  In particular in
undistribute_ops_list/zero_one_operation/decrement_power.
All those look problematic in this regard, they change RHS of statements
to something that holds a different value, while keeping the LHS.
So, generally you should instead just add a new stmt next to the old one,
and adjust data structures (replace the old SSA_NAME in some ->op with
the new one).  decrement_power might be a problem here, dunno if all the
builtins are const in all cases that DSE would kill the old one,
Richard, any preferences for that?  reset flow sensitive info + reset debug
stmt uses, or something different?  Though, replacing the LHS with a new
anonymous SSA_NAME might be needed too, in case it is before SSA_NAME of a
user var that doesn't yet have any debug stmts.


Hi Jakub,

This is the patch I have now (not full tested yet). This is along what 
you described above. I think I have to handle all the LHS in 
zero_one_operation too, I will wait for the feedback before working on it.


Thanks,
Kugan
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr72835.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr72835.c
index e69de29..049eddc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr72835.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr72835.c
@@ -0,0 +1,36 @@
+/* PR tree-optimization/72835.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct struct_1 {
+unsigned int m1 : 6 ;
+unsigned int m2 : 24 ;
+unsigned int m3 : 6 ;
+};
+
+unsigned short var_32 = 0x2d10;
+
+struct struct_1 s1;
+
+void init ()
+{
+  s1.m1 = 4;
+  s1.m2 = 0x7ca4b8;
+  s1.m3 = 24;
+}
+
+void foo ()
+{
+  unsigned int c
+= ((unsigned int) s1.m2) * (-((unsigned int) s1.m3))
++ (var_32) * (-((unsigned int) (s1.m1)));
+  if (c != 4098873984)
+__builtin_abort ();
+}
+
+int main ()
+{
+init ();
+foo ();
+return 0;
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 7fd7550..b8dfa39 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1039,7 +1039,7 @@ eliminate_using_constants (enum tree_code opcode,
 
 
 static void linearize_expr_tree (vec *, gimple *,
-bool, bool);
+bool, bool, bool *);
 
 /* Structure for tracking and counting operands.  */
 struct oecount {
@@ -1183,9 +1183,25 @@ propagate_op_to_single_use (tree op, gimple *stmt, tree 
*def)
is updated if there is only one operand but no operation left.  */
 
 static void
-zero_one_operation (tree *def, enum tree_code opcode, tree op)
+zero_one_operation (tree *def, enum tree_code opcode, tree op, bool 
ops_changed)
 {
   gimple *stmt = SSA_NAME_DEF_STMT (*def);
+  /* In this case, the result in the *def will be different as
+ compared to how it was.  Therefore, to avoid having SSA
+ which will have range_info and debug that reflects old
+ operation, create a new SSA and use it (PR72835).  */
+  if (ops_changed)
+{
+  use_operand_p use_p;
+  gimple *use_stmt;
+  gimple *stmt = SSA_NAME_DEF_STMT (*def);
+  tree new_def = make_ssa_name (TREE_TYPE (*def));
+  gcc_checking_assert (single_imm_use (*def, &use_p, &use_stmt));
+  SET_USE (use_p, new_def);
+  gimple_set_lhs (stmt, new_def);
+  *def = new_def;
+  update_stmt (use_stmt);
+}
 
   do
 {
@@ -1250,6 +1266,15 @@ zero_one_operation (tree *def, enum tree_code opcode, 
tree op)
  else if (is_gimple_assign (stmt2)
   && gimple_assign_rhs_code (stmt2) == NEGATE_EXPR)
{
+ /* In this case the result in the op will be
+different as compared to how it was.  Therefore, to avoid
+having SSA which will have range_info and debug that
+reflects old operation, create a new SSA and use
+it (PR72835).  */
+ tree tmp = make_ssa_name (TREE_TYPE (op));
+ gimple_set_lhs (stmt2, tmp);
+ gimple_assign_set_rhs2 (stmt, tmp);
+ update_stmt (stmt2);
  if (gimple_assign_rhs1 (stmt2) == op)
{
  tree cst = build_minus_one_cst (TREE_TYPE (op));
@@ -1453,7 +1478,8 @@ build_and_add_sum (tree type, tree op1, tree op2, enum 
tree_code opcode)
 
 static bool
 undistribute_ops_list (enum tree_code opcode,
-  vec *ops, struct loop *loop)
+  vec *ops, struct loop *loop,
+

Re: protected alloca class for malloc fallback

2016-08-10 Thread Richard Biener
On Tue, Aug 9, 2016 at 3:17 PM, Aldy Hernandez  wrote:
> On 08/05/2016 01:55 PM, Richard Biener wrote:
>
> Hi Richard.
>
>> Please don't use std::string.  For string building you can use obstacks.
>
>
> Alright let's talk details then so I can write things up in a way you
> approve of.
>
> Take for instance simple uses like all the tree_*check_failed routines,
> which I thought were great candidates for std::string-- they're going to be
> outputted to the screen or disk which is clearly many times more expensive
> than the malloc or overhead of std::string:
>
>   length += strlen ("expected ");
>   buffer = tmp = (char *) alloca (length);
>   length = 0;
>   while ((code = (enum tree_code) va_arg (args, int)))
> {
>   const char *prefix = length ? " or " : "expected ";
>
>   strcpy (tmp + length, prefix);
>   length += strlen (prefix);
>   strcpy (tmp + length, get_tree_code_name (code));
>   length += strlen (get_tree_code_name (code));
> }
>
> Do you suggest using obstacks here, or did you have something else in mind?

Why would you want to get rid of the alloca here?

> How about if it's something like the above but there are multiple exit
> points from the function.  I mean, we still need to call obstack_free() on
> all exit points, which is what I wanted to avoid for something as simple as
> building a throw-away string.

But you'll end up in

  internal_error ("tree check: %s, have %s in %s, at %s:%d",
  buffer, get_tree_code_name (TREE_CODE (node)),
  function, trim_filename (file), line);

where you have an interface using C strings again.

It's not that the standard library is bad per-se, it's just using a
tool that doesn't
fit its uses.  This makes the code a messy mix of two concepts which is what
I object to.

Yes, the above example may have premature optimization applied.  Is that
a reason to re-write it using std::string?  No.  Is it a reason to rewrite it
using obstracks?  No.  Just leave it alone.

Richard.

>
> Aldy


[PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Paolo Bonzini
clang recently added a new warning -Wexpansion-to-defined, which
warns when `defined' is used outside a #if expression (including the
case of a macro that is then used in a #if expression).

While I disagree with their inclusion of the warning to -Wall, I think
it is a good addition overall.  First, it is a logical extension of the
existing warning for breaking defined across a macro and its caller.
Second, it is good to make these warnings for `defined' available with
a command-line option other than -pedantic.  In fact this warning is
not mandated by the standard and thus is a strange case of a non-pedwarn
enabled by -pedantic.  As a side effect of using the command-line parsing
machinery to attach the new warning to -pedantic, it would become an
error for -pedantic-errors, which would be weird for a diagnostic that
is not mandated by the standard.

This patch adds -Wexpansion-to-defined, and enables it automatically
for -Wextra instead of -pedantic.

Bootstrapped and regression tested x86_64-pc-linux-gnu, ok?

Paolo

libcpp:
2016-08-09  Paolo Bonzini  

* include/cpplib.h (struct cpp_options): Add new member
warn_expansion_to_defined.
(CPP_W_EXPANSION_TO_DEFINED): New enum member.
* expr.c (parse_defined): Warn for all uses of "defined"
in macros, and tie warning to CPP_W_EXPANSION_TO_DEFINED.
* system.h (HAVE_DESIGNATED_INITIALIZERS): Do not use
"defined" in macros.

gcc:
2016-08-09  Paolo Bonzini  

* system.h (HAVE_DESIGNATED_INITIALIZERS,
HAVE_DESIGNATED_UNION_INITIALIZERS): Do not use
"defined" in macros.

gcc/c-family:
2016-08-09  Paolo Bonzini  

* c.opt (Wexpansion-to-defined): New.

gcc/doc:
2016-08-09  Paolo Bonzini  

* cpp.texi (Defined): Mention -Wexpansion-to-defined.
* cppopts.texi (Invocation): Document -Wexpansion-to-defined.
* invoke.texi (Warning Options): Document -Wexpansion-to-defined.

gcc/testsuite:
2016-08-09  Paolo Bonzini  

* gcc.dg/cpp/defined.c: Mark newly introduced warnings.  Adjust
options to include -Wexpansion-to-defined.
* gcc.dg/cpp/defined-no-Wexpansion-to-defined.c,
gcc.dg/cpp/defined-Wextra-Wno-expansion-to-defined.c,
gcc.dg/cpp/defined-Wextra.c: New testcases.

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 239276)
+++ gcc/c-family/c.opt  (working copy)
@@ -506,6 +506,10 @@
 C ObjC C++ ObjC++ Var(warn_double_promotion) Warning
 Warn about implicit conversions from \"float\" to \"double\".
 
+Wexpansion-to-defined
+C ObjC C++ ObjC++ CPP(warn_expansion_to_defined) 
CppReason(CPP_W_EXPANSION_TO_DEFINED) Var(cpp_warn_expansion_to_defined) 
Init(0) Warning EnabledBy(Wextra)
+Warn if \"defined\" is used outside #if.
+
 Wimplicit-function-declaration
 C ObjC Var(warn_implicit_function_declaration) Init(-1) Warning 
LangEnabledBy(C ObjC,Wimplicit)
 Warn about implicit function declarations.
Index: gcc/doc/cpp.texi
===
--- gcc/doc/cpp.texi(revision 239276)
+++ gcc/doc/cpp.texi(working copy)
@@ -3342,7 +3342,8 @@
 the C standard says the behavior is undefined.  GNU cpp treats it as a
 genuine @code{defined} operator and evaluates it normally.  It will warn
 wherever your code uses this feature if you use the command-line option
-@option{-pedantic}, since other compilers may handle it differently.
+@option{-Wextra}, since other compilers may handle it differently.  The
+warning can also be enabled individually with @option{-Wexpansion-to-defined}.
 
 @node Else
 @subsection Else
Index: gcc/doc/cppopts.texi
===
--- gcc/doc/cppopts.texi(revision 239276)
+++ gcc/doc/cppopts.texi(working copy)
@@ -120,6 +120,12 @@
 @samp{#if} directive, outside of @samp{defined}.  Such identifiers are
 replaced with zero.
 
+@item -Wexpansion-to-defined
+@opindex Wexpansion-to-defined
+Warn whenever @samp{defined} is encountered in the expansion of a macro
+(including the case where the macro is expanded by an @samp{#if} directive).
+Such usage is not portable.
+
 @item -Wunused-macros
 @opindex Wunused-macros
 Warn about macros defined in the main file that are unused.  A macro
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 239276)
+++ gcc/doc/invoke.texi (working copy)
@@ -4914,6 +4914,12 @@
 construct, known from C++, was introduced with ISO C99 and is by default
 allowed in GCC@.  It is not supported by ISO C90.  @xref{Mixed Declarations}.
 
+@item -Wexpansion-to-defined
+@opindex Wexpansion-to-defined
+Warn whenever @samp{defined} is encountered in the expansion of a macro.
+(including the case where the macro is expanded by an @samp{#if} directive).
+This warning is also enabled by @option{-Wextra}.
+
 @item -Wundef
 @opindex Wund

Re: protected alloca class for malloc fallback

2016-08-10 Thread Aldy Hernandez

On 08/10/2016 06:04 AM, Richard Biener wrote:

On Tue, Aug 9, 2016 at 3:17 PM, Aldy Hernandez  wrote:

On 08/05/2016 01:55 PM, Richard Biener wrote:

Hi Richard.


Please don't use std::string.  For string building you can use obstacks.



Alright let's talk details then so I can write things up in a way you
approve of.

Take for instance simple uses like all the tree_*check_failed routines,
which I thought were great candidates for std::string-- they're going to be
outputted to the screen or disk which is clearly many times more expensive
than the malloc or overhead of std::string:

   length += strlen ("expected ");
   buffer = tmp = (char *) alloca (length);
   length = 0;
   while ((code = (enum tree_code) va_arg (args, int)))
 {
   const char *prefix = length ? " or " : "expected ";

   strcpy (tmp + length, prefix);
   length += strlen (prefix);
   strcpy (tmp + length, get_tree_code_name (code));
   length += strlen (get_tree_code_name (code));
 }

Do you suggest using obstacks here, or did you have something else in mind?


Why would you want to get rid of the alloca here?


How about if it's something like the above but there are multiple exit
points from the function.  I mean, we still need to call obstack_free() on
all exit points, which is what I wanted to avoid for something as simple as
building a throw-away string.


But you'll end up in

   internal_error ("tree check: %s, have %s in %s, at %s:%d",
   buffer, get_tree_code_name (TREE_CODE (node)),
   function, trim_filename (file), line);

where you have an interface using C strings again.

It's not that the standard library is bad per-se, it's just using a
tool that doesn't
fit its uses.  This makes the code a messy mix of two concepts which is what
I object to.

Yes, the above example may have premature optimization applied.  Is that
a reason to re-write it using std::string?  No.  Is it a reason to rewrite it
using obstracks?  No.  Just leave it alone.


How about we use auto_vec<> with an expected sane default?  This way we 
have a performance conscious solution that will fallback to malloc if 
necessary, while cleaning up after itself without the need for changing 
every exit point from a function.


For example, the attached untested patch implements this approach for 
tree.c.


Would this be acceptable?

Aldy

diff --git a/gcc/tree.c b/gcc/tree.c
index 11d3b51..160c539 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -9636,9 +9636,9 @@ anon_aggrname_format()
 tree
 get_file_function_name (const char *type)
 {
-  char *buf;
   const char *p;
   char *q;
+  auto_vec chunk;
 
   /* If we already have a name we know to be unique, just use that.  */
   if (first_global_object_name)
@@ -9674,7 +9674,8 @@ get_file_function_name (const char *type)
file = LOCATION_FILE (input_location);
 
   len = strlen (file);
-  q = (char *) alloca (9 + 17 + len + 1);
+  chunk.reserve (9 + 17 + len + 1);
+  q = chunk.address ();
   memcpy (q, file, len + 1);
 
   snprintf (q + len, 9 + 17 + 1, "_%08X_" HOST_WIDE_INT_PRINT_HEX, 
@@ -9684,16 +9685,16 @@ get_file_function_name (const char *type)
 }
 
   clean_symbol_name (q);
-  buf = (char *) alloca (sizeof (FILE_FUNCTION_FORMAT) + strlen (p)
-+ strlen (type));
 
   /* Set up the name of the file-level functions we may need.
  Use a global object (which is already required to be unique over
  the program) rather than the file name (which imposes extra
  constraints).  */
-  sprintf (buf, FILE_FUNCTION_FORMAT, type, p);
+  auto_vec buf;
+  buf.reserve (sizeof (FILE_FUNCTION_FORMAT) + strlen (p) + strlen (type));
+  sprintf (buf.address (), FILE_FUNCTION_FORMAT, type, p);
 
-  return get_identifier (buf);
+  return get_identifier (buf.address ());
 }
 
 #if defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 2007)
@@ -9711,6 +9712,7 @@ tree_check_failed (const_tree node, const char *file,
   const char *buffer;
   unsigned length = 0;
   enum tree_code code;
+  auto_vec chunk;
 
   va_start (args, function);
   while ((code = (enum tree_code) va_arg (args, int)))
@@ -9721,7 +9723,8 @@ tree_check_failed (const_tree node, const char *file,
   char *tmp;
   va_start (args, function);
   length += strlen ("expected ");
-  buffer = tmp = (char *) alloca (length);
+  chunk.reserve (length);
+  buffer = tmp = chunk.address ();
   length = 0;
   while ((code = (enum tree_code) va_arg (args, int)))
{
@@ -9760,7 +9763,9 @@ tree_not_check_failed (const_tree node, const char *file,
 length += 4 + strlen (get_tree_code_name (code));
   va_end (args);
   va_start (args, function);
-  buffer = (char *) alloca (length);
+  auto_vec chunk;
+  chunk.reserve (length);
+  buffer = chunk.address ();
   length = 0;
   while ((code = (enum tree_code) va_arg (args, int)))
 {
@@ -9809,7 +9814,9 @@ tree_range_check_failed 

Re: [PATCH] toplev.c: set FILE * globals to NULL after fclose

2016-08-10 Thread Richard Biener
On Wed, Aug 10, 2016 at 3:25 AM, David Malcolm  wrote:
> Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

Ok.

Richard.

> gcc/ChangeLog:
> * toplev.c (finalize): Set aux_info_file, asm_out_file, and
> stack_usage_file to NULL after fclose calls.
> ---
>  gcc/toplev.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index b0bb3ec..ddaee8a 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1894,6 +1894,7 @@ finalize (bool no_backend)
>if (flag_gen_aux_info)
>  {
>fclose (aux_info_file);
> +  aux_info_file = NULL;
>if (seen_error ())
> unlink (aux_info_file_name);
>  }
> @@ -1908,10 +1909,14 @@ finalize (bool no_backend)
> fatal_error (input_location, "error writing to %s: %m", 
> asm_file_name);
>if (fclose (asm_out_file) != 0)
> fatal_error (input_location, "error closing %s: %m", asm_file_name);
> +  asm_out_file = NULL;
>  }
>
>if (stack_usage_file)
> -fclose (stack_usage_file);
> +{
> +  fclose (stack_usage_file);
> +  stack_usage_file = NULL;
> +}
>
>if (!no_backend)
>  {
> --
> 1.8.5.3
>


Re: [PATCH] Fix early debug regression with DW_AT_string_length (PR debug/71906)

2016-08-10 Thread Richard Biener
On Wed, Aug 10, 2016 at 10:16 AM, Jakub Jelinek  wrote:
> On Tue, Aug 09, 2016 at 07:10:34PM +0200, Richard Biener wrote:
>> >In some cases like location of file scope vars, or this
>> >DW_AT_string_length,
>> >you really need to adjust late the debug info created early, there is
>> >no
>> >workaround for that.
>>
>> I suppose the workaround is to fix/extend DWARF... (DW_AT_GNU_string_length 
>> that allows us to refer to another DIE?)
>
> Introducing another attribute that does the same thing as existing attribute
> would be way too ugly.  In theory the reference class could be added to
> DW_AT_string_length, I can ask on DWARF workgroup (but it might be too late
> for DWARF 5), but that still doesn't solve the issue of the indirect params.
>
> How do you want to handle the debug info without ammending the early-generated
> DWARF though?  Just by using it as abstract origins of everything and
> ammending in copies?

Yes.

> That would be a total mess for all the consumers...
> Parsing DWARF and rewriting it isn't all that hard.

It may be not hard but it takes time and bloats debug info.  You'd
need to do it N times
(each LTRANS unit would need to parse all early generated debug,
pickle the "interesting" pieces, annotate it and then hopefully
write out a lot less than parsed).  Or you do what I do, annotate
via abstract origins.  A clever DWARF optimizer might then do what
you suggest on the final executable (smash all copies with their
abstract origin).  I'm not objecting to the latter, but that's not my
primary objective.  My primary objective is to make C++ debugging
reasonable even when you use LTO - with my prototype patches now
all libstdc++ pretty-printer tests PASS while the currently just FAIL.

Yes, I do have some DWARF consumer issues - VLAs don't work,
but I got no answer from a mail to gdb last year or the bug I filed
a few weeks ago.  Yes, lldb immediately crashes when trying to
parse the DWARF.  Those are the only consumers I have access to.

That said, I don't see a way to do scalable "better" debug for LTO.

Richard.

>
> Jakub


Re: [PR72835] Incorrect arithmetic optimization involving bitfield arguments

2016-08-10 Thread Richard Biener
On Wed, Aug 10, 2016 at 10:57 AM, Jakub Jelinek  wrote:
> On Wed, Aug 10, 2016 at 08:51:32AM +1000, kugan wrote:
>> I see it now. The problem is we are just looking at (-1) being in the ops
>> list for passing changed to rewrite_expr_tree in the case of multiplication
>> by negate.  If we have combined (-1), as in the testcase, we will not have
>> the (-1) and will pass changed=false to rewrite_expr_tree.
>>
>> We should set changed based on what happens in try_special_add_to_ops.
>> Attached patch does this. Bootstrap and regression testing are ongoing. Is
>> this OK for trunk if there is no regression.
>
> I think the bug is elsewhere.  In particular in
> undistribute_ops_list/zero_one_operation/decrement_power.
> All those look problematic in this regard, they change RHS of statements
> to something that holds a different value, while keeping the LHS.
> So, generally you should instead just add a new stmt next to the old one,
> and adjust data structures (replace the old SSA_NAME in some ->op with
> the new one).  decrement_power might be a problem here, dunno if all the
> builtins are const in all cases that DSE would kill the old one,
> Richard, any preferences for that?  reset flow sensitive info + reset debug
> stmt uses, or something different?  Though, replacing the LHS with a new
> anonymous SSA_NAME might be needed too, in case it is before SSA_NAME of a
> user var that doesn't yet have any debug stmts.

I'd say replacing the LHS is the way to go, with calling the appropriate helper
on the old stmt to generate a debug stmt for it / its uses (would need
to look it
up here).

Richard.

>
> Jakub


Re: protected alloca class for malloc fallback

2016-08-10 Thread Richard Biener
On Wed, Aug 10, 2016 at 12:12 PM, Aldy Hernandez  wrote:
> On 08/10/2016 06:04 AM, Richard Biener wrote:
>>
>> On Tue, Aug 9, 2016 at 3:17 PM, Aldy Hernandez  wrote:
>>>
>>> On 08/05/2016 01:55 PM, Richard Biener wrote:
>>>
>>> Hi Richard.
>>>
 Please don't use std::string.  For string building you can use obstacks.
>>>
>>>
>>>
>>> Alright let's talk details then so I can write things up in a way you
>>> approve of.
>>>
>>> Take for instance simple uses like all the tree_*check_failed routines,
>>> which I thought were great candidates for std::string-- they're going to
>>> be
>>> outputted to the screen or disk which is clearly many times more
>>> expensive
>>> than the malloc or overhead of std::string:
>>>
>>>length += strlen ("expected ");
>>>buffer = tmp = (char *) alloca (length);
>>>length = 0;
>>>while ((code = (enum tree_code) va_arg (args, int)))
>>>  {
>>>const char *prefix = length ? " or " : "expected ";
>>>
>>>strcpy (tmp + length, prefix);
>>>length += strlen (prefix);
>>>strcpy (tmp + length, get_tree_code_name (code));
>>>length += strlen (get_tree_code_name (code));
>>>  }
>>>
>>> Do you suggest using obstacks here, or did you have something else in
>>> mind?
>>
>>
>> Why would you want to get rid of the alloca here?
>>
>>> How about if it's something like the above but there are multiple exit
>>> points from the function.  I mean, we still need to call obstack_free()
>>> on
>>> all exit points, which is what I wanted to avoid for something as simple
>>> as
>>> building a throw-away string.
>>
>>
>> But you'll end up in
>>
>>internal_error ("tree check: %s, have %s in %s, at %s:%d",
>>buffer, get_tree_code_name (TREE_CODE (node)),
>>function, trim_filename (file), line);
>>
>> where you have an interface using C strings again.
>>
>> It's not that the standard library is bad per-se, it's just using a
>> tool that doesn't
>> fit its uses.  This makes the code a messy mix of two concepts which is
>> what
>> I object to.
>>
>> Yes, the above example may have premature optimization applied.  Is that
>> a reason to re-write it using std::string?  No.  Is it a reason to rewrite
>> it
>> using obstracks?  No.  Just leave it alone.
>
>
> How about we use auto_vec<> with an expected sane default?  This way we have
> a performance conscious solution that will fallback to malloc if necessary,
> while cleaning up after itself without the need for changing every exit
> point from a function.
>
> For example, the attached untested patch implements this approach for
> tree.c.
>
> Would this be acceptable?

Again - why change way from alloca in the tree_*_check_fail routines?

get_file_function_name may be seen as working on "arbitrary" user input
(filenames), but I'm not sure I'd treat it that way.  The function computes
strings that are the same over the whole compilation (but it does so
possibly repeatedly), so simply using malloc and having the result cached
would work here, too.

Richard.

> Aldy
>


[PATCH, RFC] gcov: dump in a static dtor instead of in an atexit handler

2016-08-10 Thread Martin Liška
Hello.

There are multiple PRs (mentioned in ChangeLog) which suffer from missing 
capability of gcov
to save counters for functions with constructor/destructor attributes. I done 
that by simply
replacing atexit handler (gcov_exit) with a new static destructor 
(__gcov_exit), which has
priority 99 (I did the same for __gcov_init). However, I'm not sure whether 
it's possible
that a ctor defined in a source file can be potentially executed before 
__gcov_init (w/ the default
priority)?

Patch survives:
make check -k -j10 RUNTESTFLAGS="gcov.exp"
make check -k -j10 RUNTESTFLAGS="tree-prof.exp"

I've just also verified that a DSO gcov dump works as before. I'm attaching a 
test-case which 
tests both static ctors/dtors, as well as C++ ctors/dtors.

Thoughts?
Martin
>From 0e7f660b77628533679e6302a3f4b444166fc365 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 10 Aug 2016 12:18:45 +0200
Subject: [PATCH] gcov: dump in a static dtor instead of in an atexit handler

gcc/testsuite/ChangeLog:

2016-08-10  Martin Liska  

	PR gcov-profile/7970
	PR gcov-profile/16855
	PR gcov-profile/44779
	* g++.dg/gcov/pr16855.C: New test.

gcc/ChangeLog:

2016-08-10  Martin Liska  

	PR gcov-profile/7970
	PR gcov-profile/16855
	PR gcov-profile/44779
	* coverage.c (build_gcov_exit_decl): New function.
	(coverage_obj_init): Call the function and generate __gcov_exit
	destructor.

libgcc/ChangeLog:

2016-08-10  Martin Liska  

	PR gcov-profile/7970
	PR gcov-profile/16855
	PR gcov-profile/44779
	* libgcov-driver.c (__gcov_init): Do not register a atexit
	handler.
	(__gcov_exit): Rename from gcov_exit.
	* libgcov.h (__gcov_exit): Declare.
---
 gcc/coverage.c  | 27 +--
 gcc/testsuite/g++.dg/gcov/pr16855.C | 37 +
 libgcc/libgcov-driver.c |  5 ++---
 libgcc/libgcov.h|  3 +++
 4 files changed, 67 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/pr16855.C

diff --git a/gcc/coverage.c b/gcc/coverage.c
index d4d371e..da7f915 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1054,8 +1054,30 @@ build_init_ctor (tree gcov_info_type)
   stmt = build_call_expr (init_fn, 1, stmt);
   append_to_statement_list (stmt, &ctor);
 
-  /* Generate a constructor to run it.  */
-  cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY);
+  /* Generate a constructor to run it (with priority 99).  */
+  cgraph_build_static_cdtor ('I', ctor, DEFAULT_INIT_PRIORITY - 1);
+}
+
+/* Generate the destructor function to call __gcov_exit.  */
+
+static void
+build_gcov_exit_decl (void)
+{
+  tree init_fn = build_function_type_list (void_type_node, void_type_node,
+	   NULL);
+  init_fn = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+			get_identifier ("__gcov_exit"), init_fn);
+  TREE_PUBLIC (init_fn) = 1;
+  DECL_EXTERNAL (init_fn) = 1;
+  DECL_ASSEMBLER_NAME (init_fn);
+
+  /* Generate a call to __gcov_exit ().  */
+  tree dtor = NULL;
+  tree stmt = build_call_expr (init_fn, 0);
+  append_to_statement_list (stmt, &dtor);
+
+  /* Generate a destructor to run it (with priority 99).  */
+  cgraph_build_static_cdtor ('D', dtor, DEFAULT_INIT_PRIORITY - 1);
 }
 
 /* Create the gcov_info types and object.  Generate the constructor
@@ -1113,6 +1135,7 @@ coverage_obj_init (void)
   DECL_NAME (gcov_info_var) = get_identifier (name_buf);
 
   build_init_ctor (gcov_info_type);
+  build_gcov_exit_decl ();
 
   return true;
 }
diff --git a/gcc/testsuite/g++.dg/gcov/pr16855.C b/gcc/testsuite/g++.dg/gcov/pr16855.C
new file mode 100644
index 000..ec72e99
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/pr16855.C
@@ -0,0 +1,37 @@
+/* { dg-options "-fprofile-arcs -ftest-coverage" } */
+/* { dg-do run { target native } } */
+
+#include 
+using namespace std;
+class Test {
+public:
+	Test(void){
+	cout<< "In Test ctor" << endl;			  /* count(1) */
+	}
+	~Test(void){
+	cout<< "In Test dtor" << endl;			  /* count(1) */
+	}
+}T1;
+
+void uncalled(void){
+	cout<< "In uncalled" << endl;			  /* count(#) */
+}
+int main(void){
+// Test T2;
+cout<< "In main" << endl;  /* count(1) */
+return 0;
+}
+
+#include 
+
+__attribute__((constructor))
+static void construct_navigationBarImages() {
+  fprintf (stderr,  "((construct_navigationBarImages))"); /* count(1) */
+}
+
+__attribute__((destructor))
+static void destroy_navigationBarImages() {
+  fprintf (stderr,  "((destroy_navigationBarImages))");	  /* count(1) */
+}
+
+/* { dg-final { run-gcov branches { -b pr16855.C } } } */
diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index d51397e..84471bd 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -872,8 +872,8 @@ struct gcov_root __gcov_root;
 struct gcov_master __gcov_master = 
   {GCOV_VERSION, 0};
 
-static void
-gcov_exit (void)
+void
+__gcov_exit (void)
 {
   __gcov_dump_one (&__gcov_root);
   if (__gcov_root.next)
@@ -906,7 +906,6 @@ __gcov_init (struct gcov_info *info)
 		__gcov_master.root->prev = &_

Re: [Patch, libfortran] Multi-threaded random_number

2016-08-10 Thread Janne Blomqvist
Hi,

thanks for the Ok. However, moments before committing I got cold feet
and started digging around; it unfortunately seems that TLS
(_Thread_local) is not supported on all targets. So I'll have to
copy-paste some configure magic from libgomp/libjava/etc., and provide
workarounds for such systems. :(

On Tue, Aug 9, 2016 at 8:01 AM, Jerry DeLisle  wrote:
> On 08/08/2016 04:01 AM, Janne Blomqvist wrote:
>>
>> PING**2
>>
>
> OK, thanks for patch.
>
> Jerry
>
>> On Sun, Jul 24, 2016 at 4:45 PM, Janne Blomqvist
>>  wrote:
>>>
>>> Hi,
>>>
>>> the attached patch replaces the current random_number / random_seed
>>> implementations with an implementation that better supports threads.
>>> It's an improved version of the RFC patch I posted earlier at
>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html . Please see
>>> that earlier message for a longer-winded explanation of what's wrong
>>> with the current implementation and how the patch addresses this.
>>>
>>> In short, with the patch the random number generator state is now
>>> per-thread and stored in a per-thread (TLS) variable, enabling a
>>> lockless fast-path. This provides up to 2 orders of magnitude better
>>> performance on a synthetic benchmark using 4 threads, and provides a
>>> more deterministic result as the order that threads are scheduled does
>>> not affect the random number streams for each thread.
>>>
>>> Compared to the RFC patch, a number of minor and not-so-minor bugs
>>> have been fixed, so the patch now passes the testsuite (with a few
>>> modifications to the suite, part of the patch). Also, for REAL kinds
>>> 4, 8, 10 the generated streams are identical (except precision, of
>>> course) (like the current implementation), enabling precision
>>> comparisons, as requested by Steve Kargl. However, this does not
>>> extend to REAL(16) as that would have necessitated doubling the size
>>> of the state, along with potential issues of slower escape from a
>>> low-entropy state, for a feature that I believe is not used by
>>> particularly many users in the end. So if one wants to do precision
>>> comparisons with REAL(16) one must employ a wrapper routine.
>>>
>>> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
>>>
>>> frontend ChangeLog:
>>>
>>> 2016-07-27  Janne Blomqvist  
>>>
>>> * check.c (gfc_check_random_seed): Use new seed size in check.
>>> * intrinsic.texi (RANDOM_NUMBER): Updated documentation.
>>> (RANDOM_SEED): Likewise.
>>>
>>>
>>> testsuite:
>>>
>>> 2016-07-27  Janne Blomqvist  
>>>
>>> * gfortran.dg/random_7.f90: Take into account that the last seed
>>> value is the special p value.
>>> * gfortran.dg/random_seed_1.f90: Seed size is now constant.
>>>
>>>
>>> libgfortran:
>>> 2016-07-27  Janne Blomqvist  
>>>
>>> * intrinsics/random.c: Replace KISS with xorshift1024* with
>>> per-thread (TLS) state.
>>> * runtime/main.c (init): Don't call random_seed_i4.
>>>
>>>
>>> --
>>> Janne Blomqvist
>>
>>
>>
>>
>



-- 
Janne Blomqvist


Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Manuel López-Ibáñez
On 10 August 2016 at 11:06, Paolo Bonzini  wrote:
> While I disagree with their inclusion of the warning to -Wall, I think
> it is a good addition overall.  First, it is a logical extension of the
> existing warning for breaking defined across a macro and its caller.
> Second, it is good to make these warnings for `defined' available with
> a command-line option other than -pedantic.  In fact this warning is
> not mandated by the standard and thus is a strange case of a non-pedwarn
> enabled by -pedantic.  As a side effect of using the command-line parsing
> machinery to attach the new warning to -pedantic, it would become an
> error for -pedantic-errors, which would be weird for a diagnostic that
> is not mandated by the standard.

Note that the definition of -pedantic-errors says: "in some cases
where there is undefined behavior at compile-time". Thus, this would
be allowed according to our current definitions. However, the
definition of -Wpedantic does not mention that, thus it could be a
pedwarn not controlled by -Wpedantic.

My only fear is that people not using -Wpedantic nor -pedantic-errors
expect that GNU extensions work. This is a GNU extension that defines
something that is undefined according to ISO. Enabling the warning
with -Wextra is just annoying those people who may not care about
other compilers.

Thus, my opinion is that the current definition of -Wpedantic is too
restrictive and it should contain the "in some cases where there is
undefined behavior at compile-time". And thus, this should be a
pedwarn enabled by -Wpedantic, not by -Wextra and an error with
-pedantic-errors. But you should wait for other opinions, specially
Joseph, before redoing it, even if you agree with me.

Cheers,

Manuel.


Ping^2 Re: Implement C _FloatN, _FloatNx types [version 5]

2016-08-10 Thread Joseph Myers
Ping^2.  This patch 
 
(non-C-front-end parts) is still pending review.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] ipa bitwise constant propagation

2016-08-10 Thread Prathamesh Kulkarni
On 10 August 2016 at 14:14, Prathamesh Kulkarni
 wrote:
> On 9 August 2016 at 23:43, Martin Jambor  wrote:
>> Hi,
>>
>> On Tue, Aug 09, 2016 at 05:17:31PM +0530, Prathamesh Kulkarni wrote:
>>> On 9 August 2016 at 16:39, Martin Jambor  wrote:
>>>
>>> ...
>>>
>>> >> Instead of storing arg's precision and sign, we should store
>>> >> parameter's precision and sign in ipa_compute_jump_functions_for_edge ().
>>> >> Diff with respect to previous patch:
>>> >>
>>> >> @@ -1688,9 +1690,9 @@ ipa_compute_jump_functions_for_edge (struct
>>> >> ipa_func_body_info *fbi,
>>> >>&& (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
>>> >>   {
>>> >>jfunc->bits.known = true;
>>> >> -  jfunc->bits.sgn = TYPE_SIGN (TREE_TYPE (arg));
>>> >> -  jfunc->bits.precision = TYPE_PRECISION (TREE_TYPE (arg));
>>> >> -
>>> >> +  jfunc->bits.sgn = TYPE_SIGN (param_type);
>>> >> +  jfunc->bits.precision = TYPE_PRECISION (param_type);
>>> >> +
>>> >
>>> > If you want to use the precision of the formal parameter then you do
>>> > not need to store it to jump functions.  Parameter DECLs along with
>>> > their types are readily accessible in IPA (even with LTO).  It would
>>> > also be much clearer what is going on, IMHO.
>>> Could you please point out how to access parameter decl in wpa ?
>>> The only reason I ended up putting this in jump function was because
>>> I couldn't figure out how to access param decl during WPA.
>>> I see there's ipa_get_param() in ipa-prop.h however it's gated on
>>> gcc_checking_assert (!flag_wpa), so I suppose I can't use this
>>> during WPA ?
>>>
>>> Alternatively I think I could access cs->callee->decl and get to the param 
>>> decl
>>> by walking DECL_ARGUMENTS ?
>>
>> Actually, we no longer have DECL_ARGUMENTS during LTO WPA.  But in
>> most cases, you can still get at the type with something like the
>> following (only very lightly tested) patch, if Honza does not think it
>> is too crazy.
>>
>> Note that= for old K&R C sources we do not have TYPE_ARG_TYPES and so
>> ipa_get_type can return NULL(!) ...however I wonder whether for such
>> programs the type assumptions made in callers when constructing jump
>> functions can be trusted either.
>>
>> I have to run, we will continue the discussion later.
> Thanks for the patch.
> In this version, I updated the patch to use ipa_get_type, remove
> precision and sgn
> from ipcp_bits_lattice and ipa_bits, and renamed member variables to
> add m_ prefix.
> Does it look OK ?
> I am looking for test-case that affects precision and hopefully add
> that along with other
> test-cases in follow-up patch.
> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
oops, I forgot to add tree-ssa-ccp.h hunk to the patch :/
Attached in this version.

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
>>
>> Martin
>>
>>
>> 2016-08-09  Martin Jambor  
>>
>> * ipa-prop.h (ipa_param_descriptor): Renamed decl to decl_or_type.
>> Update comment.
>> (ipa_get_param): Updated comment, added assert that we have a
>> PARM_DECL.
>> (ipa_get_type): New function.
>> * ipa-cp.c (ipcp_propagate_stage): Fill in argument types in LTO 
>> mode.
>> * ipa-prop.c (ipa_get_param_decl_index_1): Use decl_or_type
>> instead of decl;
>> (ipa_populate_param_decls): Likewise.
>> (ipa_dump_param): Likewise.
>>
>>
>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>> index 5b6cb9a..3465da5 100644
>> --- a/gcc/ipa-cp.c
>> +++ b/gcc/ipa-cp.c
>> @@ -1952,11 +1952,21 @@ propagate_constants_accross_call (struct cgraph_edge 
>> *cs)
>>else
>>  i = 0;
>>
>> +  /* !!! The following dump is of course only a demonstration that it 
>> works: */
>> +  debug_generic_expr (callee->decl);
>> +  fprintf (stderr, "\n");
>> +
>>for (; (i < args_count) && (i < parms_count); i++)
>>  {
>>struct ipa_jump_func *jump_func = ipa_get_ith_jump_func (args, i);
>>struct ipcp_param_lattices *dest_plats;
>>
>> +  /* !!! The following dump is of course only a demonstration that it
>> + works: */
>> +  fprintf (stderr, "  The type of parameter %i is: ", i);
>> +  debug_generic_expr (ipa_get_type (callee_info, i));
>> +  fprintf (stderr, "\n");
>> +
>>dest_plats = ipa_get_parm_lattices (callee_info, i);
>>if (availability == AVAIL_INTERPOSABLE)
>> ret |= set_all_contains_variable (dest_plats);
>> @@ -2936,6 +2946,19 @@ ipcp_propagate_stage (struct ipa_topo_info *topo)
>>{
>>  struct ipa_node_params *info = IPA_NODE_REF (node);
>>
>> +/* In LTO we do not have PARM_DECLs but we would still like to be able 
>> to
>> +   look at types of parameters.  */
>> +if (in_lto_p)
>> +  {
>> +   tree t = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
>> +   for (int k = 0; k < ipa_get_param_count (info); k++)
>> + {
>> +   gcc_assert (t != void_list_node);
>> +   info->descriptors[k].decl_or_type = TREE_VALUE (t);
>> +   t = t ? 

Re: Use verify_oacc_routine_clauses for Fortran (was: Use verify_oacc_routine_clauses for C/C++)

2016-08-10 Thread Thomas Schwinge
Hi Cesar!

On Tue, 9 Aug 2016 08:54:46 -0700, Cesar Philippidis  
wrote:
> On 08/01/2016 08:29 AM, Thomas Schwinge wrote:
> 
> > On Mon, 01 Aug 2016 17:21:37 +0200, I wrote:
> >> Some checking of OpenACC clauses currently done in the front ends should
> >> be moved later, and be unified.  (Also, I suppose, for supporting of the
> >> device_type clause, such checking actually *must* be moved later, into
> >> the oaccdevlow pass, or similar.)  Here is a first preparatory patch.  OK
> >> for trunk?
> >>
> >> commit e02a9b65c505b404f8d985b0ec6ccb99d73515d3
> >> Author: Thomas Schwinge 
> >> Date:   Wed Jul 27 15:54:38 2016 +0200
> >>
> >> Use verify_oacc_routine_clauses for C/C++
> > 
> > Here is a Fortran patch.  This depends on other Fortran patches in flight
> > (such as Cesar's), and on PR72741 "Fortran OpenACC routine directive
> > doesn't properly handle clauses specifying the level of parallelism" be
> > resolved, and thereabouts, but I'm posting it anyway, in case anyone has
> > any review comments already.  I suppose, to begin with, the call of
> > gfc_oacc_routine_dims will move later into the Fortran front end
> > pipeline, to the point then function declarations' attributes are set, or
> > similar.  Also, as discussed already, the Fortran front end currently is
> > very "forgetful" in regards to OpenACC/OpenMP clauses' specific location
> > information, so we're not able at present to produce diagnostics with
> > precise location information.
> 
> Are you planning on staging this patch in gomp4?

As we discussed yesterday, I was planning to wait for the corresponding
Fortran front end changes to materialize, and then handle it all
together, for C, C++, and Fortran.

> I know it's a work in
> progress, but is this function ...

> > --- gcc/fortran/trans-openmp.c
> > +++ gcc/fortran/trans-openmp.c

> > +/* Determine and verify the level of parallelism for an OpenACC routine.  
> > */
> > +
> > +oacc_function
> > +gfc_oacc_routine_dims (gfc_omp_clauses *clauses, locus location)
> > +{
> > +  [...]
> 
> included elsewhere in your patch set? I'll take over this patch as I
> work on PR72741.

I don't know what exactly you mean with "included"?  Also, as I
elaborated in ,
actually this function should "ideally just call
gcc/fortran/trans-openmp.c:gfc_trans_omp_clauses to translate the Fortran
OMP clauses representation into OMP_CLAUSE trees" (if that makes sense).
Anyway, feel free to take over this function/patch as required.


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Joseph Myers
On Wed, 10 Aug 2016, Manuel López-Ibáñez wrote:

> Thus, my opinion is that the current definition of -Wpedantic is too
> restrictive and it should contain the "in some cases where there is
> undefined behavior at compile-time". And thus, this should be a

Yes, it should.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH][RFC] Add the ability to record sub-timevars (-ftime-report-details)

2016-08-10 Thread Richard Biener

The following patch adds the ability to record time spent in utility
to the pass using it.  For things like CFG cleanup, alias stmt walking
or DF infrastructure work it is currently not visible which pass is
responsible for the time spent there.  With -ftime-report-details
you now get

 alias stmt walking  :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 ( 2%) 
wall1209 kB ( 0%) ggc
...
 tree PRE:   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 ( 2%) 
wall   11870 kB ( 1%) ggc
 `- tree tail merge  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
wall   0 kB ( 0%) ggc
 `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 ( 0%) 
wall 153 kB ( 0%) ggc
 `- loop init:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) 
wall   0 kB ( 0%) ggc
 `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) 
wall   0 kB ( 0%) ggc
 `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) 
wall 544 kB ( 0%) ggc
 `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
wall   0 kB ( 0%) ggc
 tree FRE:   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 ( 2%) 
wall5658 kB ( 1%) ggc
 `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 ( 2%) 
wall 819 kB ( 0%) ggc
 `- unaccounted todo :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) 
wall 851 kB ( 0%) ggc
 `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
wall   0 kB ( 0%) ggc
 `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
wall   0 kB ( 0%) ggc
 tree code sinking   :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 ( 0%) 
wall1268 kB ( 0%) ggc
 `- loop init:   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) 
wall 526 kB ( 0%) ggc
 `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
wall   0 kB ( 0%) ggc
...

note that the full time spent in, say, alias stmt walking is still
recorded and dumped.  Note this also visualizes current nesting
of timevars and thus you'll get

 rest of compilation :   1.02 ( 1%) usr   0.04 ( 1%) sys   1.17 ( 1%) 
wall4977 kB ( 0%) ggc
 `- hard reg cprop   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
wall   0 kB ( 0%) ggc
 `- tree conservative DCE:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
wall   0 kB ( 0%) ggc
...

because we have passes w/o timevar that are accumulated in
rest of compilation.  There is the opportunity to clean this up.

I've extended the pass timevars to also cover their TODOs and thus
removed TV_TODO.

The accounting is "flat", that is, at most one level of sub-timevars
are recorded.

Does this look sensible and useful given the imperfectness of the
current timevar assignments?

Bootstrap pending.

Thanks,
Richard.

2016-08-10  Richard Biener  

* passes.c (execute_todo): Do not push/pop TV_TODO.
(execute_one_ipa_transform_pass): Move timevar push/pop TODO execution.
(execute_one_pass): Likewise.
* common.opt (ftime-report-details): New switch.
* doc/invoke.texi (ftime-report-details): Document.
* timevar.h (timer::print_row): Adjust signature.
(timer::child_map_t): New typedef.
(timer::time_var_def): Add children field.
* timevar.c (timer::named_items::print): Adjust.
(timer::~timer): Free timevar recorded children.
(timer::pop_internal): When -ftime-report-details record
time spent in sub-timevars.
(timer::print_row): Adjust.
(timer::print): Print sub-timevar stats.
* timevar.def (TV_TODO): Remove.

Index: gcc/common.opt
===
--- gcc/common.opt  (revision 238469)
+++ gcc/common.opt  (working copy)
@@ -2255,6 +2255,10 @@ ftime-report
 Common Report Var(time_report)
 Report the time taken by each compiler pass.
 
+ftime-report-details
+Common Report Var(time_report_details)
+Record times taken by sub-phases separately.
+
 ftls-model=
 Common Joined RejectNegative Enum(tls_model) Var(flag_tls_default) 
Init(TLS_MODEL_GLOBAL_DYNAMIC)
 -ftls-model=[global-dynamic|local-dynamic|initial-exec|local-exec] Set the 
default thread-local storage code generation model.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 238469)
+++ gcc/doc/invoke.texi (working copy)
@@ -548,7 +548,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-report @gol
 -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol
 -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol
--fstats  -fstack-usage  -ftime-report @gol
+-fstats  -fstack-usage  -ftime-report -ftime-report-details @gol
 -fvar-tracking-assignments-toggle -gtoggle @gol
 -print-file-name=@var{library}  -print-libgcc-file-name @gol
 -print-multi-directory  -print-multi-lib  -print-multi-os-directory @gol
@@ -12543,6 +12543,10 @@ print som

Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Paolo Bonzini


On 10/08/2016 13:31, Manuel López-Ibáñez wrote:
> My only fear is that people not using -Wpedantic nor -pedantic-errors
> expect that GNU extensions work. This is a GNU extension that defines
> something that is undefined according to ISO. Enabling the warning
> with -Wextra is just annoying those people who may not care about
> other compilers.

I think this warning falls in the same category as
-Wshift-negative-value.  (In fact I dislike -Wshift-negative-value a
lot, and would put that one under -Wpedantic only).

It is interesting that GCC has been relying for a long time on such
behavior in the HAVE_DESIGNATED_INITIALIZERS macro, despite GCC not
having any interest in compiling with -Wpedantic.  I think this
reinforces the choice of adding the warning to -Wextra.

People using -Wextra are used to having to remove some warnings
manually, and this one probably doesn't have many hits (1 in QEMU, which
is what motivated me to add it to GCC; and 2 in GCC not counting the
duplicate code between gcc/system.h and libcpp/system.h).

Paolo


Re: AW: [PATCH] Fix unaligned access when predictive commoning (PR 71083)

2016-08-10 Thread Eric Botcazou
> I tried to translate the c-test case to an equivalent ada test case with
> my not-so-fluent Ada speak...
> 
> So here is a test case that *actually* hits the if ((boff % BITS_PER_UNIT)
> != 0) code path.

I see, well done then.

> Before my patch there was an unaligned SImode access, and with
> the patch we have 3 QImode accesses here.
> 
> cat pr71083_pkg.ads
> package Pr71083_pkg is
>   type Nibble is mod 2**4;
>   type Int24  is mod 2**24;
>   type StructA is record
> a : Nibble;
> b : Int24;
>   end record;
>   pragma Pack(StructA);
>   type StructB is record
> a : Nibble;
> b : StructA;
>   end record;
>   pragma Pack(StructB);
>   type ArrayOfStructB is array(0..100) of StructB;
>   procedure Foo (X : in out ArrayOfStructB);
> end Pr71083_pkg;
> 
> cat pr71083_pkg.adb
> -- { dg-do compile }
> -- { dg-options "-O3" }
> package body Pr71083_pkg is
>   procedure Foo (X : in out ArrayOfStructB) is
>   begin
> for K in 0..99 loop
>   X (K+1).b.b := X (K).b.b;
> end loop;
>   end Foo;
> end Pr71083_pkg;
> 
> cat pr71083.adb
> -- { dg-do run }
> -- { dg-options "-O3" }
> with Pr71083_pkg;
> use Pr71083_pkg;
> procedure Pr71083 is
>   Test : ArrayOfStructB;
> begin
>   Test (0).b.b := ;
>   Foo (Test);
>   if Test (100).b.b /=  then
> raise Program_Error;
>   end if;
> end;
> 
> 
> I was not sure which name to choose,
> so I used the same convention as in the c-torture.
> How do you like that?

Worst convention ever. :-)  Either opt57 or loop_optimization23 at your 
convenience (you can add a reference to the PR in a comment).

> I would like to add that to the gnat.dg because
> it seems that there is zero testing for the predictive
> commoning in the gnat.dg test suite.

Very likely so.  The renamed testcase is OK for mainline, but please replace 
"Eric's" with "the" in the patch or copy-and-paste the explanation.

-- 
Eric Botcazou


Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)

2016-08-10 Thread Richard Biener
On Tue, 9 Aug 2016, Bernd Edlinger wrote:

> On 08/09/16 22:48, Eric Botcazou wrote:
> >> I think from Eric's comment in get_inner_ref it can be possible
> >> in Ada that the outer object is accidentally byte-aligned
> >> but the inner reference is not.  In that case I think it is
> >> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.
> >
> > The patch says get_bit_range instead...  The comment therein means that
> > bitfields can be nested in Ada: you can have a bitfield in a record which is
> > itself a bitfield in another record.
> >
> >> Eric do you agree?  Are there any Ada test cases where the
> >> pcom optimization jumps in?
> >
> > According to the comment, data-ref analysis punts on bit offsets so I'm not
> > sure how boff can be not byte-aligned.
> >
> 
> I mean in dr_analyze_innermost, we have:
> 
>base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode,
>&punsignedp, &preversep, &pvolatilep);
>gcc_assert (base != NULL_TREE);
> 
>if (pbitpos % BITS_PER_UNIT != 0)
>  {
>if (dump_file && (dump_flags & TDF_DETAILS))
>  fprintf (dump_file, "failed: bit offset alignment.\n");
>return false;
>  }
> 
>if (preversep)
>  {
>if (dump_file && (dump_flags & TDF_DETAILS))
>  fprintf (dump_file, "failed: reverse storage order.\n");
>return false;
>  }
> 
> 
> and that means that get_inner_reference on the outermost ref
> returns a byte-aligned bit-offset, and from there I would
> think it has the knowledge, when to punt on reversep and
> when the offset is not byte-aligned.
> 
> But in get_bit_range we have a bit-field with arbitrary
> bit-offset, but surprisingly the
> get_inner_reference (TREE_OPERAND (exp, 0)) did not return
> byte-aligned bit-offset.  I doubt that data-ref ananlysis
> ever cares about the byte-alignment of intermediate
> refs.

It doesn't.  Note that it cares about byte-alignment of the ref
only because it decomposes the ref into an address plus
a byte offset - and the address is natrually to a byte-aligned thing.

> We know that the difference between the innermost ref
> and the outer ref is byte-aligned, but we do not know
> that the same is true for offset between the COMPONENT_REF
> and the outer ref.
> 
> I mean boff is essentially the difference between
> bitpos of get_inner_reference(exp) and
> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
> 
> This would be exposed by my patch, because previously
> we always generated BIT_FIELD_REFS, with bit-offset 0,
> but the MEM_REF at the byte-offset there is in all likelihood
> pretty much unaligned, the MEM_REF at the COMPONENT_REF
> is likely more aligned and allows better code for ARM processors,
> but only if the MEM_REF is at a byte-aligned offset at all,
> otherwise the whole transformation would be wrong.

Note that the important thing to ensure is that the access the
MEM_REF performs is correct TBAA-wise which means you either
have to use alias-set zero (ptr_type_node offset) or _not_
shuffle the offset arbitrarily between the MEM_REF and the
components you wrap it in.

Richard.


Re: [Patch, tentative, reload] Make push_reload work with more types of subregs?

2016-08-10 Thread Senthil Kumar Selvaraj

Bernd Schmidt writes:

> On 08/08/2016 05:26 PM, Senthil Kumar Selvaraj wrote:
>
>> I picked out the commit where you'd added SYMBOL_REF handling (rev
>> #190252), and patched that with the below code. Bootstrapped and
>> regtested on x86_64-pc-linux - the results were identical with and
>> without the patch. Is this good enough for trunk?
>
>> -  else if (GET_CODE (SUBREG_REG (in)) == SYMBOL_REF)
>> +  else if (GET_CODE (SUBREG_REG (in)) == SYMBOL_REF
>> +   || CONSTANT_P (SUBREG_REG (in))
>> +   || GET_CODE (SUBREG_REG (in)) == PLUS)
>>  subreg_in_class = find_valid_class_1 (inmode,
>>GET_MODE (SUBREG_REG (in)),
>>rclass);
>
> Actually SYMBOL_REF should also be CONSTANT_P. For integers it looks 
> like it'll pass VOIDmode to find_valid_class_1 and just return NO_REGS. 
> which I suppose is OK.
>
> Would you mind removing the SYMBOL_REF test and retesting? Ok with that 
> change.

Bootstrapped and reg tested below patch with same setup as above - no
regressions showed up.

Committed patch to trunk. Ok for backport to 6.x and 5.x branches as
well?

Regards
Senthil

gcc/ChangeLog

2016-08-10  Senthil Kumar Selvaraj  

PR target/71873
* reload.c (push_reload): Compute subreg_in_class for
subregs of constants and plus expressions. Remove special
handling of SYMBOL_REFs.


gcc/testsuite/ChangeLog

2016-08-10  Senthil Kumar Selvaraj  

PR target/71873
* gcc.target/avr/pr71873.c: New test.



Index: gcc/reload.c
===
--- gcc/reload.c(revision 239318)
+++ gcc/reload.c(working copy)
@@ -1141,7 +1141,8 @@
   SUBREG_BYTE (in),
   GET_MODE (in)),
  REGNO (SUBREG_REG (in)));
-  else if (GET_CODE (SUBREG_REG (in)) == SYMBOL_REF)
+  else if (CONSTANT_P (SUBREG_REG (in))
+   || GET_CODE (SUBREG_REG (in)) == PLUS)
subreg_in_class = find_valid_class_1 (inmode,
  GET_MODE (SUBREG_REG (in)),
  rclass);
Index: gcc/testsuite/gcc.target/avr/pr71873.c
===
--- gcc/testsuite/gcc.target/avr/pr71873.c  (nonexistent)
+++ gcc/testsuite/gcc.target/avr/pr71873.c  (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fcaller-saves" } */
+
+#include 
+
+typedef struct {
+  uint8_t x;
+  uint32_t y;
+} A;
+
+A a;
+
+extern int bar(int);
+extern int foo (char *s, ...);
+
+extern uint8_t param;
+extern uint8_t h,m,s,ld,lm;
+extern uint16_t d;
+
+void gps_parse_string(int z)
+{
+  while (bar(z))
+  {
+switch (param)
+{
+  case 0: foo("a", &h, &m, &s, &d); break;
+  case 1: foo("d", &ld, &lm, &a.y); break;
+}
+  }
+}


[PATCH] Add test coverage for PR gcov-profile/35590

2016-08-10 Thread Martin Liška
Hi.

Following patch makes better test-coverage for cases mentioned in the PR.
The PR is resolved, I'll close it as soon as the patch is accepted.

Survives make check -k -j10 RUNTESTFLAGS="tree-prof.exp" on x86_64-linux-gnu.

Ready for trunk?
Thanks,
Martin
>From 686f204ab67b4c7a06feda0a85b22e864ffc1bda Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 10 Aug 2016 14:45:40 +0200
Subject: [PATCH] Add test coverage for PR gcov-profile/35590

gcc/testsuite/ChangeLog:

2016-08-10  Martin Liska  

	PR gcov-profile/35590
	* gcc.dg/tree-prof/val-prof-7.c: Improve test coverage.
---
 gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c | 86 -
 1 file changed, 71 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c b/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
index 9765b99..3e636aa 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
@@ -1,25 +1,81 @@
 /* { dg-options "-O2 -fdump-ipa-profile -mtune=core2" } */
 /* { dg-skip-if "" { ! { i?86-*-* x86_64-*-* } } { "*" } { "" } } */
 
-extern void bzero (void *, __SIZE_TYPE__);
+char *buffer1;
+char *buffer2;
 
-int foo(int len)
-{
-  char array[1000];
-  bzero(array, len);
-  return 0;
+#define DEFINE_TEST(N) \
+__attribute__((noinline)) \
+void bzero_test_ ## N (int len) \
+{ \
+  __builtin_bzero (buffer1, len); \
+} \
+ \
+__attribute__((noinline)) \
+void memcpy_test_ ## N (int len) \
+{ \
+  __builtin_memcpy (buffer1, buffer2, len); \
+} \
+ \
+__attribute__((noinline)) \
+void mempcpy_test_ ## N (int len) \
+{ \
+  __builtin_mempcpy (buffer1, buffer2, len); \
+} \
+ \
+__attribute__((noinline)) \
+void memset_test_ ## N (int len) \
+{ \
+  __builtin_memset (buffer1, 'c', len); \
+} \
+ \
+void test_stringops_ ## N(int len) \
+{ \
+  bzero_test_ ## N (len); \
+  memcpy_test_## N (len); \
+  mempcpy_test_ ## N (len); \
+  memset_test_ ## N (len); \
+} \
+ \
+void test_stringops_with_values_ ## N (int common, int not_common) \
+{ \
+  int i; \
+  for (i = 0; i < 1000; i++) \
+{ \
+  if (i > 990) \
+	test_stringops_ ## N (not_common); \
+  else \
+	test_stringops_ ## N (common); \
+} \
 }
 
+DEFINE_TEST(0);
+DEFINE_TEST(1);
+DEFINE_TEST(2);
+
 int main() {
-  int i;
-  for (i = 0; i < 1000; i++)
-{
-  if (i > 990)
-	foo(16);
-  else
-	foo(8);
-}
+  buffer1 = __builtin_malloc (1000);
+  buffer2 = __builtin_malloc (1000);
+
+  test_stringops_with_values_0 (8, 111);
+  test_stringops_with_values_1 (111, 111);
+  test_stringops_with_values_2 (257, 111);
+
   return 0;
 }
 
-/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on bzero" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on __builtin_bzero" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 111 stringop transformation on __builtin_bzero" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 257 stringop transformation on __builtin_bzero" 0 "profile" } } */
+
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on __builtin_memcpy" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 111 stringop transformation on __builtin_memcpy" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 257 stringop transformation on __builtin_memcpy" 0 "profile" } } */
+
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on __builtin_mempcpy" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 111 stringop transformation on __builtin_mempcpy" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 257 stringop transformation on __builtin_mempcpy" 0 "profile" } } */
+
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 8 stringop transformation on __builtin_memset" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Single value 111 stringop transformation on __builtin_memset" "profile" } } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Single value 257 stringop transformation on __builtin_memset" 0 "profile" } } */
-- 
2.9.2



Re: [PATCH, RFC] gcov: dump in a static dtor instead of in an atexit handler

2016-08-10 Thread Nathan Sidwell

On 08/10/16 06:43, Martin Liška wrote:

Hello.

There are multiple PRs (mentioned in ChangeLog) which suffer from missing 
capability of gcov
to save counters for functions with constructor/destructor attributes. I done 
that by simply
replacing atexit handler (gcov_exit) with a new static destructor 
(__gcov_exit), which has
priority 99 (I did the same for __gcov_init). However, I'm not sure whether 
it's possible
that a ctor defined in a source file can be potentially executed before 
__gcov_init (w/ the default
priority)?

Patch survives:
make check -k -j10 RUNTESTFLAGS="gcov.exp"
make check -k -j10 RUNTESTFLAGS="tree-prof.exp"

I've just also verified that a DSO gcov dump works as before. I'm attaching a 
test-case which
tests both static ctors/dtors, as well as C++ ctors/dtors.


Does a coverage bootstrap (--enable-coverage) still succeed?

I think this is a good idea, but we should document the changed behavior. (I 
don't think the current behaviour's documented).



nathan


Re: [PATCH, RFC] gcov: dump in a static dtor instead of in an atexit handler

2016-08-10 Thread Nathan Sidwell

On 08/10/16 08:53, Nathan Sidwell wrote:


I think this is a good idea, but we should document the changed behavior. (I
don't think the current behaviour's documented).


oh, gcov_exit is a user callable routine.  You'll have to keep it available.


Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch

2016-08-10 Thread Nathan Sidwell

On 08/05/16 09:43, Martin Liška wrote:

On 08/05/2016 03:14 PM, Nathan Sidwell wrote:

On 08/05/16 08:48, Martin Liška wrote:


Ok, after all the experimenting and inventing "almost" thread-safe code, I 
incline to not to include __gcov_one_value_profiler_body_atomic
counter. The final solution is cumbersome and probably does not worth doing. 
Moreover, even having a thread-safe implementation, result of an indirect call 
counter
is not going to be stable among different runs (due to a single value storage 
capability).

If you agree, I'll prepare a final version of patch?


Agreed.

nathan



Great, attaching install candidate.


ok, thanks.



[PATCH, ARM] Fix aprofile multilib mappings

2016-08-10 Thread Thomas Preudhomme

Hi,

Mappings (MULTILIB_MATCHES and MULTILIB_REUSE) in ARM aprofile multilib 
suffer from a number of issues:


* missing mapping of -mcpu=cortex-a7 to -march=armv7-a
* typo on vfpv3-d16-fp16 (MULTILIB_MATCHES uses vfpv3-fp16-d16 instead)
* missing mapping for neon-fp16, fpv5-d16 and fp-armv8 to neon, vfpv4 
and vfpv4 respectively
* reuse directive with option not in MULTILIB_OPTION (-mfpu=vfpv4 and 
-mfpu=fp-armv8)


The latter leads to unexpected results currently: GCC creates a reuse 
mapping that match for any -mfpu not in MULTILIB_OPTIONS. This means for 
instance that -march=armv7-a -mfpu=vfp -mfloat-abi=hard is mapped to 
-march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard. With this patch, it is 
now mapped to the default multilib (ie. -print-multi-directory gives .) 
which is a softfloat multilib. Arguably an option not in 
MULTILIB_OPTIONS should give an error when appearing in MULTILIB_REUSE 
rather the current behavior. This should be the object of a future patch.


The patch in attachment fixes all the issues mentioned above.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-08-01  Thomas Preud'homme  

* config/arm/t-aprofile (MULTILIB_MATCHES): Add mapping for
-mcpu=cortex-a7, -mfpu=neon-fp16, -mfpu=fpv5-d16 and 
-mfpu=fp-armv8.

Fix typo in -mfpu=vfpv3-d16-fp16 mapping.
(MULTILIB_REUSE): Remove reuse rules for option set including
-mfpu=fp-armv8 and -mfpu=vfpv4


The mappings added have been tested to work as expected and the option 
combinations that the intended behavior of removed MULTILIB_REUSE 
directives were checked as well. For details, see [1]. Regression 
testsuite for arm-none-eabi with aprofile multilib shows no regression.


Is this ok for trunk?

Best regards,

Thomas


[1] Format of the tests is as follows:

# expected mapping
% command line for getting multilib we should be mapping to
expected result
% command line for the multilib that should be mapped to the one above
[before] result without patch
[ after] result with patch

Provided that the second set of options should indeed be mapped to the 
first set, the result [ after] should be the same as the expected result.



# cortex-a7 -> armv7-a
% arm-none-eabi-gcc -march=armv7ve -print-multi-directory
v7ve
% arm-none-eabi-gcc -mcpu=cortex-a7 -print-multi-directory
[before] .
[ after] v7ve

# vfpv3-d16-fp16 -> vfpv3-d16
% arm-none-eabi-gcc -march=armv7ve -mfpu=vfpv3-d16 -mfloat-abi=hard 
-print-multi-directory

v7-a/fpv3/hard
% arm-none-eabi-gcc -march=armv7ve -mfpu=vfpv3-d16-fp16 -mfloat-abi=hard 
-print-multi-directory

[before] v7ve/fpv4/hard
[ after] v7-a/fpv3/hard

# neon-fp16 -> neon
arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=hard 
-print-multi-directory

v7-a/simdv1/hard
% arm-none-eabi-gcc -march=armv7-a -mfpu=neon-fp16 -mfloat-abi=hard 
-print-multi-directory

[before] v7-a/fpv3/hard
[ after] v7-a/simdv1/hard

# fpv5-d16 -> vfpv4-d16
% arm-none-eabi-gcc -march=armv7ve -mfpu=vfpv4-d16 -mfloat-abi=hard 
-print-multi-directory

v7ve/fpv4/hard
% arm-none-eabi-gcc -march=armv7ve -mfpu=fpv5-d16 -mfloat-abi=hard 
-print-multi-directory

[before] v7ve/fpv4/hard
[ after] v7ve/fpv4/hard

# fp-armv8 -> vfpv4-d16
% arm-none-eabi-gcc -march=armv7ve -mfpu=vfpv4-d16 -mfloat-abi=hard 
-print-multi-directory

v7ve/fpv4/hard
% arm-none-eabi-gcc -march=armv7ve -mfpu=fp-armv8 -mfloat-abi=hard 
-print-multi-directory

[before] v7ve/fpv4/hard
[ after] v7ve/fpv4/hard

# armv7-a vfpv4 softfp -> armv7-a vfpv3-d16 softfp
% arm-none-eabi-gcc -march=armv7-a -mfpu=vfpv4-d16 -mfloat-abi=softfp 
-print-multi-directory

v7-a/fpv3/softfp
% arm-none-eabi-gcc -march=armv7-a -mfpu=vfpv4 -mfloat-abi=softfp 
-print-multi-directory

[before] v7-a/fpv3/softfp
[ after] v7-a/fpv3/softfp

# armv7-a vfpv4 hard -> armv7-a vfpv3-d16 hard
% arm-none-eabi-gcc -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard 
-print-multi-directory

v7-a/fpv3/hard
% arm-none-eabi-gcc -march=armv7-a -mfpu=vfpv4 -mfloat-abi=hard 
-print-multi-directory

[before] v7-a/fpv3/hard
[ after] v7-a/fpv3/hard

# armv7ve fp-armv8 softfp -> armv7ve vfpv4-d16 softfp
% arm-none-eabi-gcc -march=armv7ve -mfpu=vfpv4-d16 -mfloat-abi=softfp 
-print-multi-directory

v7ve/fpv4/softfp
% arm-none-eabi-gcc -march=armv7ve -mfpu=fp-armv8 -mfloat-abi=softfp 
-print-multi-directory

[before] v7ve/fpv4/softfp
[ after] v7ve/fpv4/softfp

# armv7ve fp-armv8 hard -> armv7ve vfpv4-d16 hard
% arm-none-eabi-gcc -march=armv7ve -mfpu=vfpv4-d16 -mfloat-abi=hard 
-print-multi-directory

v7ve/fpv4/hard
% arm-none-eabi-gcc -march=armv7ve -mfpu=fp-armv8 -mfloat-abi=hard 
-print-multi-directory

[before] v7ve/fpv4/hard
[ after] v7ve/fpv4/hard

# armv8-a vfpv4 softfp -> armv7ve vfpv4-d16 softfp
% arm-none-eabi-gcc -march=armv7ve -mfpu=vfpv4-d16 -mfloat-abi=softfp 
-print-multi-directory

v7ve/fpv4/softfp
% arm-none-eabi-gcc -march=armv8-a -mfpu=vfpv4 -mfloat-abi=softfp 
-print-multi-directory

[before] v7ve/fpv4/softfp
[ after] v7ve/fpv4/softfp

# armv8-a vfpv4 hard

Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread ayush goel
On 9 August 2016 at 2:20:59 PM, Pedro Alves (pal...@redhat.com) wrote:
> On 08/06/2016 05:34 AM, ayush goel wrote:
> > On 5 August 2016 at 4:09:00 AM, Pedro Alves (pal...@redhat.com) wrote:
> >> On 08/02/2016 12:38 AM, Manuel López-Ibáñez wrote:
> >>>
> >>> If there is something wrong or missing, ideally we would like to know
> >>> so that Ayush can work on fixing it before the Summer of Code is over
> >>> in less than two weeks.
> >>
> >> I couldn't see anywhere gnulib's config.h file is included.
> >>
> >> gnulib's replacement headers and (and .c files) depend on
> >> that being included.
> >>
> >> Did I simply miss it?
> >>
> >
> > gnulib’s config.h is created on compile time. After building the
> > library it is present inside gnulib build folder.
>
> Sure, but that was not the question. The question is how are
> the gcc files including that new config.h file.
>
> E.g., how come you're not getting this:
>
> gcc/foo.c
> -> #include "config.h" (pick up gcc's config.h not the new gnulib one)
> -> #include (e.g., #include )
> -> trip on #error in gnulib replacement header:
> #ifndef _GL_INLINE_HEADER_BEGIN
> #error "Please include config.h first."
> #endif
>
> As explained here:
>
> https://gcc.gnu.org/ml/gcc/2016-06/msg00144.html
>
> and here in more detail:
>
> https://sourceware.org/ml/gdb-patches/2012-04/msg00426.html
>
> the scheme of configuring gnulib in a separate directory as borrowed from gdb
> requires including two config.h headers -- the gnulib client's, and gnulib's.
>
> Did you do something different that avoids needing that somehow?

I wasn’t aware of this. Thanks for pointing this out.
It’s strange however, I didn’t see anything failing while
building/testing my system.
>
> In gdb, .c files don't include "config.h" directly. Instead all .c files
> include a "defs.h" file first thing, and that in turn (after another 
> indirection)
> is what includes both gdb's "config.h" and gnulib's "config.h”:

Can gcc also adopt a similar approach? Include gnulib’s config.h in a
single header file instead of including it in every function that uses
it.
Which header file would be the most suitable for this purpose(probably
which is generically included by almost all the gcc functions)?
>
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/common/common-defs.h;h=2c9411704531b510d176a4a22a718ae8313294e7;hb=HEAD#l23
>
> Thanks,
> Pedro Alves
>
>


Re: [PATCH][RFC] Add the ability to record sub-timevars (-ftime-report-details)

2016-08-10 Thread David Malcolm
On Wed, 2016-08-10 at 14:04 +0200, Richard Biener wrote:
> The following patch adds the ability to record time spent in utility
> to the pass using it.  For things like CFG cleanup, alias stmt
> walking
> or DF infrastructure work it is currently not visible which pass is
> responsible for the time spent there.  With -ftime-report-details
> you now get
> 
>  alias stmt walking  :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 (
> 2%) 
> wall1209 kB ( 0%) ggc
> ...
>  tree PRE:   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 (
> 2%) 
> wall   11870 kB ( 1%) ggc
>  `- tree tail merge  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> 0%) 
> wall   0 kB ( 0%) ggc
>  `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 (
> 0%) 
> wall 153 kB ( 0%) ggc
>  `- loop init:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 (
> 0%) 
> wall   0 kB ( 0%) ggc
>  `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 (
> 0%) 
> wall   0 kB ( 0%) ggc
>  `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 (
> 0%) 
> wall 544 kB ( 0%) ggc
>  `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> 0%) 
> wall   0 kB ( 0%) ggc
>  tree FRE:   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 (
> 2%) 
> wall5658 kB ( 1%) ggc
>  `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 (
> 2%) 
> wall 819 kB ( 0%) ggc
>  `- unaccounted todo :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (
> 0%) 
> wall 851 kB ( 0%) ggc
>  `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> 0%) 
> wall   0 kB ( 0%) ggc
>  `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> 0%) 
> wall   0 kB ( 0%) ggc
>  tree code sinking   :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 (
> 0%) 
> wall1268 kB ( 0%) ggc
>  `- loop init:   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 (
> 0%) 
> wall 526 kB ( 0%) ggc
>  `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> 0%) 
> wall   0 kB ( 0%) ggc
> ...

The example seems to have suffered from line-wrap.  Presumably it looks
like:

 alias stmt walking  :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 ( 2%) wall  
  1209 kB ( 0%) ggc
...
 tree PRE:   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 ( 2%) wall  
 11870 kB ( 1%) ggc
 `- tree tail merge  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall  
 0 kB ( 0%) ggc
 `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 ( 0%) wall  
   153 kB ( 0%) ggc
 `- loop init:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall  
 0 kB ( 0%) ggc
 `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall  
 0 kB ( 0%) ggc
 `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall  
   544 kB ( 0%) ggc
 `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 tree FRE:   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 ( 2%) wall  
  5658 kB ( 1%) ggc
 `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 ( 2%) wall  
   819 kB ( 0%) ggc
 `- unaccounted todo :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall  
   851 kB ( 0%) ggc
 `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall  
 0 kB ( 0%) ggc
 `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 tree code sinking   :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 ( 0%) wall  
  1268 kB ( 0%) ggc
 `- loop init:   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall  
   526 kB ( 0%) ggc
 `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
...


(...assuming I can get Evolution to suppress line-wrapping above)


BTW, did you test this with jit enabled?  The jit "frontend" exposes
part of the timer API to client code and uses it in some ways that the
rest of the code doesn't, with its own two-level hierarchy:
https://gcc.gnu.org/onlinedocs/jit/topics/performance.html#the-timing-api

Various other comments inline below

> note that the full time spent in, say, alias stmt walking is still
> recorded and dumped.  Note this also visualizes current nesting
> of timevars and thus you'll get
> 
>  rest of compilation :   1.02 ( 1%) usr   0.04 ( 1%) sys   1.17 (
> 1%) 
> wall4977 kB ( 0%) ggc
>  `- hard reg cprop   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> 0%) 
> wall   0 kB ( 0%) ggc
>  `- tree conservative DCE:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> 0%) 
> wall   0 kB ( 0%) ggc
> ...
> 
> because we have passes w/o timevar that are accumulated in
> rest of compilation.  There is the opportunity to clean this up.
> 
> I've extended the pass timevars to also cover their TODOs and thus
> removed TV_TODO.
> 
> The accounting is "flat", that is, at most one level of sub-timevars
> are recorded.
> 
> Does this look 

[PATCH v3] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Paolo Bonzini
From: Paolo Bonzini 

clang recently added a new warning -Wexpansion-to-defined, which
warns when `defined' is used outside a #if expression (including the
case of a macro that is then used in a #if expression).

While I disagree with their inclusion of the warning to -Wall, I think
it is a good addition overall.  First, it is a logical extension of the
existing warning for breaking defined across a macro and its caller.
Second, it is good to make these warnings for `defined' available with
a command-line option other than -pedantic.

This patch adds -Wexpansion-to-defined, and enables it automatically
for both -pedantic (as before) and -Wextra.  It also changes the warning
from a warning to a pedwarn; this is mostly for documentation sake
since the EnabledBy machinery would turn -pedantic-errors into
-Werror=expansion-to-defined anyway.

Bootstrapped and regression tested x86_64-pc-linux-gnu, ok?

Paolo

libcpp:
2016-08-09  Paolo Bonzini  

* include/cpplib.h (struct cpp_options): Add new member
warn_expansion_to_defined.
(CPP_W_EXPANSION_TO_DEFINED): New enum member.
* expr.c (parse_defined): Warn for all uses of "defined"
in macros, and tie warning to CPP_W_EXPANSION_TO_DEFINED.
Make it a pedwarning instead of a warning.
* system.h (HAVE_DESIGNATED_INITIALIZERS): Do not use
"defined" in macros.

gcc:
2016-08-09  Paolo Bonzini  

* system.h (HAVE_DESIGNATED_INITIALIZERS,
HAVE_DESIGNATED_UNION_INITIALIZERS): Do not use
"defined" in macros.

gcc/c-family:
2016-08-09  Paolo Bonzini  

* c.opt (Wexpansion-to-defined): New.

gcc/doc:
2016-08-09  Paolo Bonzini  

* cpp.texi (Defined): Mention -Wexpansion-to-defined.
* cppopts.texi (Invocation): Document -Wexpansion-to-defined.
* invoke.texi (Warning Options): Document -Wexpansion-to-defined.

gcc/testsuite:
2016-08-09  Paolo Bonzini  

* gcc.dg/cpp/defined.c: Mark newly introduced warnings and
adjust for warning->pedwarn change.
* gcc.dg/cpp/defined-Wexpansion-to-defined.c,
gcc.dg/cpp/defined-Wextra-Wno-expansion-to-defined.c,
gcc.dg/cpp/defined-Wextra.c,
gcc.dg/cpp/defined-Wno-expansion-to-defined.c: New testcases.

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 239276)
+++ gcc/c-family/c.opt  (working copy)
@@ -506,6 +506,10 @@
 C ObjC C++ ObjC++ Var(warn_double_promotion) Warning
 Warn about implicit conversions from \"float\" to \"double\".
 
+Wexpansion-to-defined
+C ObjC C++ ObjC++ CPP(warn_expansion_to_defined) 
CppReason(CPP_W_EXPANSION_TO_DEFINED) Var(cpp_warn_expansion_to_defined) 
Init(0) Warning EnabledBy(Wextra || Wpedantic)
+Warn if \"defined\" is used outside #if.
+
 Wimplicit-function-declaration
 C ObjC Var(warn_implicit_function_declaration) Init(-1) Warning 
LangEnabledBy(C ObjC,Wimplicit)
 Warn about implicit function declarations.
Index: gcc/doc/cpp.texi
===
--- gcc/doc/cpp.texi(revision 239276)
+++ gcc/doc/cpp.texi(working copy)
@@ -3342,7 +3342,9 @@
 the C standard says the behavior is undefined.  GNU cpp treats it as a
 genuine @code{defined} operator and evaluates it normally.  It will warn
 wherever your code uses this feature if you use the command-line option
-@option{-pedantic}, since other compilers may handle it differently.
+@option{-Wpedantic}, since other compilers may handle it differently.  The
+warning is also enabled by @option{-Wextra}, and can also be enabled
+individually with @option{-Wexpansion-to-defined}.
 
 @node Else
 @subsection Else
Index: gcc/doc/cppopts.texi
===
--- gcc/doc/cppopts.texi(revision 239276)
+++ gcc/doc/cppopts.texi(working copy)
@@ -120,6 +120,12 @@
 @samp{#if} directive, outside of @samp{defined}.  Such identifiers are
 replaced with zero.
 
+@item -Wexpansion-to-defined
+@opindex Wexpansion-to-defined
+Warn whenever @samp{defined} is encountered in the expansion of a macro
+(including the case where the macro is expanded by an @samp{#if} directive).
+Such usage is not portable.
+
 @item -Wunused-macros
 @opindex Wunused-macros
 Warn about macros defined in the main file that are unused.  A macro
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 239276)
+++ gcc/doc/invoke.texi (working copy)
@@ -4914,6 +4914,11 @@
 construct, known from C++, was introduced with ISO C99 and is by default
 allowed in GCC@.  It is not supported by ISO C90.  @xref{Mixed Declarations}.
 
+@item -Wexpansion-to-defined
+@opindex Wexpansion-to-defined
+Warn whenever @samp{defined} is encountered in the expansion of a macro.
+This warning is also enabled by @option{-Wpedantic} and @option{-Wextra}.
+
 @item -Wundef
 @opindex Wundef
 @opindex Wno-undef

Re: [PATCH][RFC] Add the ability to record sub-timevars (-ftime-report-details)

2016-08-10 Thread Alexander Monakov
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi   (revision 238469)
> +++ gcc/doc/invoke.texi   (working copy)
[snip]
> @@ -12543,6 +12543,10 @@ print some statistics about each pass wh
>  Makes the compiler print some statistics about the time consumed by each
>  pass when it finishes.
>  
> +@item -ftime-report
> +@opindex ftime-report
> +Record the time consumed by infrastructure parts separately for each pass.

It seems pasted @item/@opindex are missing '-details'.

Alexander


Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread Manuel López-Ibáñez
On 10 August 2016 at 14:40, ayush goel  wrote:
> On 9 August 2016 at 2:20:59 PM, Pedro Alves (pal...@redhat.com) wrote:
>> the scheme of configuring gnulib in a separate directory as borrowed from gdb
>> requires including two config.h headers -- the gnulib client's, and gnulib's.

Isn't this also true for libiberty's config.h ? I have no idea
when/how is that included.

>> Did you do something different that avoids needing that somehow?
>
> I wasn’t aware of this. Thanks for pointing this out.
> It’s strange however, I didn’t see anything failing while
> building/testing my system.
>>
>> In gdb, .c files don't include "config.h" directly. Instead all .c files
>> include a "defs.h" file first thing, and that in turn (after another 
>> indirection)
>> is what includes both gdb's "config.h" and gnulib's "config.h”:
>
> Can gcc also adopt a similar approach? Include gnulib’s config.h in a
> single header file instead of including it in every function that uses
> it.
> Which header file would be the most suitable for this purpose(probably
> which is generically included by almost all the gcc functions)?

Unfortunately, gcc/*.c include config.h directly. Sorry, I'm not
really sure how this is supposed to work and how it was working for
libiberty's config.h but I'd suggest to copy that if possible.

Cheers,

Manuel.


Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread Pedro Alves
On 08/10/2016 02:40 PM, ayush goel wrote:
> On 9 August 2016 at 2:20:59 PM, Pedro Alves (pal...@redhat.com) wrote:

> I wasn’t aware of this. Thanks for pointing this out.
> It’s strange however, I didn’t see anything failing while
> building/testing my system.

If I comment out the include on gdb, here's what I get:

g++ -g3 -O0   -I. -I/home/pedro/gdb/mygit/src/gdb 
-I/home/pedro/gdb/mygit/src/gdb/common -I/home/pedro/gdb/mygit/src/gdb/config 
-DLOCALEDIR="\"/opt/gdb/share/locale\"" -DHAVE_CONFIG_H 
-I/home/pedro/gdb/mygit/src/gdb/../include/opcode 
-I/home/pedro/gdb/mygit/src/gdb/../opcodes/.. 
-I/home/pedro/gdb/mygit/src/gdb/../readline/.. 
-I/home/pedro/gdb/mygit/src/gdb/../zlib -I../bfd 
-I/home/pedro/gdb/mygit/src/gdb/../bfd 
-I/home/pedro/gdb/mygit/src/gdb/../include -I../libdecnumber 
-I/home/pedro/gdb/mygit/src/gdb/../libdecnumber  
-I/home/pedro/gdb/mygit/src/gdb/gnulib/import -Ibuild-gnulib/import   -DTUI=1   
-pthread -I/usr/include/guile/2.0  -I/usr/include/python3.4m 
-I/usr/include/python3.4m -Wall -Wpointer-arith -Wno-unused -Wunused-value 
-Wunused-function -Wno-switch -Wno-char-subscripts -Wempty-body 
-Wunused-but-set-parameter -Wunused-but-set-variable -Wno-sign-compare 
-Wno-write-strings -Wno-narrowing -Wformat-nonliteral -Werror -c -o gdb.o -MT 
gdb.o -MMD -MP -MF .deps/gdb.Tpo /home/pedro/gdb/mygit/src/gdb/gdb.c
In file included from 
/home/pedro/gdb/mygit/src/gdb/gnulib/import/pathmax.h:42:0,
 from /home/pedro/gdb/mygit/src/gdb/common/common-defs.h:67,
 from /home/pedro/gdb/mygit/src/gdb/defs.h:28,
 from /home/pedro/gdb/mygit/src/gdb/gdb.c:19:
build-gnulib/import/unistd.h:135:3: error: #error "Please include config.h 
first."
  #error "Please include config.h first."
   ^
In file included from /home/pedro/gdb/mygit/src/gdb/gdb_wchar.h:53:0,
 from /home/pedro/gdb/mygit/src/gdb/defs.h:51,
 from /home/pedro/gdb/mygit/src/gdb/gdb.c:19:
build-gnulib/import/wctype.h:66:3: error: #error "Please include config.h 
first."
  #error "Please include config.h first."
   ^
In file included from 
/home/pedro/gdb/mygit/src/gdb/gnulib/import/pathmax.h:42:0,
 from /home/pedro/gdb/mygit/src/gdb/common/common-defs.h:67,
 from /home/pedro/gdb/mygit/src/gdb/defs.h:28,
 from /home/pedro/gdb/mygit/src/gdb/gdb.c:19:
build-gnulib/import/unistd.h:137:1: error: ‘_GL_INLINE_HEADER_BEGIN’ does not 
name a type
 _GL_INLINE_HEADER_BEGIN
 ^
build-gnulib/import/unistd.h:1894:1: error: ‘_GL_INLINE_HEADER_END’ does not 
name a type
 _GL_INLINE_HEADER_END
 ^
In file included from /home/pedro/gdb/mygit/src/gdb/gdb_wchar.h:53:0,
 from /home/pedro/gdb/mygit/src/gdb/defs.h:51,
 from /home/pedro/gdb/mygit/src/gdb/gdb.c:19:
build-gnulib/import/wctype.h:68:1: error: ‘_GL_INLINE_HEADER_BEGIN’ does not 
name a type
 _GL_INLINE_HEADER_BEGIN
 ^

> It’s strange however, I didn’t see anything failing while
> building/testing my system.

You're pulling in the unistd.h replacement module, and from above,
we see that that replacement header should trip on this.

I think the reason you don't trip on this, is here:

> +INCGNU = -I../gnulib -I$(srcdir)/../gnulib/import

Specifically, the "-I../gnulib" part.  That looks wrong.

Here's what the gnulib build directory looks like on a gdb
build:

$ cd build/gdb/build-gnulib
$ ls
import  config.cache  config.h  config.log  config.status  Makefile  stamp-h1
$ ls import/
sys  dirent.hmath.h   string.h   wchar.h
alloca.h dirname-lgpl.o  math.o   stripslash.o   wctype.h
arg-nonnull.hinttypes.h  ref-add.sed  strnlen1.o wctype-h.o
basename-lgpl.o  libgnu.aref-del.sed  time.h
c++defs.hlocalcharset.o  signal.h unistd.h
charset.aliasMakefilestdio.h  unistd.o
configmake.h malloca.o   stdlib.h warn-on-use.h

So you want to point -I at these generated, replacement headers, not
the gnulib build dir root, which in gcc is simply gnulib/ instead of
build-gnulib/.

So again, here:

> +INCGNU = -I../gnulib -I$(srcdir)/../gnulib/import

"-I../gnulib" should instead be "-I../gnulib/import" to put gnulib's
generated replacement headers in the include path.

And then you should trip on the problem.  And you'll make gnulib's
replacement headers actually replace something.  :-)

Assuming I'm right, that is.

> Can gcc also adopt a similar approach? Include gnulib’s config.h in a
> single header file instead of including it in every function that uses
> it.
> Which header file would be the most suitable for this purpose(probably
> which is generically included by almost all the gcc functions)?

I think "system.h" would be the most suitable.  It's already included
everywhere first thing.  The gen* stuff might need something special.

Since you're problably going to use sed to patch the files, including
some new file instead of config.h should be about the same 

Re: [PATCH][RFC] Add the ability to record sub-timevars (-ftime-report-details)

2016-08-10 Thread Richard Biener
On Wed, 10 Aug 2016, David Malcolm wrote:

> On Wed, 2016-08-10 at 14:04 +0200, Richard Biener wrote:
> > The following patch adds the ability to record time spent in utility
> > to the pass using it.  For things like CFG cleanup, alias stmt
> > walking
> > or DF infrastructure work it is currently not visible which pass is
> > responsible for the time spent there.  With -ftime-report-details
> > you now get
> > 
> >  alias stmt walking  :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 (
> > 2%) 
> > wall1209 kB ( 0%) ggc
> > ...
> >  tree PRE:   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 (
> > 2%) 
> > wall   11870 kB ( 1%) ggc
> >  `- tree tail merge  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> > 0%) 
> > wall   0 kB ( 0%) ggc
> >  `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 (
> > 0%) 
> > wall 153 kB ( 0%) ggc
> >  `- loop init:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 (
> > 0%) 
> > wall   0 kB ( 0%) ggc
> >  `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 (
> > 0%) 
> > wall   0 kB ( 0%) ggc
> >  `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 (
> > 0%) 
> > wall 544 kB ( 0%) ggc
> >  `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> > 0%) 
> > wall   0 kB ( 0%) ggc
> >  tree FRE:   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 (
> > 2%) 
> > wall5658 kB ( 1%) ggc
> >  `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 (
> > 2%) 
> > wall 819 kB ( 0%) ggc
> >  `- unaccounted todo :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 (
> > 0%) 
> > wall 851 kB ( 0%) ggc
> >  `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> > 0%) 
> > wall   0 kB ( 0%) ggc
> >  `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> > 0%) 
> > wall   0 kB ( 0%) ggc
> >  tree code sinking   :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 (
> > 0%) 
> > wall1268 kB ( 0%) ggc
> >  `- loop init:   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 (
> > 0%) 
> > wall 526 kB ( 0%) ggc
> >  `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 (
> > 0%) 
> > wall   0 kB ( 0%) ggc
> > ...
> 
> The example seems to have suffered from line-wrap.  Presumably it looks
> like:
> 
>  alias stmt walking  :   4.48 ( 2%) usr   0.14 ( 3%) sys   4.66 ( 2%) 
> wall1209 kB ( 0%) ggc
> ...
>  tree PRE:   3.58 ( 2%) usr   0.04 ( 1%) sys   3.71 ( 2%) 
> wall   11870 kB ( 1%) ggc
>  `- tree tail merge  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
> wall   0 kB ( 0%) ggc
>  `- alias stmt walking   :   0.88 ( 0%) usr   0.00 ( 0%) sys   0.91 ( 0%) 
> wall 153 kB ( 0%) ggc
>  `- loop init:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) 
> wall   0 kB ( 0%) ggc
>  `- tree SSA incremental :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) 
> wall   0 kB ( 0%) ggc
>  `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) 
> wall 544 kB ( 0%) ggc
>  `- tree STMT verifier   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
> wall   0 kB ( 0%) ggc
>  tree FRE:   3.91 ( 2%) usr   0.09 ( 2%) sys   3.71 ( 2%) 
> wall5658 kB ( 1%) ggc
>  `- alias stmt walking   :   2.81 ( 2%) usr   0.08 ( 2%) sys   2.84 ( 2%) 
> wall 819 kB ( 0%) ggc
>  `- unaccounted todo :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) 
> wall 851 kB ( 0%) ggc
>  `- tree operand scan:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) 
> wall   0 kB ( 0%) ggc
>  `- dominance computation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
> wall   0 kB ( 0%) ggc
>  tree code sinking   :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.17 ( 0%) 
> wall1268 kB ( 0%) ggc
>  `- loop init:   0.13 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) 
> wall 526 kB ( 0%) ggc
>  `- dominance computation:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) 
> wall   0 kB ( 0%) ggc
> ...
> 
> 
> (...assuming I can get Evolution to suppress line-wrapping above)
> 
> 
> BTW, did you test this with jit enabled?  The jit "frontend" exposes
> part of the timer API to client code and uses it in some ways that the
> rest of the code doesn't, with its own two-level hierarchy:
> https://gcc.gnu.org/onlinedocs/jit/topics/performance.html#the-timing-api

If I have to enable jit manually then no, but I made it a timevar-def
map rather than a TV_ID one to catch the named stuff that's probably
there for the JIT.  We don't dump the info recorded though.

> Various other comments inline below
> 
> > note that the full time spent in, say, alias stmt walking is still
> > recorded and dumped.  Note this also visualizes current nesting
> > of timevars and thus you'll get
> > 
> >  rest of compilation :   1.02 ( 1%) usr   0.04 ( 1%) sys   1.17 (
> > 1%) 
> > wall4977 kB ( 0%) ggc
> >  `- hard reg cprop   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 (
> > 0%)

Re: [PATCH][RFC] Add the ability to record sub-timevars (-ftime-report-details)

2016-08-10 Thread Alexander Monakov
On Wed, 10 Aug 2016, David Malcolm wrote:
> Looks like inside a pop there's a lazy allocation of a map, and a
> get_or_insert.  Could this make -ftime-report have an impact on the
> timing?

-ftime-report already has a great impact on timing because it adds a syscall on
each timevar change (which happens thousands of times). It's possible to 
diminish
that impact by using a time measuring facility that can go through just the vDSO
on Linux (e.g. clock_gettime), but that wouldn't make split user/kernel timing
available, and on old glibc versions you get clock_gettime only with -lrt.

On a random translation unit of about 1800 sloc the impact of -ftime report is:

w/o -ftime-reportwith -ftime-report
real0m0.456s real0m0.544s
user0m0.447s user0m0.464s
sys 0m0.009s sys 0m0.079s

Alexander


Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread Pedro Alves
On 08/10/2016 03:05 PM, Manuel López-Ibáñez wrote:
> On 10 August 2016 at 14:40, ayush goel  wrote:
>> On 9 August 2016 at 2:20:59 PM, Pedro Alves (pal...@redhat.com) wrote:
>>> the scheme of configuring gnulib in a separate directory as borrowed from 
>>> gdb
>>> requires including two config.h headers -- the gnulib client's, and 
>>> gnulib's.
> 
> Isn't this also true for libiberty's config.h ? I have no idea
> when/how is that included.

Nope.  For libiberty, the current scheme is that all clients of libiberty
duplicate the necessary autoconf checks on their own, so the client's
config.h file ends up with a copy the necessary HAVE_FOOs.

For libiberty, I've tried to factor out the autoconf checks, here:

 https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00580.html

Ended up being gdb-only:

 https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01712.html

src/intl takes yet another approach.  See src/intl/config.intl.in.

Thanks,
Pedro Alves



[PATCH][RFC] Fix PR72851 by "randomizing" SSA propagator worklist extraction

2016-08-10 Thread Richard Biener

The following randomizes SSA propagator worklist processing to make the
processing order less likely hit the worst-case as in the PR.  Ideally
we'd process it in stmt order but that adds overhead to the idea of a
SSA propagator that makes it very much not appealing.  Using a queue
rather than a stack would als be a possibility (but also not really
fix the underlying issue).  So this patch uses a very simple approach
to fix the specific testcase.

Opinion?  Any great ideas on how to avoid O (n log n) sorting or
O (log n) insertion?  [mark blocks to be visited and simply iterate
over all stmts in to-be-visited BBs]

Bootstrap/regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-08-10  Richard Biener  

PR tree-optimization/72851
* tree-ssa-propagate.c (process_ssa_edge_worklist): Use a
randomized worklist item order.

* gcc.dg/torture/pr72851.c: New testcase.

Index: gcc/tree-ssa-propagate.c
===
*** gcc/tree-ssa-propagate.c(revision 238469)
--- gcc/tree-ssa-propagate.c(working copy)
*** process_ssa_edge_worklist (vec
*** 422,428 
basic_block bb;
  
/* Pull the statement to simulate off the worklist.  */
!   gimple *stmt = worklist->pop ();
  
/* If this statement was already visited by simulate_block, then
 we don't need to visit it again here.  */
--- 422,429 
basic_block bb;
  
/* Pull the statement to simulate off the worklist.  */
!   gimple *stmt = (*worklist)[0];
!   worklist->unordered_remove (0);
  
/* If this statement was already visited by simulate_block, then
 we don't need to visit it again here.  */
Index: gcc/testsuite/gcc.dg/torture/pr72851.c
===
*** gcc/testsuite/gcc.dg/torture/pr72851.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr72851.c  (working copy)
***
*** 0 
--- 1,30 
+ /* { dg-do compile } */
+ 
+ typedef unsigned char uint8_t;
+ typedef unsigned long int uint64_t;
+ union unaligned_64 {
+ uint64_t l;
+ }
+ __attribute__((packed)) __attribute__((may_alias));
+ typedef struct AVDES {
+ uint64_t round_keys[3][16];
+ } AVDES;
+ static const uint8_t PC1_shuffle[] = {
+ 64-57,64-49,64-41,64-33,64-25,64-17,64-9, 
64-1,64-58,64-50,64-42,64-34,64-26,64-18, 
64-10,64-2,64-59,64-51,64-43,64-35,64-27, 
64-19,64-11,64-3,64-60,64-52,64-44,64-36, 
64-63,64-55,64-47,64-39,64-31,64-23,64-15, 
64-7,64-62,64-54,64-46,64-38,64-30,64-22, 
64-14,64-6,64-61,64-53,64-45,64-37,64-29, 
64-21,64-13,64-5,64-28,64-20,64-12,64-4 };
+ static const uint8_t PC2_shuffle[] = {
+ 56-14,56-17,56-11,56-24,56-1,56-5, 56-3,56-28,56-15,56-6,56-21,56-10, 
56-23,56-19,56-12,56-4,56-26,56-8, 56-16,56-7,56-27,56-20,56-13,56-2,   
  56-41,56-52,56-31,56-37,56-47,56-55, 56-30,56-40,56-51,56-45,56-33,56-48, 
56-44,56-49,56-39,56-56,56-34,56-53, 
56-46,56-42,56-50,56-36,56-29,56-32 };
+ static uint64_t shuffle(uint64_t in, const uint8_t *shuffle, int shuffle_len)
+ {
+   int i;
+   uint64_t res = 0;
+   for (i = 0; i < shuffle_len; i++)
+ res += res + ((in >> *shuffle++) & 1);
+   return res;
+ }
+ void gen_roundkeys(uint64_t K[16], uint64_t key)
+ {
+   int i;
+   uint64_t CDn = shuffle(key, PC1_shuffle, sizeof(PC1_shuffle));
+   for (i = 0; i < 16; i++)
+ K[i] = shuffle(CDn, PC2_shuffle, sizeof(PC2_shuffle));
+ }


Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread Pedro Alves
On 08/10/2016 03:06 PM, Pedro Alves wrote:

> Since you're problably going to use sed to patch the files, including
> some new file instead of config.h should be about the same work.

Note, above I was assuming that if you include the gnulib config.h file
in system.h, then you're going to move the inclusion of gcc's own
config.h to system.h as well, to keep the inclusions nicely together, like
gdb does.  Obviously if you leave gcc's config.h inclusions where
they are, then it's a smaller patch.  Maybe start with just changing
system.h, and leave removing all the current config.h inclusions
to a separate pass later (but not never).

> Up to gcc maintainers to decide.

Thanks,
Pedro Alves



Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Manuel López-Ibáñez
On 10 August 2016 at 13:06, Paolo Bonzini  wrote:
>
>
> On 10/08/2016 13:31, Manuel López-Ibáñez wrote:
>> My only fear is that people not using -Wpedantic nor -pedantic-errors
>> expect that GNU extensions work. This is a GNU extension that defines
>> something that is undefined according to ISO. Enabling the warning
>> with -Wextra is just annoying those people who may not care about
>> other compilers.
>
> I think this warning falls in the same category as
> -Wshift-negative-value.  (In fact I dislike -Wshift-negative-value a
> lot, and would put that one under -Wpedantic only).

It is not the same category. One is compile-time UB and the other is
runtime UB. See:
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01551.html
and https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01529.html

Cheers,

Manuel.


Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Paolo Bonzini


On 10/08/2016 16:42, Manuel López-Ibáñez wrote:
> > > My only fear is that people not using -Wpedantic nor -pedantic-errors
> > > expect that GNU extensions work. This is a GNU extension that defines
> > > something that is undefined according to ISO. Enabling the warning
> > > with -Wextra is just annoying those people who may not care about
> > > other compilers.
> >
> > I think this warning falls in the same category as
> > -Wshift-negative-value.  (In fact I dislike -Wshift-negative-value a
> > lot, and would put that one under -Wpedantic only).
>
> It is not the same category. One is compile-time UB and the other is
> runtime UB. See:
> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01551.html
> and https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01529.html

Right---what I meant is it's the same kind of "annoying for people who
expect that GNU extensions work" warning.

Paolo


Re: Implement -Wimplicit-fallthrough: the rest

2016-08-10 Thread Marek Polacek
On Fri, Jul 29, 2016 at 05:26:19PM +, Joseph Myers wrote:
> On Fri, 22 Jul 2016, Marek Polacek wrote:
> 
> > gcc/go/
> > * go-system.h (go_fallthrough): Define.
> > * gofrontend/escape.cc (Escape_analysis_assign::assign): Add
> > go_fallthrough.
> > * gofrontend/expressions.cc (Binary_expression::do_get_backend):
> > Likewise.
> 
> gofrontend is externally maintained and should not be modified locally.
 
Yeah, I'll send a patch to the gofrontend upstream.

> > libgcc/
> > * soft-fp/op-common.h: Add _FP_FALLTHRU.
> > * soft-fp/soft-fp.h (_FP_FALLTHRU): Define.
> 
> Likewise soft-fp (and I think a comment would be better than anything 
> needing version conditionals).

Actually, this doesn't generate a bootstrap error, just a warning, so could
probably stay as it is.  But if it's desirable to add those comments, I can
send a patch to glibc upstream.

Thanks,

Marek


Re: [PATCH] Allow relayout_decl on FIELD_DECLs (PR c/72816)

2016-08-10 Thread Jakub Jelinek
On Mon, Aug 08, 2016 at 11:04:32AM +, Joseph Myers wrote:
> On Sat, 6 Aug 2016, Jakub Jelinek wrote:
> 
> > --- gcc/testsuite/gcc.dg/pr72816.c.jj   2016-08-06 13:06:45.046003282 
> > +0200
> > +++ gcc/testsuite/gcc.dg/pr72816.c  2016-08-06 13:07:57.217093845 +0200
> > @@ -0,0 +1,9 @@
> > +/* PR c/72816 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-std=gnu11" } */
> > +
> > +typedef const int A[];
> > +struct S {
> > +  int a;
> > +  A b; /* { dg-error "array size missing" } */
> > +};
> 
> As far as I can tell, this is actually valid code that should not produce 
> an error; the type of a flexible array member can be given by a typedef, 
> and I see nothing to disallow it being given by a typedef for an array of 
> qualified type.  Note that both the version of this test without const, 
> and the version with const but not using a typedef, are accepted.

Indeed.
The following untested patch fixes the issue for me.  The problem was that
when we create the distinct type with TYPE_DOMAIN [0:], if
TYPE_MAIN_VARIANT (orig_qual_type) is TYPE_MAIN_VARIANT (type) before we
do this, then c_build_qualified_type will return that orig_qual_type and
the TYPE_DOMAIN [0:] is gone again.  I think for orig_qual_indirect > 0
we are ok, in that case c_build_qualified_type will never return that and
as the flexible array member is the outermost ARRAY_TYPE, we should be fine.
But perhaps if (orig_qual_indirect > 0 && orig_qual_type), we never reach
the flexible array member through typedef else if, so perhaps
just unconditional orig_qual_type = NULL_TREE; in that else if would work too.
Which of these two versions do you prefer (or something else)?
Shall I revert the stor-layout.c change which might be unnecessary (though
shouldn't hurt)?

2016-08-10  Jakub Jelinek  

PR c/72816
* c-decl.c (grokdeclarator): When adding TYPE_DOMAIN for flexible
array member through typedef, for orig_qual_indirect == 0 clear
orig_qual_type.

* gcc.dg/pr72816.c: New test.

--- gcc/c/c-decl.c.jj   2016-08-06 12:11:47.0 +0200
+++ gcc/c/c-decl.c  2016-08-10 17:07:10.675144603 +0200
@@ -6710,6 +6710,8 @@ grokdeclarator (const struct c_declarato
type = build_distinct_type_copy (TYPE_MAIN_VARIANT (type));
TYPE_DOMAIN (type) = build_range_type (sizetype, size_zero_node,
   NULL_TREE);
+   if (orig_qual_indirect == 0)
+ orig_qual_type = NULL_TREE;
  }
type = c_build_qualified_type (type, type_quals, orig_qual_type,
   orig_qual_indirect);
--- gcc/testsuite/gcc.dg/pr72816.c.jj   2016-08-07 11:48:34.0 +0200
+++ gcc/testsuite/gcc.dg/pr72816.c  2016-08-10 17:08:21.153266419 +0200
@@ -5,5 +5,5 @@
 typedef const int A[];
 struct S {
   int a;
-  A b; /* { dg-error "array size missing" } */
+  A b;
 };


Jakub


Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Manuel López-Ibáñez
On 10 August 2016 at 15:44, Paolo Bonzini  wrote:
>
>
> On 10/08/2016 16:42, Manuel López-Ibáñez wrote:
>> > > My only fear is that people not using -Wpedantic nor -pedantic-errors
>> > > expect that GNU extensions work. This is a GNU extension that defines
>> > > something that is undefined according to ISO. Enabling the warning
>> > > with -Wextra is just annoying those people who may not care about
>> > > other compilers.
>> >
>> > I think this warning falls in the same category as
>> > -Wshift-negative-value.  (In fact I dislike -Wshift-negative-value a
>> > lot, and would put that one under -Wpedantic only).
>>
>> It is not the same category. One is compile-time UB and the other is
>> runtime UB. See:
>> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01551.html
>> and https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01529.html
>
> Right---what I meant is it's the same kind of "annoying for people who
> expect that GNU extensions work" warning.

Oh, I agree. I'm just mentioning what the current definition/behavior
is (and documenting it here:
https://gcc.gnu.org/wiki/DiagnosticsGuidelines FWIW), not what I think
should be.

Perhaps we need something like -Wextra-pedantic, for things that are
undefined by ISO C but defined by GNU. Thus, they would not trigger
pedwarns and no error with -pedantic-errors.

Or we need to split -Wpedantic into -Wpedantic-pedwarns and
-Wpedantic-nopedwarns (with better names). This way -pedantic-errors
would be equivalent to -Werror=pedantic-pedwarns +
-Werror=pedwarns-not-controlled-by-pedantic.

i find -pedantic-errors too out of place with the rest of -W* options.

Manuel.


Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Paolo Bonzini


On 10/08/2016 17:24, Manuel López-Ibáñez wrote:
> Perhaps we need something like -Wextra-pedantic, for things that are
> undefined by ISO C but defined by GNU. Thus, they would not trigger
> pedwarns and no error with -pedantic-errors.

I think this is overengineering it a bit.  If they are annoying, put
them in -Wpedantic; if they aren't too bad, put them in Wextra.

There is also the case (which includes -Wshift-negative-value) of things
that are undefined by ISO C and trapped by ubsan, but defined by GNU C.

Actually are pedwarns even necessary nowadays?  In theory they fall in
two categories:

- stuff that is EnabledBy(Wpedantic) automatically gets bumped from
warning to error by -pedantic-errors or (I think) -Werror=pedantic;

- stuff that is not enabled by anything should use OPT_Wpedantic, and
then warning(OPT_Wpedantic) should have the same effect as
pedwarn(OPT_Wpedantic).

Are there other cases that I'm missing?

Paolo


Re: [PATCH] Allow relayout_decl on FIELD_DECLs (PR c/72816)

2016-08-10 Thread Joseph Myers
On Wed, 10 Aug 2016, Jakub Jelinek wrote:

> 2016-08-10  Jakub Jelinek  
> 
>   PR c/72816
>   * c-decl.c (grokdeclarator): When adding TYPE_DOMAIN for flexible
>   array member through typedef, for orig_qual_indirect == 0 clear
>   orig_qual_type.
> 
>   * gcc.dg/pr72816.c: New test.

OK (subject to testing).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Use TImode for piecewise move in 64-bit mode

2016-08-10 Thread H.J. Lu
Use TImode for piecewise move in 64-bit mode.  When vector register
is used for piecewise move, we don't increase stack_alignment_needed
since vector register spill isn't required for piecewise move.  Since
stack_realign_needed is set to true by checking stack_alignment_estimated
set by pseudo vector register usage, we also need to check
stack_realign_needed to eliminate frame pointer.

Tested on x86-64.  OK for trunk?

H.J.
---
gcc/

* config/i386/i386.c (ix86_finalize_stack_realign_flags): Also
check stack_realign_needed for stack realignment.
* config/i386/i386.h (MOVE_MAX_PIECES): Set to 16 in 64-bit mode
if unaligned SSE load and store are optimal.

gcc/testsuite/

* gcc.target/i386/pieces-memcpy-1.c: New test.
* gcc.target/i386/pieces-memcpy-2.c: Likewise.
* gcc.target/i386/pieces-memcpy-3.c: Likewise.
* gcc.target/i386/pieces-memcpy-4.c: Likewise.
* gcc.target/i386/pieces-memcpy-5.c: Likewise.
* gcc.target/i386/pieces-memcpy-6.c: Likewise.
---
 gcc/config/i386/i386.c  | 11 +--
 gcc/config/i386/i386.h  |  6 +-
 gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c | 17 +
 gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c | 17 +
 gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c | 17 +
 gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c | 17 +
 gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c | 17 +
 gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c | 17 +
 8 files changed, 116 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 93eaab1..60dc160 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13286,8 +13286,15 @@ ix86_finalize_stack_realign_flags (void)
   /* If the only reason for frame_pointer_needed is that we conservatively
  assumed stack realignment might be needed, but in the end nothing that
  needed the stack alignment had been spilled, clear frame_pointer_needed
- and say we don't need stack realignment.  */
-  if (stack_realign
+ and say we don't need stack realignment.
+
+ When vector register is used for piecewise move and store, we don't
+ increase stack_alignment_needed as there is no register spill for
+ piecewise move and store.  Since stack_realign_needed is set to true
+ by checking stack_alignment_estimated which is updated by pseudo
+ vector register usage, we also need to check stack_realign_needed to
+ eliminate frame pointer.  */
+  if ((stack_realign || crtl->stack_realign_needed)
   && frame_pointer_needed
   && crtl->is_leaf
   && flag_omit_frame_pointer
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 9b66264..24db855 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1951,7 +1951,11 @@ typedef struct ix86_args {
 /* MOVE_MAX_PIECES is the number of bytes at a time which we can
move efficiently, as opposed to  MOVE_MAX which is the maximum
number of bytes we can move with a single instruction.  */
-#define MOVE_MAX_PIECES UNITS_PER_WORD
+#define MOVE_MAX_PIECES \
+  ((TARGET_64BIT \
+&& TARGET_SSE2 \
+&& TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
+&& TARGET_SSE_UNALIGNED_STORE_OPTIMAL) ? 16 : UNITS_PER_WORD)
 
 /* If a memory-to-memory move would take MOVE_RATIO or more simple
move-instruction pairs, we will do a movmem or libcall instead.
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c 
b/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c
new file mode 100644
index 000..adc0aa8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+
+extern char *dst, *src;
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, 64);
+}
+
+/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* { dg-final { scan-assembler-times "movups\[ \\t\]+\[^\n\]*%xmm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
+/* Nor use a frame pointer.  */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c 
b/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c
new file mode 100644
index 000..c52c1d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { targe

Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Joseph Myers
On Wed, 10 Aug 2016, Paolo Bonzini wrote:

> - stuff that is not enabled by anything should use OPT_Wpedantic, and

No, lots of pedwarns are for usages that are (a) dubious enough we want to 
diagnose them by default, and (b) required to be diagnosed by ISO C so 
must become errors with -pedantic-errors.  (You could argue about whether 
we should have -fpermissive for C like for C++ and make some of those into 
errors by default.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Paolo Bonzini


On 10/08/2016 17:33, Joseph Myers wrote:
> On Wed, 10 Aug 2016, Paolo Bonzini wrote:
> 
>> - stuff that is not enabled by anything should use OPT_Wpedantic, and
> 
> No, lots of pedwarns are for usages that are (a) dubious enough we want to 
> diagnose them by default, and (b) required to be diagnosed by ISO C so 
> must become errors with -pedantic-errors.  (You could argue about whether 
> we should have -fpermissive for C like for C++ and make some of those into 
> errors by default.)

There are indeed many pedwarn(loc, 0, ...) occurrences in C++ (most, but
not all, are "foo only available with -std=bar" which in the C front-end
would use OPT_Wpedantic, OPT_W*compat be enabled by specific flags such
as -Wvariadic-macros).  In C I only see three:

- overflowing floating-point constants:

  if (!MODE_HAS_INFINITIES (TYPE_MODE (type)))
pedwarn (input_location, 0,
 "floating constant exceeds range of %qT", type);
  else
warning (OPT_Woverflow,
 "floating constant exceeds range of %qT", type);

- __FUNCTION__ outside function scope

  if (!ix && !current_function_decl)
pedwarn (loc, 0, "%qD is not defined outside of function scope", decl);

- macro redefinition

For the first two it would not feel wrong at all to use permerror.  The
last one indeed is a very good example of a pedwarn.

Paolo


Fwd: [PATCH, doc/ARM] Remove false affirmation that Thumb cannot use an FPU

2016-08-10 Thread Thomas Preudhomme

Forwarding to gcc-patches which I forgot past patch #1
--- Begin Message ---
GCC documentation for MULTILIB_EXCEPTIONS states that ARM processors cannot use 
hardware floating-point in Thumb execution state. This is incorrect since 
ARMv7E-M processors can do just that. This patch replace this example with the 
case of the meaningless -mfloat-abi=soft -mfpu=* combination of options. 
MULTILIB_EXCEPTIONS was indeed use to exclude such a case in 
config/arm/t-aprofile until patches preceding this one.


ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-08-02  Thomas Preud'homme  

* doc/fragments.texi (MULTILIB_EXCEPTIONS): Replace example with
preventing combination of -mfloat-abi=soft with any -mfpu option.


gccint.pdf builds fine with the patch and display is as expected.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/doc/fragments.texi b/gcc/doc/fragments.texi
index b6d8541c8ca820fa732363a05221e2cd4d1251c2..a060635c9cee7374d9d187858ac87acdd08860f2 100644
--- a/gcc/doc/fragments.texi
+++ b/gcc/doc/fragments.texi
@@ -117,12 +117,15 @@ specified, there are combinations that should not be built.  In that
 case, set @code{MULTILIB_EXCEPTIONS} to be all of the switch exceptions
 in shell case syntax that should not be built.
 
-For example the ARM processor cannot execute both hardware floating
-point instructions and the reduced size THUMB instructions at the same
-time, so there is no need to build libraries with both of these
-options enabled.  Therefore @code{MULTILIB_EXCEPTIONS} is set to:
+For example on ARM targets @code{-mfloat-abi=soft} requests to use a
+softfloat implementation for floating-point operations.  Therefore, it
+does not make sense to find both @code{-mfloat-abi=soft} and an
+@code{mfpu} option on the command line so @code{MULTILIB_EXCEPTIONS}
+could contain the following exception (assuming that @code{-mfloat-abi}
+comes after in MULTILIB_OPTIONS and given that soft is the default
+value):
 @smallexample
-*mthumb/*mhard-float*
+*mfpu=*
 @end smallexample
 
 @findex MULTILIB_REQUIRED
--- End Message ---


Fwd: [PATCH] genmultilib: improve error reporting for MULTILIB_REUSE

2016-08-10 Thread Thomas Preudhomme

Forwarding to gcc-patches which I forgot past patch #1
--- Begin Message ---

Hi,

It was discovered while working on ARM's aprofile multilib Makefile fragment 
that some REUSE rules were mentioning on the RHS options not in 
MULTILIB_OPTIONS. This is not supposed to happen and leads to unexpected 
behavior: genmultilib will generate a rule that matches for any option not in 
MULTILIB_OPTIONS (eg. in the case of ARM, it would create a rule matching for 
-march=armv7-a -mfloat-abi=hard with any -mfpu option not in MULTILIB_OPTIONS).


This patch (i) adds logic in genmultilib to detect this situation and give an 
error. It also make sure (ii) the error is visible by outputing it to standard 
error since standard output is redirected to multilib.h when invoking 
genmultilib. Finally, (iii) it documents the need to use options present in 
MULTILIB_OPTIONS while doing various small wording fixes.


Patch is in attachment. ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-08-01  Thomas Preud'homme  

* doc/fragments.texi (MULTILIB_REUSE): Mention that only options in
MULTILIB_OPTIONS should be used.  Small wording fixes.
* genmultilib: Memorize set of all option combinations in
combination_space.  Detect if RHS of MULTILIB_REUSE uses an option not
found in MULTILIB_OPTIONS by checking if option set is listed in
combination_space.  Output new and existing error message to stderr.


GCC and libgcc have been built with multilib for all targets supporting 
--with-multilib-list without error as follows:


* --build=i386-linux-gnu --with-multilib-list=m32,m64,mx32
* --target=x86_64-linux-gnu --with-multilib-list=m32,m64,mx32
* --target=sh-none-elf 
--with-multilib-list=m1,m2,m2e,m3,m3e,m4,m4-single,m4a-single-only,m4al,m2a,m2a-single,m2a-single-only 
(lib1func failed to build with -nofpu variants due to unrecognized instructions)

* --target=aarch64-none-elf --with-multilib-list=default
* --target=arm-none-eabi --with-multilib-list=aprofile

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/doc/fragments.texi b/gcc/doc/fragments.texi
index f4e6636fb9510c0da40f1956c1f40f8bc0e23510..b6d8541c8ca820fa732363a05221e2cd4d1251c2 100644
--- a/gcc/doc/fragments.texi
+++ b/gcc/doc/fragments.texi
@@ -156,15 +156,16 @@ variants.  And for some targets it is better to reuse an existing multilib
 than to fall back to default multilib when there is no corresponding multilib.
 This can be done by adding reuse rules to @code{MULTILIB_REUSE}.
 
-A reuse rule is comprised of two parts connected by equality sign.  The left part
-is option set used to build multilib and the right part is option set that will
-reuse this multilib.  The order of options in the left part matters and should be
-same with those specified in @code{MULTILIB_REQUIRED} or aligned with order in
-@code{MULTILIB_OPTIONS}.  There is no such limitation for options in right part
-as we don't build multilib from them.  But the equality sign in both parts should
-be replaced with period.
-
-The @code{MULTILIB_REUSE} is different from @code{MULTILIB_MATCHES} in that it
+A reuse rule is comprised of two parts connected by equality sign.  The left
+part is the option set used to build multilib and the right part is the option
+set that will reuse this multilib.  Both part should only use options specified
+in @code{MULTILIB_OPTIONS} and the equality signs found in options name should
+be replaced with periods.  The order of options in the left part matters and
+should be same with those specified in @code{MULTILIB_REQUIRED} or aligned with
+the order in @code{MULTILIB_OPTIONS}.  There is no such limitation for options
+in the right part as we don't build multilib from them.
+
+@code{MULTILIB_REUSE} is different from @code{MULTILIB_MATCHES} in that it
 sets up relations between two option sets rather than two options.  Here is an
 example to demo how we reuse libraries built in Thumb mode for applications built
 in ARM mode:
diff --git a/gcc/genmultilib b/gcc/genmultilib
index 083259aa82ca085be7410faa92929eb429393f92..eb5f661a5fded65a44a725884f77bf2e46f3ab1a 100644
--- a/gcc/genmultilib
+++ b/gcc/genmultilib
@@ -186,7 +186,8 @@ fi
 EOF
 chmod +x tmpmultilib
 
-combinations=`initial=/ ./tmpmultilib ${options}`
+combination_space=`initial=/ ./tmpmultilib ${options}`
+combinations="$combination_space"
 
 # If there exceptions, weed them out now
 if [ -n "${exceptions}" ]; then
@@ -472,14 +473,19 @@ for rrule in ${multilib_reuse}; do
   # in this variable, it means no multilib will be built for current reuse
   # rule.  Thus the reuse purpose specified by current rule is meaningless.
   if expr "${combinations} " : ".*/${combo}/.*" > /dev/null; then
-combo="/${combo}/"
-dirout=`./tmpmultilib3 "${combo}" "${todirnames}" "${toosdirnames}" "${enable_multilib}"`
-copts="/${copts}/"
-optout=`./tmpmultilib4 "${copts}" "${options}"`
-# Output the line with all appropriate matches.
-dirout="${di

Re: [PATCH] Use TImode for piecewise move in 64-bit mode

2016-08-10 Thread Uros Bizjak
On Wed, Aug 10, 2016 at 5:32 PM, H.J. Lu  wrote:
> Use TImode for piecewise move in 64-bit mode.  When vector register
> is used for piecewise move, we don't increase stack_alignment_needed
> since vector register spill isn't required for piecewise move.  Since
> stack_realign_needed is set to true by checking stack_alignment_estimated
> set by pseudo vector register usage, we also need to check
> stack_realign_needed to eliminate frame pointer.

Why only in 64-bit mode? We can use SSE moves also in 32-bit mode.

I don't think we can handle crtl->stack_realign_needed in this way. If
there are other insns with 32byte vector registers in use in the same
(large) function as converted __builtin_memcpy, we will *still* need
realigned stack.

Uros.


> Tested on x86-64.  OK for trunk?
>
> H.J.
> ---
> gcc/
>
> * config/i386/i386.c (ix86_finalize_stack_realign_flags): Also
> check stack_realign_needed for stack realignment.
> * config/i386/i386.h (MOVE_MAX_PIECES): Set to 16 in 64-bit mode
> if unaligned SSE load and store are optimal.
>
> gcc/testsuite/
>
> * gcc.target/i386/pieces-memcpy-1.c: New test.
> * gcc.target/i386/pieces-memcpy-2.c: Likewise.
> * gcc.target/i386/pieces-memcpy-3.c: Likewise.
> * gcc.target/i386/pieces-memcpy-4.c: Likewise.
> * gcc.target/i386/pieces-memcpy-5.c: Likewise.
> * gcc.target/i386/pieces-memcpy-6.c: Likewise.
> ---
>  gcc/config/i386/i386.c  | 11 +--
>  gcc/config/i386/i386.h  |  6 +-
>  gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c | 17 +
>  gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c | 17 +
>  gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c | 17 +
>  gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c | 17 +
>  gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c | 17 +
>  gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c | 17 +
>  8 files changed, 116 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 93eaab1..60dc160 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -13286,8 +13286,15 @@ ix86_finalize_stack_realign_flags (void)
>/* If the only reason for frame_pointer_needed is that we conservatively
>   assumed stack realignment might be needed, but in the end nothing that
>   needed the stack alignment had been spilled, clear frame_pointer_needed
> - and say we don't need stack realignment.  */
> -  if (stack_realign
> + and say we don't need stack realignment.
> +
> + When vector register is used for piecewise move and store, we don't
> + increase stack_alignment_needed as there is no register spill for
> + piecewise move and store.  Since stack_realign_needed is set to true
> + by checking stack_alignment_estimated which is updated by pseudo
> + vector register usage, we also need to check stack_realign_needed to
> + eliminate frame pointer.  */
> +  if ((stack_realign || crtl->stack_realign_needed)
>&& frame_pointer_needed
>&& crtl->is_leaf
>&& flag_omit_frame_pointer
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 9b66264..24db855 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1951,7 +1951,11 @@ typedef struct ix86_args {
>  /* MOVE_MAX_PIECES is the number of bytes at a time which we can
> move efficiently, as opposed to  MOVE_MAX which is the maximum
> number of bytes we can move with a single instruction.  */
> -#define MOVE_MAX_PIECES UNITS_PER_WORD
> +#define MOVE_MAX_PIECES \
> +  ((TARGET_64BIT \
> +&& TARGET_SSE2 \
> +&& TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
> +&& TARGET_SSE_UNALIGNED_STORE_OPTIMAL) ? 16 : UNITS_PER_WORD)
>
>  /* If a memory-to-memory move would take MOVE_RATIO or more simple
> move-instruction pairs, we will do a movmem or libcall instead.
> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c 
> b/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c
> new file mode 100644
> index 000..adc0aa8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
> +
> +extern char *dst, *src;
> +
> +void
> +foo (void)
> +{
> +  __builtin_memcpy (dst, src, 64);
> +}
> +
> +/* { dg-final { scan-assembler-times "movdqu\[ \\t\]+\[^\n\]*%xmm" 4 } } */
> +/* { d

Fwd: [PATCH, ARM] Use a MULTILIB_REQUIRED approach for aprofile multilib

2016-08-10 Thread Thomas Preudhomme

Forwarding to gcc-patches which I forgot past patch #1
--- Begin Message ---

Hi,

Currently, the Makefile fragment for ARM aprofile multilib is using a 
substractive approach. It specifies a set of options to be combined (eg. 
-march=armv7-a,armv7ve,armv8-a, with 
-mfpu=vfpv3-d16,neon,vfpv4-d16,neon-fpv4,neon-fp-armv8) using MULTILIB_OPTIONS 
and then specifies which combination should *not* be built with MULTILIB_EXCEPTIONS.


This patch replaces that approach by an additive one: using MULTILIB_REQUIRED to 
specify the combinations we *do* want. This approach is more scalable and 
maintainable:


1) Scalability

The substractive approach (MULTILIB_EXCEPTIONS) is doable today because there is 
only 3 -march and 5 -mfpu options in t-aprofile. Yet it needs already 22 
MULTILIB_EXCEPTIONS to define the set of multilib to be built. Adding new 
architecture or new mfpu would make that worse. Since we only care about a small 
number of combinations (each mfpu is used with only one march), it makes more 
sense to specify what we want. The new approach only needs 9 MULTILIB_REQUIRED 
lines.


2) Maintainability

Adding one new architecture or vfp option means adding exceptions for all 
combinations which does not make sense with that option (eg. if we add mfpu=foo 
we'll have to exclude all the march we don't want to mix with foo). It forces us 
to think about all combinations involved with this new option and thinking about 
the combinations in it that we do not want. Basically we have to do the work of 
genmultilib in our mind. MULTILIB_REQUIRED on the other hand would allow us to 
just specify the combination involving that new option that we care about which 
is likely to be more obvious IMHO.


Patch is in attachment. ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-08-01  Thomas Preud'homme  

* config/arm/t-aprofile (MULTILIB_EXCEPTIONS): Rewrite into ...
(MULTILIB_REQUIRED): This by specifying multilib needing to be built
rather than those that should not be built.


The output of "tree lib/gcc/arm-none-eabi/7.0.0" before and after the patch 
shows that the same set of multilib is being built.


Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 0d91006d4ef51a765e849079fd43679175466a71..90305e1206e3964e08a673e385d3198747bdffa1 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -49,33 +49,27 @@ MULTILIB_DIRNAMES  += fpv3 simdv1 fpv4 simdvfpv4 simdv8
 MULTILIB_OPTIONS   += mfloat-abi=softfp/mfloat-abi=hard
 MULTILIB_DIRNAMES  += softfp hard
 
-# We don't build no-float libraries with an FPU.
-MULTILIB_EXCEPTIONS+= *mfpu=vfpv3-d16
-MULTILIB_EXCEPTIONS+= *mfpu=neon
-MULTILIB_EXCEPTIONS+= *mfpu=vfpv4-d16
-MULTILIB_EXCEPTIONS+= *mfpu=neon-vfpv4
-MULTILIB_EXCEPTIONS+= *mfpu=neon-fp-armv8
-
-# We don't build libraries requiring an FPU at the CPU/Arch/ISA level.
-MULTILIB_EXCEPTIONS+= mfloat-abi=*
-MULTILIB_EXCEPTIONS+= mfpu=*
-MULTILIB_EXCEPTIONS+= mthumb/mfloat-abi=*
-MULTILIB_EXCEPTIONS+= mthumb/mfpu=*
-MULTILIB_EXCEPTIONS+= *march=armv7-a/mfloat-abi=*
-MULTILIB_EXCEPTIONS+= *march=armv7ve/mfloat-abi=*
-MULTILIB_EXCEPTIONS+= *march=armv8-a/mfloat-abi=*
-
-# Ensure the correct FPU variants apply to the correct base architectures.
-MULTILIB_EXCEPTIONS+= *march=armv7ve/*mfpu=vfpv3-d16*
-MULTILIB_EXCEPTIONS+= *march=armv7ve/*mfpu=neon/*
-MULTILIB_EXCEPTIONS+= *march=armv8-a/*mfpu=vfpv3-d16*
-MULTILIB_EXCEPTIONS+= *march=armv8-a/*mfpu=neon/*
-MULTILIB_EXCEPTIONS+= *march=armv7-a/*mfpu=vfpv4-d16*
-MULTILIB_EXCEPTIONS+= *march=armv7-a/*mfpu=neon-vfpv4*
-MULTILIB_EXCEPTIONS+= *march=armv8-a/*mfpu=vfpv4-d16*
-MULTILIB_EXCEPTIONS+= *march=armv8-a/*mfpu=neon-vfpv4*
-MULTILIB_EXCEPTIONS+= *march=armv7-a/*mfpu=neon-fp-armv8*
-MULTILIB_EXCEPTIONS+= *march=armv7ve/*mfpu=neon-fp-armv8*
+
+# Option combinations to build library with
+
+# Default CPU/Arch (ARM is implicitly included because it uses the default
+# multilib)
+MULTILIB_REQUIRED  += mthumb
+
+# ARMv7-A
+MULTILIB_REQUIRED  += *march=armv7-a
+MULTILIB_REQUIRED  += *march=armv7-a/mfpu=vfpv3-d16/mfloat-abi=*
+MULTILIB_REQUIRED  += *march=armv7-a/mfpu=neon/mfloat-abi=*
+
+# ARMv7VE
+MULTILIB_REQUIRED  += *march=armv7ve
+MULTILIB_REQUIRED  += *march=armv7ve/mfpu=vfpv4-d16/mfloat-abi=*
+MULTILIB_REQUIRED  += *march=armv7ve/mfpu=neon-vfpv4/mfloat-abi=*
+
+# ARMv8-A
+MULTILIB_REQUIRED  += *march=armv8-a
+MULTILIB_REQUIRED  += *march=armv8-a/mfpu=neon-fp-armv8/mfloat-abi=*
+
 
 # CPU Matches
 MULTILIB_MATCHES   += march?armv7-a=mcpu?cortex-a8
--- End Message ---


Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Joseph Myers
On Wed, 10 Aug 2016, Paolo Bonzini wrote:

> There are indeed many pedwarn(loc, 0, ...) occurrences in C++ (most, but
> not all, are "foo only available with -std=bar" which in the C front-end
> would use OPT_Wpedantic, OPT_W*compat be enabled by specific flags such
> as -Wvariadic-macros).  In C I only see three:

I don't know why you think there are only three; there are loads.  The 
first few in c-decl.c are:

pedwarn (csi->location, 0,
 "%qD is static but used in inline function %qD "
 "which is not static", csi->static_decl, csi->function);

pedwarn (csi->location, 0,
 "%q+D is static but declared in inline function %qD "
 "which is not static", csi->static_decl, csi->function);

pedwarn (input_location, 0,
 "inline function %q+D declared but never defined", p);

  pedwarned = pedwarn (input_location, 0,
   "conflicting types for %q+D", newdecl);

  pedwarned = pedwarn (input_location, 0,
   "conflicting types for %q+D", newdecl);

  pedwarn (input_location, 0,
   "unnamed struct/union that defines no instances");

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH testsuite]Require vect_cond_mixed for test case gcc.dg/vect/pr56541.c

2016-08-10 Thread Bin Cheng
Hi,
Due to some reasons, tree-if-conv.c now factors floating point comparison out 
of cond_expr,
resulting in mixed types in it.  This does help CSE on common comparison 
operations.  
Only problem is that test gcc.dg/vect/pr56541.c now requires vect_cond_mixed to 
be 
vectorized.  This patch changes the test in that way.
Test result checked.  Is it OK?

Thanks,
bin

gcc/testsuite/ChangeLog
2016-08-09  Bin Cheng  

* gcc.dg/vect/pr56541.c: Require vect_cond_mixed.diff --git a/gcc/testsuite/gcc.dg/vect/pr56541.c 
b/gcc/testsuite/gcc.dg/vect/pr56541.c
index 16b8d7c..a7b96ce 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56541.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56541.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
 /* { dg-require-effective-target vect_float } */
-/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target vect_cond_mixed } */
 
 float a,b,c,d;
 


[PATCH PR69848]Avoid not insn by inverting comparison code in vcond patterns

2016-08-10 Thread Bin Cheng
Hi,
This is a follow up patch for previous vcond patches.  In previous ones, 
we rely on combiner to simplify "X = !Y; Z = X ? A : B" into "Z = Y ? B : A".
That works for some cases, but not all of them, for example, case in 
PR69848.  The reason could be in combiner, but more likely in bsl patterns
which are too complicated to be handled by combiner.  Investigating all
cases pattern by pattern would be tedious, this patch modifies vcond 
patterns to explicitly invert comparison code (as well as switch operands)
to avoid the additional NOT instruction.  Note un-ordered floating point 
comparison is not handled because it will complicate the code, also NE is 
the most common case.  The patch further reduces assembly code in 
PR69848 on the basis of vcond patches.
Bootstrap and test on AArch64.  Is it OK?

Thanks,
bin

2016-08-03  Bin Cheng  

PR tree-optimization/69848
* config/aarch64/aarch64-simd.md (vcond): Invert NE
and swtich operands to avoid additional NOT instruction.
(vcond): Ditto.
(vcondu, vcondu): Ditto.

gcc/testsuite/ChangeLog
2016-08-03  Bin Cheng  

PR tree-optimization/69848
* gcc.target/aarch64/simd/vcond-ne-bit.c: New test.diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index dca079f..3fa88be 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2575,6 +2575,15 @@
   rtx mask = gen_reg_rtx (mode);
   enum rtx_code code = GET_CODE (operands[3]);
 
+  /* NE is handled as !EQ in vec_cmp patterns, we can explicitly invert
+ it as well as switch operands 1/2 in order to avoid the additional
+ NOT instruction.  */
+  if (code == NE)
+{
+  operands[3] = gen_rtx_fmt_ee (EQ, GET_MODE (operands[3]),
+   operands[4], operands[5]);
+  std::swap (operands[1], operands[2]);
+}
   emit_insn (gen_vec_cmp (mask, operands[3],
  operands[4], operands[5]));
   emit_insn (gen_vcond_mask_ (operands[0], operands[1],
@@ -2596,6 +2605,15 @@
   rtx mask = gen_reg_rtx (mode);
   enum rtx_code code = GET_CODE (operands[3]);
 
+  /* NE is handled as !EQ in vec_cmp patterns, we can explicitly invert
+ it as well as switch operands 1/2 in order to avoid the additional
+ NOT instruction.  */
+  if (code == NE)
+{
+  operands[3] = gen_rtx_fmt_ee (EQ, GET_MODE (operands[3]),
+   operands[4], operands[5]);
+  std::swap (operands[1], operands[2]);
+}
   emit_insn (gen_vec_cmp (mask, operands[3],
  operands[4], operands[5]));
   emit_insn (gen_vcond_mask_ (
@@ -2618,6 +2636,15 @@
   rtx mask = gen_reg_rtx (mode);
   enum rtx_code code = GET_CODE (operands[3]);
 
+  /* NE is handled as !EQ in vec_cmp patterns, we can explicitly invert
+ it as well as switch operands 1/2 in order to avoid the additional
+ NOT instruction.  */
+  if (code == NE)
+{
+  operands[3] = gen_rtx_fmt_ee (EQ, GET_MODE (operands[3]),
+   operands[4], operands[5]);
+  std::swap (operands[1], operands[2]);
+}
   emit_insn (gen_vec_cmp (mask, operands[3],
  operands[4], operands[5]));
   emit_insn (gen_vcond_mask_ (operands[0], operands[1],
@@ -2638,6 +2665,15 @@
   rtx mask = gen_reg_rtx (mode);
   enum rtx_code code = GET_CODE (operands[3]);
 
+  /* NE is handled as !EQ in vec_cmp patterns, we can explicitly invert
+ it as well as switch operands 1/2 in order to avoid the additional
+ NOT instruction.  */
+  if (code == NE)
+{
+  operands[3] = gen_rtx_fmt_ee (EQ, GET_MODE (operands[3]),
+   operands[4], operands[5]);
+  std::swap (operands[1], operands[2]);
+}
   emit_insn (gen_vec_cmp (
  mask, operands[3],
  operands[4], operands[5]));
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vcond-ne-bit.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vcond-ne-bit.c
new file mode 100644
index 000..25170c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vcond-ne-bit.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_condition } */
+#include 
+
+int fn1 (int) __attribute__ ((noinline));
+
+int a[128];
+int fn1(int d) {
+  int b, c = 1;
+  for (b = 0; b < 128; b++)
+if (a[b])
+  c = 0;
+  return c;
+}
+
+int
+main (void)
+{
+  int i;
+  for (i = 0; i < 128; i++)
+a[i] = 0;
+  if (fn1(10) != 1)
+abort ();
+  a[3] = 2;
+  a[24] = 1;
+  if (fn1(10) != 0)
+abort ();
+  return 0;
+}
+/* { dg-final { scan-assembler-not "\[ \t\]not\[ \t\]" } } */


[PATCH PR69848]Introduce special conditional reduction CONST_COND_REDUCTION

2016-08-10 Thread Bin Cheng
Hi,
This patch fixes the inefficient code generated by vectorizer as reported by 
PR69848.
It introduces new conditional reduction type CONST_COND_REDUCTION.  As a result,
we don't need to compute index vector in loop; also the epilog reduction code 
can be 
simplified using single reduc_max/reduc_min operation.  Together with AArch64 
vcond 
patches, the # of insns in loop body is reduced from 10 to 4 on AArch64.  Note, 
this one
doesn't handle cases in which reduction values are loop invariants because it 
requires 
quite different code to current implementation, and I failed to work out a 
"clean" patch at 
the moment.

Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2016-08-08  Bin Cheng  

PR tree-optimization/69848
* tree-vectorizer.h (enum vect_def_type): New condition reduction
type CONST_COND_REDUCTION.
* tree-vect-loop.c (vectorizable_reduction): Support new condition
reudction type CONST_COND_REDUCTION.

gcc/testsuite/ChangeLog
2016-08-08  Bin Cheng  

PR tree-optimization/69848
* gcc.dg/vect/vect-pr69848.c: New test.diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 41c4c29..4957b66 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5416,7 +5416,7 @@ vectorizable_reduction (gimple *stmt, 
gimple_stmt_iterator *gsi,
   optab optab, reduc_optab;
   tree new_temp = NULL_TREE;
   gimple *def_stmt;
-  enum vect_def_type dt;
+  enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
   gphi *new_phi = NULL;
   tree scalar_type;
   bool is_simple_use;
@@ -5447,7 +5447,7 @@ vectorizable_reduction (gimple *stmt, 
gimple_stmt_iterator *gsi,
   tree def0, def1, tem, op0, op1 = NULL_TREE;
   bool first_p = true;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
-  gimple *cond_expr_induction_def_stmt = NULL;
+  tree cond_reduc_val = NULL_TREE, const_cond_cmp = NULL_TREE;
 
   /* In case of reduction chain we switch to the first stmt in the chain, but
  we don't update STMT_INFO, since only the last stmt is marked as reduction
@@ -5597,8 +5597,18 @@ vectorizable_reduction (gimple *stmt, 
gimple_stmt_iterator *gsi,
   reduc_index = i;
 }
 
-  if (i == 1 && code == COND_EXPR && dt == vect_induction_def)
-   cond_expr_induction_def_stmt = def_stmt;
+  if (i == 1 && code == COND_EXPR)
+   {
+ /* Record how value of COND_EXPR is defined.  */
+ if (dt == vect_constant_def)
+   {
+ cond_reduc_dt = dt;
+ cond_reduc_val = ops[i];
+   }
+ if (dt == vect_induction_def && def_stmt != NULL
+ && is_nonwrapping_integer_induction (def_stmt, loop))
+   cond_reduc_dt = dt;
+   }
 }
 
   is_simple_use = vect_is_simple_use (ops[reduc_index], loop_vinfo,
@@ -5630,18 +5640,49 @@ vectorizable_reduction (gimple *stmt, 
gimple_stmt_iterator *gsi,
  !nested_cycle, &dummy, false,
  &v_reduc_type);
 
+  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) = v_reduc_type;
   /* If we have a condition reduction, see if we can simplify it further.  */
-  if (v_reduc_type == COND_REDUCTION
-  && cond_expr_induction_def_stmt != NULL
-  && is_nonwrapping_integer_induction (cond_expr_induction_def_stmt, loop))
+  if (v_reduc_type == COND_REDUCTION)
 {
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"condition expression based on integer induction.\n");
-  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) = INTEGER_INDUC_COND_REDUCTION;
+  if (cond_reduc_dt == vect_induction_def)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"condition expression based on "
+"integer induction.\n");
+ STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+   = INTEGER_INDUC_COND_REDUCTION;
+   }
+
+  if (cond_reduc_dt == vect_constant_def)
+   {
+ enum vect_def_type cond_initial_dt;
+ gimple *def_stmt = SSA_NAME_DEF_STMT (ops[reduc_index]);
+ tree cond_initial_val
+   = PHI_ARG_DEF_FROM_EDGE (def_stmt, loop_preheader_edge (loop));
+
+ gcc_assert (cond_reduc_val != NULL_TREE);
+ vect_is_simple_use (cond_initial_val, loop_vinfo,
+ &def_stmt, &cond_initial_dt);
+ if (cond_initial_dt == vect_constant_def
+ && types_compatible_p (TREE_TYPE (cond_initial_val),
+TREE_TYPE (cond_reduc_val)))
+   {
+ tree e = fold_build2 (LE_EXPR, boolean_type_node,
+   cond_initial_val, cond_reduc_val);
+ if (e && (integer_onep (e) || integer_zerop (e)))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+

Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Paolo Bonzini


On 10/08/2016 17:58, Joseph Myers wrote:
> On Wed, 10 Aug 2016, Paolo Bonzini wrote:
> 
>> There are indeed many pedwarn(loc, 0, ...) occurrences in C++ (most, but
>> not all, are "foo only available with -std=bar" which in the C front-end
>> would use OPT_Wpedantic, OPT_W*compat be enabled by specific flags such
>> as -Wvariadic-macros).  In C I only see three:
> 
> I don't know why you think there are only three; there are loads.  The 
> first few in c-decl.c are:
> 
> pedwarn (csi->location, 0,
>  "%qD is static but used in inline function %qD "
>  "which is not static", csi->static_decl, csi->function);
> 
> pedwarn (csi->location, 0,
>  "%q+D is static but declared in inline function %qD "
>  "which is not static", csi->static_decl, csi->function);
> 
> pedwarn (input_location, 0,
>  "inline function %q+D declared but never defined", 
> p);
> 
>   pedwarned = pedwarn (input_location, 0,
>"conflicting types for %q+D", newdecl);
> 
>   pedwarned = pedwarn (input_location, 0,
>"conflicting types for %q+D", newdecl);
> 
>   pedwarn (input_location, 0,
>"unnamed struct/union that defines no instances");
> 

Hmm I was searching toplevel and c-family/.  I forgot that c/ now
exists, my knowledge is getting very dated...

Paolo


Re: [v3 PATCH] Implement LWG 2758.

2016-08-10 Thread Paolo Carlini

Hi Ville,

On 10/08/2016 09:50, Ville Voutilainen wrote:

Tested on Linux-x64.

2016-08-10  Ville Voutilainen  

 Implement LWG 2758.
If I understand correctly this touches only the new C++17 members and in 
any case, a recent commit by Jon disabled the extern template strings in 
C++17 mode, thus there are no ABI-related risks in that mode.


Patch looks good to me. I'll get to your other contributions later today.

Thanks,
Paolo.


Re: [PATCH][AArch64] Improve stack adjustment

2016-08-10 Thread Wilco Dijkstra
Richard Earnshaw wrote:
> I see you've added a default argument for your new parameter.  I think
> doing that is fine, but I have two comments about how we might use that
> in this case.
> 
> Firstly, if this parameter is suitable for having a default value, then
> I think the preceding one should also be treated in the same way.
> 
> Secondly, I think (due to these parameters being BOOL with no useful
> context to make it clear which is which) that having wrapper functions
> (inlined, of course) that describe the intended behaviour more clearly
> would be useful.  So, defining
> 
> static inline void
> aarch64_add_frame_constant (mode, regnum, scratchreg, delta)
> {
>    aarch64_add_frame_constant (mode, regnum, scratchreg, delta, true);
> }
> 
> would make it clearer at the call point than having a lot of true and
> false parameters scattered round the code.
> 
> Alternatively we might remove all the default parameters and require
> wrappers like the above to make it more explicit which API is intended -
> this might make more sense if not all combinations make sense.

OK here is the new version using simple wrapper functions and no
default parameters:

ChangeLog:
2016-08-10  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.c (aarch64_add_constant_internal):
Add extra argument to allow emitting the move immediate.
Use add/sub with positive immediate.
(aarch64_add_constant): Add inline function.
(aarch64_add_sp): Likewise.
(aarch64_sub_sp): Likewise.
(aarch64_expand_prologue): Call aarch64_sub_sp.
(aarch64_expand_epilogue): Call aarch64_add_sp.
Decide when to leave out move.
(aarch64_output_mi_thunk): Call aarch64_add_constant.

testsuite/
* gcc.target/aarch64/test_frame_17.c: New test.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
b8536175a84b76f8c2939e61f1379ae279b20d43..5a25fce17785af9f9dc12e0f2a9609af09af0b35
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1941,15 +1941,21 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 }
 
 /* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to held
-   intermediate value if necessary.
+   intermediate value if necessary.  FRAME_RELATED_P should be true if
+   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
+   to the generated instructions.  If SCRATCHREG is known to hold
+   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
+   immediate again.
 
-   This function is sometimes used to adjust the stack pointer, so we must
-   ensure that it can never cause transient stack deallocation by writing an
-   invalid value into REGNUM.  */
+   Since this function may be used to adjust the stack pointer, we must
+   ensure that it cannot cause transient stack deallocation (for example
+   by first incrementing SP and then decrementing when adjusting by a
+   large immediate).  */
 
 static void
-aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
- HOST_WIDE_INT delta, bool frame_related_p)
+aarch64_add_constant_internal (machine_mode mode, int regnum, int scratchreg,
+  HOST_WIDE_INT delta, bool frame_related_p,
+  bool emit_move_imm)
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
@@ -1967,11 +1973,11 @@ aarch64_add_constant (machine_mode mode, int regnum, 
int scratchreg,
   return;
 }
 
-  /* We need two add/sub instructions, each one performing part of the
- calculation.  Don't do this if the addend can be loaded into register with
- a single instruction, in that case we prefer a move to a scratch register
- following by an addition.  */
-  if (mdelta < 0x100 && !aarch64_move_imm (delta, mode))
+  /* We need two add/sub instructions, each one perform part of the
+ addition/subtraction, but don't this if the addend can be loaded into
+ register by single instruction, in that case we prefer a move to scratch
+ register following by addition.  */
+  if (mdelta < 0x100 && !aarch64_move_imm (mdelta, mode))
 {
   HOST_WIDE_INT low_off = mdelta & 0xfff;
 
@@ -1985,8 +1991,10 @@ aarch64_add_constant (machine_mode mode, int regnum, int 
scratchreg,
 
   /* Otherwise use generic function to handle all other situations.  */
   rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
-  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  if (emit_move_imm)
+aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
+  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
+ : gen_add2_insn (this_rtx, scratch_rtx));
   if (frame_related_p)
 {
   RTX_FRAME_RELATED_P (insn) = frame_related_p;
@@ -1995,6 +2003,27 @@ aarch64_add_constant (machine_mode

Re: [PATCH] Fix unaligned access when predictive commoning (PR 71083)

2016-08-10 Thread Bernd Edlinger
On 08/10/16 14:29, Richard Biener wrote:
> On Tue, 9 Aug 2016, Bernd Edlinger wrote:
>> We know that the difference between the innermost ref
>> and the outer ref is byte-aligned, but we do not know
>> that the same is true for offset between the COMPONENT_REF
>> and the outer ref.
>>
>> I mean boff is essentially the difference between
>> bitpos of get_inner_reference(exp) and
>> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
>>
>> This would be exposed by my patch, because previously
>> we always generated BIT_FIELD_REFS, with bit-offset 0,
>> but the MEM_REF at the byte-offset there is in all likelihood
>> pretty much unaligned, the MEM_REF at the COMPONENT_REF
>> is likely more aligned and allows better code for ARM processors,
>> but only if the MEM_REF is at a byte-aligned offset at all,
>> otherwise the whole transformation would be wrong.
>
> Note that the important thing to ensure is that the access the
> MEM_REF performs is correct TBAA-wise which means you either
> have to use alias-set zero (ptr_type_node offset) or _not_
> shuffle the offset arbitrarily between the MEM_REF and the
> components you wrap it in.
>
> Richard.
>

Yes, the patch exactly replicates the outermost COMPONENT_REF and
subtracts the component's byte-offset from the MEM_REF's address,
and the MEM_REF uses the pointer type of the inner reference.

In the case without bitfields and the Ada bitfields the patch changes
nothing, except we build an aligned type out of TREE_TYPE (DR_REF (dr))
and get_object_alignment (DR_REF (dr)).

In the case with a component_ref that is byte-aligned
we subtract the component byte offset from the address
before the MEM_REF is constructed.  And the
alias_ptr is of type reference_alias_ptr_type
(TREE_OPERAND (DR_REF (dr), 0)) and again the alignment
from get_object_alignment (TREE_OPERAND (DR_REF (dr), 0)
so that should be exactly type-correct from TBAA's perspective.


Attached a new version of the patch with an improved comment,
and the new Ada test cases.


Bootstrap and reg-test on x86_64-pc-linux-gnu without regression.
Is it OK for trunk?


Thanks
Bernd.
2016-08-08  Bernd Edlinger  

PR tree-optimization/71083
* tree-predcom.c (ref_at_iteration): Correctly align the
inner reference.

testsuite:
2016-08-08  Bernd Edlinger  

PR tree-optimization/71083
* gcc.c-torture/execute/pr71083.c: New test.
* gnat.dg/loop_optimization23.adb: New test.
* gnat.dg/loop_optimization23_pkg.ads: New test.
* gnat.dg/loop_optimization23_pkg.adb: New test.
Index: gcc/tree-predcom.c
===
--- gcc/tree-predcom.c	(revision 239193)
+++ gcc/tree-predcom.c	(working copy)
@@ -213,6 +213,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-affine.h"
+#include "builtins.h"
 
 /* The maximum number of iterations between the considered memory
references.  */
@@ -1364,11 +1365,16 @@ replace_ref_with (gimple *stmt, tree new_tree, boo
 /* Returns a memory reference to DR in the ITER-th iteration of
the loop it was analyzed in.  Append init stmts to STMTS.  */
 
-static tree 
+static tree
 ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts)
 {
   tree off = DR_OFFSET (dr);
   tree coff = DR_INIT (dr);
+  tree ref = DR_REF (dr);
+  enum tree_code ref_code = ERROR_MARK;
+  tree ref_type = NULL_TREE;
+  tree ref_op1 = NULL_TREE;
+  tree ref_op2 = NULL_TREE;
   if (iter == 0)
 ;
   else if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
@@ -1377,27 +1383,48 @@ ref_at_iteration (data_reference_p dr, int iter, g
   else
 off = size_binop (PLUS_EXPR, off,
 		  size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)));
-  tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
-  addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
- is_gimple_mem_ref_addr, NULL_TREE);
-  tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff);
   /* While data-ref analysis punts on bit offsets it still handles
  bitfield accesses at byte boundaries.  Cope with that.  Note that
- we cannot simply re-apply the outer COMPONENT_REF because the
- byte-granular portion of it is already applied via DR_INIT and
- DR_OFFSET, so simply build a BIT_FIELD_REF knowing that the bits
+ if the bitfield object also starts at a byte-boundary we can simply
+ replicate the COMPONENT_REF, but we have to subtract the component's
+ byte-offset from the MEM_REF address first.
+ Otherwise we simply build a BIT_FIELD_REF knowing that the bits
  start at offset zero.  */
-  if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
-  && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
+  if (TREE_CODE (ref) == COMPONENT_REF
+  && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
 {
-  tree field = TREE_OPERAND (DR_REF (dr), 1);
-  return build3 (BIT_FIELD_REF, TREE_TYPE (DR_R

Re: [PATCH] Use TImode for piecewise move in 64-bit mode

2016-08-10 Thread H.J. Lu
On Wed, Aug 10, 2016 at 8:55 AM, Uros Bizjak  wrote:
> On Wed, Aug 10, 2016 at 5:32 PM, H.J. Lu  wrote:
>> Use TImode for piecewise move in 64-bit mode.  When vector register
>> is used for piecewise move, we don't increase stack_alignment_needed
>> since vector register spill isn't required for piecewise move.  Since
>> stack_realign_needed is set to true by checking stack_alignment_estimated
>> set by pseudo vector register usage, we also need to check
>> stack_realign_needed to eliminate frame pointer.
>
> Why only in 64-bit mode? We can use SSE moves also in 32-bit mode.

I will extend it to 32-bit mode.

> I don't think we can handle crtl->stack_realign_needed in this way. If
> there are other insns with 32byte vector registers in use in the same
> (large) function as converted __builtin_memcpy, we will *still* need
> realigned stack.

We don't align stack in leaf functions using SSE/AVX/AVX512 vector
registers if there is no spill when __builtin_memcpy isn't optimized with
SSE load/store.  My change just extends it to __builtin_memcpy optimized
with SSE load/store with or without 32byte vector registers in use.

> Uros.
>
>
>> Tested on x86-64.  OK for trunk?
>>
>> H.J.
>> ---
>> gcc/
>>
>> * config/i386/i386.c (ix86_finalize_stack_realign_flags): Also
>> check stack_realign_needed for stack realignment.
>> * config/i386/i386.h (MOVE_MAX_PIECES): Set to 16 in 64-bit mode
>> if unaligned SSE load and store are optimal.
>>
>> gcc/testsuite/
>>
>> * gcc.target/i386/pieces-memcpy-1.c: New test.
>> * gcc.target/i386/pieces-memcpy-2.c: Likewise.
>> * gcc.target/i386/pieces-memcpy-3.c: Likewise.
>> * gcc.target/i386/pieces-memcpy-4.c: Likewise.
>> * gcc.target/i386/pieces-memcpy-5.c: Likewise.
>> * gcc.target/i386/pieces-memcpy-6.c: Likewise.
>> ---
>>  gcc/config/i386/i386.c  | 11 +--
>>  gcc/config/i386/i386.h  |  6 +-
>>  gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c | 17 +
>>  gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c | 17 +
>>  gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c | 17 +
>>  gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c | 17 +
>>  gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c | 17 +
>>  gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c | 17 +
>>  8 files changed, 116 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-3.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-4.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-5.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-6.c
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 93eaab1..60dc160 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -13286,8 +13286,15 @@ ix86_finalize_stack_realign_flags (void)
>>/* If the only reason for frame_pointer_needed is that we conservatively
>>   assumed stack realignment might be needed, but in the end nothing that
>>   needed the stack alignment had been spilled, clear frame_pointer_needed
>> - and say we don't need stack realignment.  */
>> -  if (stack_realign
>> + and say we don't need stack realignment.
>> +
>> + When vector register is used for piecewise move and store, we don't
>> + increase stack_alignment_needed as there is no register spill for
>> + piecewise move and store.  Since stack_realign_needed is set to true
>> + by checking stack_alignment_estimated which is updated by pseudo
>> + vector register usage, we also need to check stack_realign_needed to
>> + eliminate frame pointer.  */
>> +  if ((stack_realign || crtl->stack_realign_needed)
>>&& frame_pointer_needed
>>&& crtl->is_leaf
>>&& flag_omit_frame_pointer
>> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>> index 9b66264..24db855 100644
>> --- a/gcc/config/i386/i386.h
>> +++ b/gcc/config/i386/i386.h
>> @@ -1951,7 +1951,11 @@ typedef struct ix86_args {
>>  /* MOVE_MAX_PIECES is the number of bytes at a time which we can
>> move efficiently, as opposed to  MOVE_MAX which is the maximum
>> number of bytes we can move with a single instruction.  */
>> -#define MOVE_MAX_PIECES UNITS_PER_WORD
>> +#define MOVE_MAX_PIECES \
>> +  ((TARGET_64BIT \
>> +&& TARGET_SSE2 \
>> +&& TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
>> +&& TARGET_SSE_UNALIGNED_STORE_OPTIMAL) ? 16 : UNITS_PER_WORD)
>>
>>  /* If a memory-to-memory move would take MOVE_RATIO or more simple
>> move-instruction pairs, we will do a movmem or libcall instead.
>> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c 
>> b/gcc/testsuite/gcc.target

Re: [PATCH][AArch64 - v2] Simplify eh_return implementation

2016-08-10 Thread Wilco Dijkstra
I noticed it would still be a good idea to add an extra barrier in the epilog 
as the
scheduler doesn't appear to handle aliases of frame accesses properly.

This patch simplifies the handling of the EH return value.  We force the use of 
the
frame pointer so the return location is always at FP + 8.  This means we can 
emit
a simple volatile access in EH_RETURN_HANDLER_RTX without needing md
patterns, splitters and frame offset calculations.  The new implementation also
fixes various bugs in aarch64_final_eh_return_addr, which does not work with
-fomit-frame-pointer, alloca or outgoing arguments.

Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport
this to GCC6.x?

ChangeLog:
2016-08-10  Wilco Dijkstra  
gcc/
* config/aarch64/aarch64.md (eh_return): Remove pattern and splitter.
* config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove.
(EH_RETURN_HANDLER_RTX): New define.
* config/aarch64/aarch64.c (aarch64_frame_pointer_required):
Force frame pointer in EH return functions.
(aarch64_expand_epilogue): Add barrier for eh_return.
(aarch64_final_eh_return_addr): Remove.
(aarch64_eh_return_handler_rtx): New function.
* config/aarch64/aarch64-protos.h (aarch64_final_eh_return_addr):
Remove.
(aarch64_eh_return_handler_rtx): New prototype.
--

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
3cdd69b8af1089a839e5d45cda94bc70a15cd777..327c0a97f6f687604afef249b79ac22628418070
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -358,7 +358,7 @@ int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
-rtx aarch64_final_eh_return_addr (void);
+rtx aarch64_eh_return_handler_rtx (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
003fec87e41db618570663f28cc2387a87e8252a..fa81e4b853daf08842955288861ec7e7acca
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -400,9 +400,9 @@ extern unsigned aarch64_architecture_version;
 #define ASM_DECLARE_FUNCTION_NAME(STR, NAME, DECL) \
   aarch64_declare_function_name (STR, NAME, DECL)
 
-/* The register that holds the return address in exception handlers.  */
-#define AARCH64_EH_STACKADJ_REGNUM (R0_REGNUM + 4)
-#define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, AARCH64_EH_STACKADJ_REGNUM)
+/* For EH returns X4 contains the stack adjustment.  */
+#define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM)
+#define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
 
 /* Don't use __builtin_setjmp until we've defined it.  */
 #undef DONT_USE_BUILTIN_SETJMP
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
5a25fce17785af9f9dc12e0f2a9609af09af0b35..bb8baff1e7a06942c8b8f51c1d6b341673401ef9
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2739,6 +2739,10 @@ aarch64_frame_pointer_required (void)
   && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM)))
 return true;
 
+  /* Force a frame pointer for EH returns so the return address is at FP+8.  */
+  if (crtl->calls_eh_return)
+return true;
+
   return false;
 }
 
@@ -3298,7 +3302,8 @@ aarch64_expand_epilogue (bool for_sibcall)
 + cfun->machine->frame.saved_varargs_size) != 0;
 
   /* Emit a barrier to prevent loads from a deallocated stack.  */
-  if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca)
+  if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca
+  || crtl->calls_eh_return)
 {
   emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
   need_barrier_p = false;
@@ -3366,52 +3371,15 @@ aarch64_expand_epilogue (bool for_sibcall)
 emit_jump_insn (ret_rtx);
 }
 
-/* Return the place to copy the exception unwinding return address to.
-   This will probably be a stack slot, but could (in theory be the
-   return register).  */
+/* Implement EH_RETURN_HANDLER_RTX.  The return address is stored at FP + 8.
+   The access needs to be volatile to prevent it from being removed.  */
 rtx
-aarch64_final_eh_return_addr (void)
+aarch64_eh_return_handler_rtx (void)
 {
-  HOST_WIDE_INT fp_offset;
-
-  aarch64_layout_frame ();
-
-  fp_offset = cfun->machine->frame.frame_size
- - cfun->machine->frame.hard_fp_offset;
-
-  if (cfun->machine->frame.reg_offset[LR_REGNUM] < 0)
-return gen_rtx_REG (DImode, LR_REGNUM);
-
-  /* DSE and CSELIB do not detect an alias between sp+k1 and fp+k2.  This can
- result in a store to save LR introduced by builtin_eh_return () being
- incorrectly deleted because the alias is n

Re: [PATCH][AArch64] Add legitimize_address_displacement hook

2016-08-10 Thread Wilco Dijkstra
Richard Earnshaw wrote:
> OK.  But please enhance the comment with some explanation as to WHY
> you've chosen to use just two base pairings rather than separate bases
> for each access size.

OK here is the updated patch which also handles unaligned accesses
which further improves the benefit:

This patch adds legitimize_address_displacement hook so that stack accesses
with large offsets are split into a more efficient sequence.  Unaligned and 
TI/TFmode use a 256-byte range, byte and halfword accesses use a 4KB range,
wider accesses use a 16KB range to maximise the available addressing range
and increase opportunities to share the base address.

int f(int x)
{
  int arr[8192];
  arr[4096] = 0;
  arr[6000] = 0;
  arr[7000] = 0;
  arr[8191] = 0;
  return arr[x];
}

Now generates:

sub sp, sp, #32768
add x1, sp, 16384
str wzr, [x1]
str wzr, [x1, 7616]
str wzr, [x1, 11616]
str wzr, [x1, 16380]
ldr w0, [sp, w0, sxtw 2]
add sp, sp, 32768
ret

instead of:

sub sp, sp, #32768
mov x2, 28000
add x1, sp, 16384
mov x3, 32764
str wzr, [x1]
mov x1, 24000
add x1, sp, x1
str wzr, [x1]
add x1, sp, x2
str wzr, [x1]
add x1, sp, x3
str wzr, [x1]
ldr w0, [sp, w0, sxtw 2]
add sp, sp, 32768
ret

Bootstrap, GCC regression OK.

ChangeLog:
2016-08-10  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.c (aarch64_legitimize_address_displacement):
New function.
(TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT): Define.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
9a5fc199128b1326d0fb2afe0833aa6a5ce62ddf..b8536175a84b76f8c2939e61f1379ae279b20d43
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4173,6 +4173,24 @@ aarch64_legitimate_address_p (machine_mode mode, rtx x,
   return aarch64_classify_address (&addr, x, mode, outer_code, strict_p);
 }
 
+/* Split an out-of-range address displacement into a base and offset.
+   Use 4KB range for 1- and 2-byte accesses and a 16KB range otherwise
+   to increase opportunities for sharing the base address of different sizes.
+   For TI/TFmode and unaligned accesses use a 256-byte range.  */
+static bool
+aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode 
mode)
+{
+  HOST_WIDE_INT mask = GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3fff;
+
+  if (mode == TImode || mode == TFmode ||
+  (INTVAL (*disp) & (GET_MODE_SIZE (mode) - 1)) != 0)
+ mask = 0xff;
+
+  *off = GEN_INT (INTVAL (*disp) & ~mask);
+  *disp = GEN_INT (INTVAL (*disp) & mask);
+  return true;
+}
+
 /* Return TRUE if rtx X is immediate constant 0.0 */
 bool
 aarch64_float_const_zero_rtx_p (rtx x)
@@ -14137,6 +14155,10 @@ aarch64_optab_supported_p (int op, machine_mode mode1, 
machine_mode,
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P aarch64_legitimate_constant_p
 
+#undef TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT
+#define TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT \
+  aarch64_legitimize_address_displacement
+
 #undef TARGET_LIBGCC_CMP_RETURN_MODE
 #define TARGET_LIBGCC_CMP_RETURN_MODE aarch64_libgcc_cmp_return_mode
 



Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread ayush goel
On 10 August 2016 at 7:36:41 PM, Pedro Alves (pal...@redhat.com) wrote:
> On 08/10/2016 02:40 PM, ayush goel wrote:
> > On 9 August 2016 at 2:20:59 PM, Pedro Alves (pal...@redhat.com) wrote:
>
> > I wasn’t aware of this. Thanks for pointing this out.
> > It’s strange however, I didn’t see anything failing while
> > building/testing my system.
>
> If I comment out the include on gdb, here's what I get:
>
> g++ -g3 -O0 -I. -I/home/pedro/gdb/mygit/src/gdb 
> -I/home/pedro/gdb/mygit/src/gdb/common
> -I/home/pedro/gdb/mygit/src/gdb/config -DLOCALEDIR="\"/opt/gdb/share/locale\""
> -DHAVE_CONFIG_H -I/home/pedro/gdb/mygit/src/gdb/../include/opcode 
> -I/home/pedro/gdb/mygit/src/gdb/../opcodes/..
> -I/home/pedro/gdb/mygit/src/gdb/../readline/.. 
> -I/home/pedro/gdb/mygit/src/gdb/../zlib
> -I../bfd -I/home/pedro/gdb/mygit/src/gdb/../bfd 
> -I/home/pedro/gdb/mygit/src/gdb/../include
> -I../libdecnumber -I/home/pedro/gdb/mygit/src/gdb/../libdecnumber 
> -I/home/pedro/gdb/mygit/src/gdb/gnulib/import
> -Ibuild-gnulib/import -DTUI=1 -pthread -I/usr/include/guile/2.0 
> -I/usr/include/python3.4m
> -I/usr/include/python3.4m -Wall -Wpointer-arith -Wno-unused -Wunused-value 
> -Wunused-function
> -Wno-switch -Wno-char-subscripts -Wempty-body -Wunused-but-set-parameter 
> -Wunused-but-set-variable
> -Wno-sign-compare -Wno-write-strings -Wno-narrowing -Wformat-nonliteral 
> -Werror
> -c -o gdb.o -MT gdb.o -MMD -MP -MF .deps/gdb.Tpo 
> /home/pedro/gdb/mygit/src/gdb/gdb.c
> In file included from 
> /home/pedro/gdb/mygit/src/gdb/gnulib/import/pathmax.h:42:0,
> from /home/pedro/gdb/mygit/src/gdb/common/common-defs.h:67,
> from /home/pedro/gdb/mygit/src/gdb/defs.h:28,
> from /home/pedro/gdb/mygit/src/gdb/gdb.c:19:
> build-gnulib/import/unistd.h:135:3: error: #error "Please include config.h 
> first."
> #error "Please include config.h first."
> ^
> In file included from /home/pedro/gdb/mygit/src/gdb/gdb_wchar.h:53:0,
> from /home/pedro/gdb/mygit/src/gdb/defs.h:51,
> from /home/pedro/gdb/mygit/src/gdb/gdb.c:19:
> build-gnulib/import/wctype.h:66:3: error: #error "Please include config.h 
> first."
> #error "Please include config.h first."
> ^
> In file included from 
> /home/pedro/gdb/mygit/src/gdb/gnulib/import/pathmax.h:42:0,
> from /home/pedro/gdb/mygit/src/gdb/common/common-defs.h:67,
> from /home/pedro/gdb/mygit/src/gdb/defs.h:28,
> from /home/pedro/gdb/mygit/src/gdb/gdb.c:19:
> build-gnulib/import/unistd.h:137:1: error: ‘_GL_INLINE_HEADER_BEGIN’ does not
> name a type
> _GL_INLINE_HEADER_BEGIN
> ^
> build-gnulib/import/unistd.h:1894:1: error: ‘_GL_INLINE_HEADER_END’ does not
> name a type
> _GL_INLINE_HEADER_END
> ^
> In file included from /home/pedro/gdb/mygit/src/gdb/gdb_wchar.h:53:0,
> from /home/pedro/gdb/mygit/src/gdb/defs.h:51,
> from /home/pedro/gdb/mygit/src/gdb/gdb.c:19:
> build-gnulib/import/wctype.h:68:1: error: ‘_GL_INLINE_HEADER_BEGIN’ does not
> name a type
> _GL_INLINE_HEADER_BEGIN
> ^
>
> > It’s strange however, I didn’t see anything failing while
> > building/testing my system.
>
> You're pulling in the unistd.h replacement module, and from above,
> we see that that replacement header should trip on this.
>
> I think the reason you don't trip on this, is here:
>
> > +INCGNU = -I../gnulib -I$(srcdir)/../gnulib/import
>
> Specifically, the "-I../gnulib" part. That looks wrong.
>
> Here's what the gnulib build directory looks like on a gdb
> build:
>
> $ cd build/gdb/build-gnulib
> $ ls
> import config.cache config.h config.log config.status Makefile stamp-h1
> $ ls import/
> sys dirent.h math.h string.h wchar.h
> alloca.h dirname-lgpl.o math.o stripslash.o wctype.h
> arg-nonnull.h inttypes.h ref-add.sed strnlen1.o wctype-h.o
> basename-lgpl.o libgnu.a ref-del.sed time.h
> c++defs.h localcharset.o signal.h unistd.h
> charset.alias Makefile stdio.h unistd.o
> configmake.h malloca.o stdlib.h warn-on-use.h
>
> So you want to point -I at these generated, replacement headers, not
> the gnulib build dir root, which in gcc is simply gnulib/ instead of
> build-gnulib/.
>
> So again, here:
>
> > +INCGNU = -I../gnulib -I$(srcdir)/../gnulib/import
>
> "-I../gnulib" should instead be "-I../gnulib/import" to put gnulib's
> generated replacement headers in the include path.
>
> And then you should trip on the problem. And you'll make gnulib's
> replacement headers actually replace something. :-)
>
> Assuming I'm right, that is.

So I was getting the exact same error as above and that is precisely
why I included "-I../gnulib” in the include path. I figured the
compiler wasn’t able to find config.h and hence has to be pointed to
the right path.

Excuse my limited understanding of the system, but the compiler seems
to already require config.h while building, and therefore doesn’t
having “-I../gnulib” solve the problem?
Why should I remove this from the path and instead have “#include
config.h” inside every function of gcc using gnulib.
Any explanation would be sincerely appreciated.
> > Can gcc also adopt a similar approach? Includ

Re: [PATCH] Use TImode for piecewise move in 64-bit mode

2016-08-10 Thread H.J. Lu
On Wed, Aug 10, 2016 at 9:26 AM, H.J. Lu  wrote:
> On Wed, Aug 10, 2016 at 8:55 AM, Uros Bizjak  wrote:
>> On Wed, Aug 10, 2016 at 5:32 PM, H.J. Lu  wrote:
>>> Use TImode for piecewise move in 64-bit mode.  When vector register
>>> is used for piecewise move, we don't increase stack_alignment_needed
>>> since vector register spill isn't required for piecewise move.  Since
>>> stack_realign_needed is set to true by checking stack_alignment_estimated
>>> set by pseudo vector register usage, we also need to check
>>> stack_realign_needed to eliminate frame pointer.
>>
>> Why only in 64-bit mode? We can use SSE moves also in 32-bit mode.
>
> I will extend it to 32-bit mode.

It doesn't work in 32-bit mode due to

#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode : DImode):

/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2
-fno-asynchronous-unwind-tables -m32 -S -o x.s x.i
x.i: In function ‘foo’:
x.i:6:10: internal compiler error: in by_pieces_ninsns, at expr.c:799
   return __builtin_mempcpy (dst, src, 32);
  ^~~~
0xa5a5a5 by_pieces_ninsns(unsigned long, unsigned int, unsigned int,
by_pieces_operation)
/export/gnu/import/git/sources/gcc/gcc/expr.c:799
0xe88e57 default_use_by_pieces_infrastructure_p(unsigned long,
unsigned int, by_pieces_operation, bool)
/export/gnu/import/git/sources/gcc/gcc/targhooks.c:1516
0xa5a3c2 can_do_by_pieces
/export/gnu/import/git/sources/gcc/gcc/expr.c:739
0xa5a3ee can_move_by_pieces(unsigned long, unsigned int)
/export/gnu/import/git/sources/gcc/gcc/expr.c:749
0x8d85c8 expand_builtin_mempcpy_args
/export/gnu/import/git/sources/gcc/gcc/builtins.c:3152
0x8d8092 expand_builtin_mempcpy
/export/gnu/import/git/sources/gcc/gcc/builtins.c:3044
0x8e1bf1 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
/export/gnu/import/git/sources/gcc/gcc/builtins.c:6146
0xa7ca9e expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/export/gnu/import/git/sources/gcc/gcc/expr.c:10733
0xa70d9d expand_expr_real(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
/export/gnu/import/git/sources/gcc/gcc/expr.c:8088
0xa67c2b store_expr_with_bounds(tree_node*, rtx_def*, int, bool, bool,
tree_node*)
/export/gnu/import/git/sources/gcc/gcc/expr.c:5547
0xa6683b expand_assignment(tree_node*, tree_node*, bool)
/export/gnu/import/git/sources/gcc/gcc/expr.c:5316
0x917582 expand_call_stmt
/export/gnu/import/git/sources/gcc/gcc/cfgexpand.c:2665
0x91a7b2 expand_gimple_stmt_1
/export/gnu/import/git/sources/gcc/gcc/cfgexpand.c:3580
0x91aea8 expand_gimple_stmt
/export/gnu/import/git/sources/gcc/gcc/cfgexpand.c:3746
0x91afa7 expand_gimple_tailcall
/export/gnu/import/git/sources/gcc/gcc/cfgexpand.c:3793
0x92265e expand_gimple_basic_block
/export/gnu/import/git/sources/gcc/gcc/cfgexpand.c:5730
0x92429d execute
/export/gnu/import/git/sources/gcc/gcc/cfgexpand.c:6367
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
Makefile:23: recipe for target 'x.s' failed
make: *** [x.s] Error 1

I tried to fix it:

https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01506.html

But there is a concern for simplify_immed_subreg:

https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01527.html

>> I don't think we can handle crtl->stack_realign_needed in this way. If
>> there are other insns with 32byte vector registers in use in the same
>> (large) function as converted __builtin_memcpy, we will *still* need
>> realigned stack.
>
> We don't align stack in leaf functions using SSE/AVX/AVX512 vector
> registers if there is no spill when __builtin_memcpy isn't optimized with
> SSE load/store.  My change just extends it to __builtin_memcpy optimized
> with SSE load/store with or without 32byte vector registers in use.
>
>> Uros.
>>
>>
>>> Tested on x86-64.  OK for trunk?
>>>
>>> H.J.
>>> ---
>>> gcc/
>>>
>>> * config/i386/i386.c (ix86_finalize_stack_realign_flags): Also
>>> check stack_realign_needed for stack realignment.
>>> * config/i386/i386.h (MOVE_MAX_PIECES): Set to 16 in 64-bit mode
>>> if unaligned SSE load and store are optimal.
>>>
>>> gcc/testsuite/
>>>
>>> * gcc.target/i386/pieces-memcpy-1.c: New test.
>>> * gcc.target/i386/pieces-memcpy-2.c: Likewise.
>>> * gcc.target/i386/pieces-memcpy-3.c: Likewise.
>>> * gcc.target/i386/pieces-memcpy-4.c: Likewise.
>>> * gcc.target/i386/pieces-memcpy-5.c: Likewise.
>>> * gcc.target/i386/pieces-memcpy-6.c: Likewise.
>>> ---
>>>  gcc/config/i386/i386.c  | 11 +--
>>>  gcc/config/i386/i386.h  |  6 +-
>>>  gcc/testsuite/gcc.target/i386/pieces-memcpy-1.c | 17 +
>>>  gcc/testsuite/gcc.target/i386/pieces-memcpy-2.c | 17 +
>>>  gcc/testsuite/gcc.

Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2016-08-10 Thread Vladimir N Makarov



On 08/08/2016 01:04 PM, Jiong Wang wrote:

[...]
There is very tiny performance regression on SPEC2K6INT
464.h264ref. Checking the codegen, there are some bad instruction
scheduling, it looks to me caused by REG_ALLOC_ORDER is not used
consistently inside IRA that parts of the code are using new allocation
order while some other parts are still using original order.

I see in ira.c:setup_class_hard_regs we are checking whether
REG_ALLOC_ORDER is defined:

   #ifdef REG_ALLOC_ORDER
   hard_regno = reg_alloc_order[i];
   #else
   hard_regno = i;
   #endif

but in ira-color.c:assign_hard_reg, there is no similar check:

   /* We don't care about giving callee saved registers to allocnos no
  living through calls because call clobbered registers are
  allocated first (it is usual practice to put them first in
  REG_ALLOC_ORDER).  */
   mode = ALLOCNO_MODE (a);
   for (i = 0; i < class_size; i++)
 {
   hard_regno = ira_class_hard_regs[aclass][i];

We might want to use reg_alloc_order to map above "i" if REG_ALLOC_ORDER is
defined?

Vlad, any comments?

The order is already present in ira_class_hard_regs.  So there is no 
need to use it in ira-color.c::assign_hard_reg.




Re: [v3 PATCH] Implement LWG 2758.

2016-08-10 Thread Ville Voutilainen
On 10 August 2016 at 19:06, Paolo Carlini  wrote:
>>  Implement LWG 2758.
>
> If I understand correctly this touches only the new C++17 members and in any
> case, a recent commit by Jon disabled the extern template strings in C++17
> mode, thus there are no ABI-related risks in that mode.

Correct, afaik.

> Patch looks good to me. I'll get to your other contributions later today.


Thanks, I will commit this patch later today.


Re: [PATCH, rs6000] Fix vec_construct vectorization cost to be somewhat more accurate

2016-08-10 Thread Bill Schmidt
Sorry for the long delay on getting back to this.  I took a look at the
suggested test cases with the cost model available, and did some SPEC
testing to validate the model.  I found that it is still important to
model the 4xfloat case separately to account for conversion from 64-bit
to 32-bit in our internal representation; correcting the calculation
there actually results in no net change, but the commentary is now
better.  The default cost of N-1 that Richard added in his patch applies
well to POWER also, except for V2DI and V2DF modes (N=2) where this
tends to undercount the cost and encourage unprofitable SLP
vectorization in some cases.  Establishing a minimum cost of 2 avoids
test suite regressions and produces acceptable SPEC results.  (So rather
than using N instead of N-1 as in the previous version of this patch,
I'm using N-1 with a floor of 2.)

I looked through gcc.dg/vect/slp-4[35].c with the cost model enabled,
and the results are sensible with these changes.  SPEC results were all
in the noise range.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Ok for trunk?

Thanks,
Bill


2016-08-10  Bill Schmidt  

* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
Correct costs for vec_construct.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 239310)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5266,16 +5266,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
 return 2;
 
   case vec_construct:
-   elements = TYPE_VECTOR_SUBPARTS (vectype);
+   /* This is a rough approximation assuming non-constant elements
+  constructed into a vector via element insertion.  FIXME:
+  vec_construct is not granular enough for uniformly good
+  decisions.  If the initialization is a splat, this is
+  cheaper than we estimate.  Improve this someday.  */
elem_type = TREE_TYPE (vectype);
/* 32-bit vectors loaded into registers are stored as double
-  precision, so we need n/2 converts in addition to the usual
-  n/2 merges to construct a vector of short floats from them.  */
+  precision, so we need 2 permutes, 2 converts, and 1 merge
+  to construct a vector of short floats from them.  */
if (SCALAR_FLOAT_TYPE_P (elem_type)
&& TYPE_PRECISION (elem_type) == 32)
- return elements + 1;
+ return 5;
else
- return elements / 2 + 1;
+ return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
 
   default:
 gcc_unreachable ();




On Mon, 2016-07-18 at 14:29 +0200, Richard Biener wrote:
> On Mon, Jul 18, 2016 at 1:56 PM, Segher Boessenkool
>  wrote:
> > Hi Bill,
> >
> > On Fri, Jul 15, 2016 at 08:55:08AM -0500, Bill Schmidt wrote:
> >> This patch is a follow-up to Richard's patch of
> >> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
> >> vec_construct (initialization of an N-way vector by N scalars) is too low,
> >> which can cause too-aggressive vectorization in particular for N=8 or
> >> higher.  Richard changed the default cost to N-1, which is generally
> >> sensible.  For powerpc I am going with a slightly higher cost of N, which
> >> will keep us from being less conservative than the previous values when 
> >> N=2.
> >
> >> In any case, the purpose of this patch is simply to avoid vectorizing
> >> things we shouldn't when we've undercounted the cost of a vec_construct.
> >> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> >> regressions (hence the vectorization decisions in the test suite have
> >> not changed).  Is this ok for trunk?
> >
> > Do you also have a testcase where it does matter?  It would be good to
> > add that, then.  Or is it fixing a regression?
> >
> > I know nothing about the cost model, so someone else will have to review,
> > or I can just say "okay" ;-)
> 
> You can maybe look at gcc.dg/vect/slp-4[35].c (and run it with the cost model
> enabled).
> 
> Richard.
> 
> >
> > Segher
> 




Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread Pedro Alves
On 08/10/2016 05:31 PM, ayush goel wrote:
> On 10 August 2016 at 7:36:41 PM, Pedro Alves (pal...@redhat.com) wrote:

> So I was getting the exact same error as above and that is precisely
> why I included "-I../gnulib” in the include path. I figured the
> compiler wasn’t able to find config.h and hence has to be pointed to
> the right path.
> 
> Excuse my limited understanding of the system, but the compiler seems
> to already require config.h while building, and therefore doesn’t
> having “-I../gnulib” solve the problem?
> Why should I remove this from the path and instead have “#include
> config.h” inside every function of gcc using gnulib.
> Any explanation would be sincerely appreciated.

Puzzling how that didn't cause other problems for you.

Because:

- we end up with two different config.h files:

  build/gcc/config.h
  build/gnulib/config.h

- #include "config.h"' will pick one or the other, but not both.

- we need to include both.


Now, now that I look at gcc's config.h, it doesn't look like that
is the standard config.h file autogenerated from config.in at all.
Ah.  It's a custom file.  The one generated from gcc/config.in is
build/gcc/auto-host.h.  From configure.ac:

 # auto-host.h is the file containing items generated by autoconf and is
 # the first file included by config.h.
 # If host=build, it is correct to have bconfig include auto-host.h
 # as well.  If host!=build, we are in error and need to do more 
 # work to find out the build config parameters.
 if test x$host = x$build
 then
build_auto=auto-host.h
 else

So it sounds like you could include gnulib's config.h from within
gcc's config.h directly.  Looks like that file is generated
by src/gcc/mkconfig.sh, so that's what you'd patch.

Thanks,
Pedro Alves



Re: protected alloca class for malloc fallback

2016-08-10 Thread Oleg Endo
On Tue, 2016-08-09 at 13:41 -0400, Trevor Saunders wrote:

> If what you want is the ability to put the buffer on the stack
> instead  of the heap then I think a stack_string class that
> interoperates with  your string class is the thing you want.

I'd avoid a separate stack_string class.  Instead use a linear
allocator with storage allocated on the stack, e.g.:

typedef std::basic_string,
  linear_allocator
my_tmp_string;

void bleh (int foo)
{
  linear_allocator buffer (alloca (1024), 1024);

  tmp_string aaa (buffer);
  aaa += "test";
  aaa += "moon";
  ...

  // what happens when "buffer" runs out of space?
  // throw?  switch to heap alloc? ...
}


> 
>  I don't really see anything wrong with a string class being a really
>  fancy smart pointer that has a bunch of useful string stuff on it. 


That's what std::string basically is?


>  As for operator == I'd be fairly ok with that, other than it hiding 
> a O(N) operation in ==.

Yes, it makes it more difficult to carry out algorithm complexity
analysis using grep...


> Regretably necessary sure, but I'm not sure its funny.

Hence the quotes.

> The first big problem with using the stl is that the subset available 
> in C++98 isn't that great, you could maybe hack up libstdc++ so that 
> you can use newer types just without the bits that use newer language 
> features, but that would take some work.

Or just wait until people have agreed to switch to C++11 or C++14.  I
don't think in practice anybody uses an C++11-incapable GCC to build a
newer GCC these days.

> The other big problem is that the stl is often too general, and tries 
> to be too simple.  std::string is actually a good example of both of 
> those problems.  There's really no reason it should use size_t 
> instead of uint32_t for the string length / capacity.

Yes, some of the things in the STL are collections of compromises for
general-purpose usage.  If you're doing something extra fancy, most
likely you can outperform the generic STL implementation.  But very
often it's actually not needed.


>   It would also be a lot better if it had separate types for strings 
> where you want an internal buffer or don't.

Using custom allocators is one way.  For example ...

template  struct linear_allocator_with_buffer;

template 
using stack_string =
  std::basic_string,
linear_allocator_with_buffer>;

But then ...

stack_string<32> a = "a";
stack_string<64> b = "b";

b += a;

... will not work because they are different types.

One drawback of using custom allocators for that kind of thing is being
unable to pass the above stack_string to a function that takes a "const
std::string&" because they differ in template parameters.  But that's
where std::string_view comes in.


Cheers,
Oleg


Re: Ping^2 Re: Implement C _FloatN, _FloatNx types [version 5]

2016-08-10 Thread Paul Richard Thomas
Dear Joseph,

The fortran part is OK. I have put FX in copy since he was
instrumental in the fortran implementation. However, I find it hard to
imagine that he will object to the renaming.

Thanks

Paul


On 10 August 2016 at 13:33, Joseph Myers  wrote:
> Ping^2.  This patch
> 
> (non-C-front-end parts) is still pending review.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [PATCH, LRA] PR71680, Reload of slow mems

2016-08-10 Thread Vladimir N Makarov



On 08/09/2016 04:41 AM, Alan Modra wrote:

On Tue, Aug 02, 2016 at 11:02:56PM +0930, Alan Modra wrote:

This is a patch for a problem in lra, triggered by the rs6000
backend not allowing SImode in floating point registers.

Ping?  https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00113.html

Note that to recreate the problem with the comment #1 testcase in the
PR you'll need to revert svn 239011, 239012, 239013 and 239217.

Alan, sorry for the delay with answer.  The patch is ok to me.  Your 
arguments are convincing.  You can commit it into the trunk.


Because it is a sensitive LRA code (the code was related to many PRs and 
it was changed many times), there is a possibility that it will 
introduce some new PRs.  So wee need to keep attention to it after you 
commit the patch.


Thank you for working on the PR.



Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread ayush goel
> # auto-host.h is the file containing items generated by autoconf and is
> # the first file included by config.h.
> # If host=build, it is correct to have bconfig include auto-host.h
> # as well. If host!=build, we are in error and need to do more
> # work to find out the build config parameters.
> if test x$host = x$build
> then
> build_auto=auto-host.h
> else
>
> So it sounds like you could include gnulib's config.h from within
> gcc's config.h directly. Looks like that file is generated
> by src/gcc/mkconfig.sh, so that's what you'd patch.

Thanks for the pointers, really insightful.

I can see mkconfig.sh being invoked inside gcc/Makefile.in with
different arguments to create the gcc header files (config.h etc).
So I guess I’ll copy all the definitions from gnulib/config.h inside
mkconfig.sh and try and test the system.

>
> Thanks,
> Pedro Alves
>
>


Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread Pedro Alves
On 08/10/2016 06:33 PM, ayush goel wrote:

> I can see mkconfig.sh being invoked inside gcc/Makefile.in with
> different arguments to create the gcc header files (config.h etc).
> So I guess I’ll copy all the definitions from gnulib/config.h inside
> mkconfig.sh and try and test the system.

What do you mean by "copy all the definitions" ?

All you should need is add '#include "gnulib/config.h"' to
the generated gcc/config.h.

Thanks,
Pedro Alves



Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread ayush goel
On 10 August 2016 at 11:19:42 PM, Pedro Alves (pal...@redhat.com) wrote:
> On 08/10/2016 06:33 PM, ayush goel wrote:
>
> > I can see mkconfig.sh being invoked inside gcc/Makefile.in with
> > different arguments to create the gcc header files (config.h etc).
> > So I guess I’ll copy all the definitions from gnulib/config.h inside
> > mkconfig.sh and try and test the system.
>
> What do you mean by "copy all the definitions" ?
>
> All you should need is add '#include "gnulib/config.h"' to
> the generated gcc/config.h.
>
Yes copy the file containing the definitions*

> Thanks,
> Pedro Alves
>
>


Re: [PATCH build/doc] Replacing libiberty with gnulib

2016-08-10 Thread Pedro Alves
On 08/10/2016 06:51 PM, ayush goel wrote:
> On 10 August 2016 at 11:19:42 PM, Pedro Alves (pal...@redhat.com) wrote:
>> On 08/10/2016 06:33 PM, ayush goel wrote:
>>
>>> I can see mkconfig.sh being invoked inside gcc/Makefile.in with
>>> different arguments to create the gcc header files (config.h etc).
>>> So I guess I’ll copy all the definitions from gnulib/config.h inside
>>> mkconfig.sh and try and test the system.
>>
>> What do you mean by "copy all the definitions" ?
>>
>> All you should need is add '#include "gnulib/config.h"' to
>> the generated gcc/config.h.
>>
> Yes copy the file containing the definitions*

That's a funny and confusing definition of copy.  :-)

Thanks,
Pedro Alves



Re: protected alloca class for malloc fallback

2016-08-10 Thread Jeff Law

On 08/10/2016 04:04 AM, Richard Biener wrote:

On Tue, Aug 9, 2016 at 3:17 PM, Aldy Hernandez  wrote:

On 08/05/2016 01:55 PM, Richard Biener wrote:

Hi Richard.


Please don't use std::string.  For string building you can use obstacks.



Alright let's talk details then so I can write things up in a way you
approve of.

Take for instance simple uses like all the tree_*check_failed routines,
which I thought were great candidates for std::string-- they're going to be
outputted to the screen or disk which is clearly many times more expensive
than the malloc or overhead of std::string:

  length += strlen ("expected ");
  buffer = tmp = (char *) alloca (length);
  length = 0;
  while ((code = (enum tree_code) va_arg (args, int)))
{
  const char *prefix = length ? " or " : "expected ";

  strcpy (tmp + length, prefix);
  length += strlen (prefix);
  strcpy (tmp + length, get_tree_code_name (code));
  length += strlen (get_tree_code_name (code));
}

Do you suggest using obstacks here, or did you have something else in mind?


Why would you want to get rid of the alloca here?
Do you know the range for LENGTH in the code above?  Is it based on 
something the user could potentially control (like a variable name, 
typdef name, etc).  If you don't know the length or it's possibly under 
the control of the user, then this can blow out the stack, which makes 
the code vulnerable to a stack shifting style attack by which further 
writes into the stack are actually writing into other parts of the 
stack, the heap, plt or some other location.  Essentially this gives an 
attacker control over one or more stores to memory, which is often 
enough of a vulnerability to mount an attack.


jeff


Re: protected alloca class for malloc fallback

2016-08-10 Thread Richard Biener
On August 10, 2016 8:00:20 PM GMT+02:00, Jeff Law  wrote:
>On 08/10/2016 04:04 AM, Richard Biener wrote:
>> On Tue, Aug 9, 2016 at 3:17 PM, Aldy Hernandez 
>wrote:
>>> On 08/05/2016 01:55 PM, Richard Biener wrote:
>>>
>>> Hi Richard.
>>>
 Please don't use std::string.  For string building you can use
>obstacks.
>>>
>>>
>>> Alright let's talk details then so I can write things up in a way
>you
>>> approve of.
>>>
>>> Take for instance simple uses like all the tree_*check_failed
>routines,
>>> which I thought were great candidates for std::string-- they're
>going to be
>>> outputted to the screen or disk which is clearly many times more
>expensive
>>> than the malloc or overhead of std::string:
>>>
>>>   length += strlen ("expected ");
>>>   buffer = tmp = (char *) alloca (length);
>>>   length = 0;
>>>   while ((code = (enum tree_code) va_arg (args, int)))
>>> {
>>>   const char *prefix = length ? " or " : "expected ";
>>>
>>>   strcpy (tmp + length, prefix);
>>>   length += strlen (prefix);
>>>   strcpy (tmp + length, get_tree_code_name (code));
>>>   length += strlen (get_tree_code_name (code));
>>> }
>>>
>>> Do you suggest using obstacks here, or did you have something else
>in mind?
>>
>> Why would you want to get rid of the alloca here?
>Do you know the range for LENGTH in the code above? 

Yes, it's a set of tree code names.

 Is it based on 
>something the user could potentially control (like a variable name, 
>typdef name, etc).  If you don't know the length or it's possibly under
>
>the control of the user, then this can blow out the stack, which makes 
>the code vulnerable to a stack shifting style attack by which further 
>writes into the stack are actually writing into other parts of the 
>stack, the heap, plt or some other location.  Essentially this gives an
>
>attacker control over one or more stores to memory, which is often 
>enough of a vulnerability to mount an attack.

Yes, I understand that.  The above is not such a case.  If an attacker can 
trick me into compiling (and possibly executing) his code then things are lost 
anyway.  No need for a fancy buffer overflow.

IMHO the alloca case warrants for a protection on the level of the stack 
adjustment itself.

Richard.

>
>jeff




Go patch committed

2016-08-10 Thread Ian Lance Taylor
This patch to the Go frontend by Marek Polacek fixes one missing break
statement and adds a couple of "fall through" comments to fix the Go
frontend for -Wimplicit-fallthrough.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 239315)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-8da2129a005cc1f44d4d993b0b7312b64c0d68a4
+5e05b7bc947231b4d5a8327bf63e2fa648e51dc7
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 239140)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -281,6 +281,7 @@ Node::op_format() const
{
case Runtime::PANIC:
  op << "panic";
+ break;
 
case Runtime::APPEND:
  op << "append";
@@ -1923,6 +1924,7 @@ Escape_analysis_assign::assign(Node* dst
if (!e->type()->has_pointer())
  break;
  }
+ // Fall through.
 
case Expression::EXPRESSION_CONVERSION:
case Expression::EXPRESSION_TYPE_GUARD:
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 239095)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -5767,6 +5767,7 @@ Binary_expression::do_get_backend(Transl
 case OPERATOR_DIV:
   if (left_type->float_type() != NULL || left_type->complex_type() != NULL)
 break;
+  // Fall through.
 case OPERATOR_MOD:
   is_idiv_op = true;
   break;


[PATCH] Fix caret locations in format_type_warning (PR c/72857)

2016-08-10 Thread David Malcolm
In r239253 I introduced a bug in c-format.c:format_type_warning.

In that patch I removed these lines from format_type_warning:

-  unsigned int offset_loc = type->offset_loc;
[...snip...]
-  loc = location_from_offset (loc, offset_loc);

which used the location from type->offset_loc.  Instead, I
erroneously used fmt_loc for these warnings.  This has the range of
the format, but hardcodes the caret location as the *end* of the range,
rather than the location within the format.

For example, before r239253, we would emit:

 pr72857.c:12:29: warning: field precision specifier ‘.*’ expects
 argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
__builtin_sprintf (d, " %.*ld ", foo + bar, foo);
  ^

after r239253, we emit:

 pr72857.c:12:31: warning: field precision specifier ‘.*’ expects
 argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
__builtin_sprintf (d, " %.*ld ", foo + bar, foo);
^~

underlining the format and the pertinent param (if it has a location),
but with the caret erroneously at the end of the format, rather than on
the "*".

With the following patch, we emit the correct caret location:

 pr72857.c:12:29: warning: field precision specifier ‘.*’ expects
 argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
__builtin_sprintf (d, " %.*ld ", foo + bar, foo);
~~^~~~

The patch generalizes class substring_loc from being just a range
(pair of indices) to being a range plus a caret (three indices),
updating c-format.c to use the appropriate caret location.
Doing so means we have to pass a location_t around (containing the
caret+range) rather than a source_range.

For multi-column charaters the patch uses the start of the source
character for the caret, so the locations for embedded NULs in
gcc.dg/format/asm_fprintf-1.c and c90-printf-1.c revert to their locations
before r239253, i.e. from:
  "  \0  "
 ~^
to
  "  \0  "
 ^~
Other locations in c90-printf-1.c revert to their locations due to the fix
for PR c/72857.

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c-family/ChangeLog:
PR c/72857
* c-common.c (substring_loc::get_range): Rename to...
(substring_loc::get_location): ...this, converting param from a
source_range * to a location_t *.  Call
get_source_location_for_substring rather than
get_source_range_for_substring, and pass in m_caret_idx.
* c-common.h (substring_loc::substring_loc): Add param "caret_idx".
(substring_loc::get_range): Replace with...
(substring_loc::get_location): ...this.
(substring_loc::set_caret_index): New method.
(substring_loc): Add field m_caret_idx.
* c-format.c (format_warning_va): Update for above changes.
Rename local "substring_loc" to "fmt_substring_loc" to avoid
clashing with type name.
(format_warning_at_char): Add caret_idx param to substring_loc ctor.
(check_argument_type): Likewise.
(format_type_warning): Rename param "fmt_loc" to "whole_fmt_loc"
Use a copy when emitting warnings, setting the caret index from TYPE.

gcc/ChangeLog:
PR c/72857
* input.c (get_source_range_for_substring): Rename to...
(get_source_location_for_substring): ...this, adding param
"caret_idx", and converting output param from source_range * to
location_t *.
(get_source_range_for_char): New function.
(get_num_source_ranges_for_substring): Update comment to reflect
above renaming.
(assert_char_at_range): Update to use get_source_range_for_char
rather than get_source_range_for_substring.
(test_lexer_string_locations_concatenation_2): Likewise.
* substring-locations.h (get_source_range_for_substring): Rename
to...
(get_source_location_for_substring): ...this, and adding param
"caret_idx", and converting output param from source_range * to
location_t *.

gcc/testsuite/ChangeLog:
PR c/72857
* gcc.dg/format/asm_fprintf-1.c: Restore column numbers
for embedded NUL.
* gcc.dg/format/c90-printf-1.c: Restore column numbers.
* gcc.dg/format/diagnostic-ranges.c (test_hex): Update expected
caret placement.
(test_oct): Likewise.
(test_multiple): Likewise.
(test_field_width_specifier): Likewise.
(test_field_width_specifier_2): New function.
(test_field_precision_specifier): New function.
(test_embedded_nul): Update expected caret placement.
(test_non_contiguous_strings): Update line number.
* gcc.dg/plugin/diagnostic-test-string-literals-1.c
(__emit_string_literal_range): Add "caret_idx" param.
(test_simple_string_literal): Add value for new param, updating
expected output..
(test_con

Re: [PATCH] Add mark_spam.py script

2016-08-10 Thread Joseph Myers
The latest spam bugs have spam attachments as well.  I'm not sure if the 
API can delete attachments, but it would be helpful for the script to do 
as much as possible with them (change filenames, descriptions, MIME types, 
mark them as obsolete).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add mark_spam.py script

2016-08-10 Thread Martin Liška

On 08/10/2016 10:49 PM, Joseph Myers wrote:

The latest spam bugs have spam attachments as well.  I'm not sure if the
API can delete attachments, but it would be helpful for the script to do
as much as possible with them (change filenames, descriptions, MIME types,
mark them as obsolete).



I'm testing this, if it's working I'll install the patch.

Martin
diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index cc394dc..569a03d 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -67,6 +67,18 @@ def mark_as_spam(id, api_key, verbose):
 print(r)
 print(r.text)
 
+# 4) mark all attachments as spam
+r = requests.get(u + '/attachment')
+response = json.loads(r.text)
+attachments = response['bugs'][str(id)]
+for a in attachments:
+attachment_id = a['id']
+url = '%sbug/attachment/%d' % (base_url, attachment_id)
+r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+if verbose:
+print(r)
+print(r.text)
+
 parser = argparse.ArgumentParser(description='Mark Bugzilla issues as spam.')
 parser.add_argument('api_key', help = 'API key')
 parser.add_argument('range', help = 'Range of IDs, e.g. 10-23,24,25,27')


C++ PATCH for C++17 constexpr if

2016-08-10 Thread Jason Merrill
This patch implements the C++17 constexpr if feature.  The primary use
is in templates, where the non-taken branch of the constexpr
if-statement is not instantiated.  But it can also be used outside of
templates, where the branches are parsed and discarded, and discarded
return statements are not used for return type deduction.

constexpr if is also active with a pedwarn in C++11 and C++14.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 504a8c704153edc559e7108321892e7809cdcc2f
Author: Jason Merrill 
Date:   Wed Aug 10 14:26:01 2016 -0400

Implement C++17 constexpr if.

* cp-tree.h (IF_STMT_CONSTEXPR_P): New.
* name-lookup.c (push_to_top_level, pop_from_top_level_1): Handle it.
* parser.h (struct cp_parser): Add in_discarded_stmt field.
* parser.c (cp_parser_selection_statement): Handle 'if constexpr'.
(cp_parser_jump_statement): Avoid deducing from a discarded return.
* pt.c (tsubst_expr): Only instantiate taken branch of constexpr if.
* semantics.c (begin_if_stmt): Set the binding level this_entity.
(finish_if_stmt_cond): Require the condition of a
constexpr if to be constant.
* decl.c (level_for_constexpr_if): New.
(named_label_entry): Add in_constexpr_if field.
(poplevel_named_label_1): Set it.
(check_goto): Check it.
(check_previous_goto_1): Check level_for_constexpr_if.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index f98b1c4..8a32f17 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -145,6 +145,7 @@ operator == (const cp_expr &lhs, tree rhs)
   WILDCARD_PACK_P (in WILDCARD_DECL)
   BLOCK_OUTER_CURLY_BRACE_P (in BLOCK)
   FOLD_EXPR_MODOP_P (*_FOLD_EXPR)
+  IF_STMT_CONSTEXPR_P (IF_STMT)
1: IDENTIFIER_VIRTUAL_P (in IDENTIFIER_NODE)
   TI_PENDING_TEMPLATE_FLAG.
   TEMPLATE_PARMS_FOR_INLINE.
@@ -4530,6 +4531,7 @@ more_aggr_init_expr_args_p (const 
aggr_init_expr_arg_iterator *iter)
 #define THEN_CLAUSE(NODE)  TREE_OPERAND (IF_STMT_CHECK (NODE), 1)
 #define ELSE_CLAUSE(NODE)  TREE_OPERAND (IF_STMT_CHECK (NODE), 2)
 #define IF_SCOPE(NODE) TREE_OPERAND (IF_STMT_CHECK (NODE), 3)
+#define IF_STMT_CONSTEXPR_P(NODE) TREE_LANG_FLAG_0 (IF_STMT_CHECK (NODE))
 
 /* WHILE_STMT accessors. These give access to the condition of the
while statement and the body of the while statement, respectively.  */
@@ -6303,7 +6305,7 @@ extern void add_decl_expr (tree);
 extern tree maybe_cleanup_point_expr_void  (tree);
 extern tree finish_expr_stmt   (tree);
 extern tree begin_if_stmt  (void);
-extern void finish_if_stmt_cond(tree, tree);
+extern tree finish_if_stmt_cond(tree, tree);
 extern tree finish_then_clause (tree);
 extern void begin_else_clause  (tree);
 extern void finish_else_clause (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 45286d0..43cf3df 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -218,6 +218,7 @@ struct GTY((for_user)) named_label_entry {
   bool in_catch_scope;
   bool in_omp_scope;
   bool in_transaction_scope;
+  bool in_constexpr_if;
 };
 
 #define named_labels cp_function_chain->x_named_labels
@@ -476,6 +477,16 @@ objc_mark_locals_volatile (void *enclosing_blk)
 }
 }
 
+/* True if B is the level for the condition of a constexpr if.  */
+
+static bool
+level_for_constexpr_if (cp_binding_level *b)
+{
+  return (b->kind == sk_cond && b->this_entity
+ && TREE_CODE (b->this_entity) == IF_STMT
+ && IF_STMT_CONSTEXPR_P (b->this_entity));
+}
+
 /* Update data for defined and undefined labels when leaving a scope.  */
 
 int
@@ -512,6 +523,10 @@ poplevel_named_label_1 (named_label_entry **slot, 
cp_binding_level *bl)
case sk_transaction:
  ent->in_transaction_scope = true;
  break;
+   case sk_block:
+ if (level_for_constexpr_if (bl->level_chain))
+   ent->in_constexpr_if = true;
+ break;
default:
  break;
}
@@ -3047,7 +3062,7 @@ check_previous_goto_1 (tree decl, cp_binding_level* 
level, tree names,
   cp_binding_level *b;
   bool complained = false;
   int identified = 0;
-  bool saw_eh = false, saw_omp = false, saw_tm = false;
+  bool saw_eh = false, saw_omp = false, saw_tm = false, saw_cxif = false;
 
   if (exited_omp)
 {
@@ -3132,6 +3147,20 @@ check_previous_goto_1 (tree decl, cp_binding_level* 
level, tree names,
"  enters synchronized or atomic statement");
  saw_tm = true;
}
+  if (!saw_cxif && b->kind == sk_block
+ && level_for_constexpr_if (b->level_chain))
+   {
+ if (identified < 2)
+   {
+ complained = identify_goto (decl, input_location, locus,
+ DK_ERROR);
+ identified = 2;
+   }
+ if (complained)
+

Re: C++ PATCH for C++17 constexpr if

2016-08-10 Thread Jason Merrill
On Wed, Aug 10, 2016 at 5:57 PM, Jason Merrill  wrote:
> This patch implements the C++17 constexpr if feature.  The primary use
> is in templates, where the non-taken branch of the constexpr
> if-statement is not instantiated.  But it can also be used outside of
> templates, where the branches are parsed and discarded, and discarded
> return statements are not used for return type deduction.

...and the feature test macro.
commit 1f20a739d61ef839ec7091aa1278adf6d726ead4
Author: Jason Merrill 
Date:   Wed Aug 10 18:35:01 2016 -0400

* c-cppbuiltin.c (c_cpp_builtins): Define __cpp_if_constexpr.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 46c70ac..82ed19d 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -882,6 +882,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_nontype_template_args=201411");
  cpp_define (pfile, "__cpp_range_based_for=201603");
  cpp_define (pfile, "__cpp_constexpr=201603");
+ cpp_define (pfile, "__cpp_if_constexpr=201606");
}
   if (flag_concepts)
/* Use a value smaller than the 201507 specified in
diff --git a/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C 
b/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
index f5ed6ab..41b6111 100644
--- a/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
+++ b/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
@@ -344,6 +344,12 @@
 #  error "__cpp_hex_float != 201603"
 #endif
 
+#ifndef __cpp_if_constexpr
+#  error "__cpp_if_constexpr"
+#elif __cpp_if_constexpr != 201606
+#  error "__cpp_if_constexpr != 201606"
+#endif
+
 #ifdef __has_cpp_attribute
 
 #  if ! __has_cpp_attribute(maybe_unused)


Re: [PR72835] Incorrect arithmetic optimization involving bitfield arguments

2016-08-10 Thread kugan

Hi,

On 10/08/16 20:28, Richard Biener wrote:

On Wed, Aug 10, 2016 at 10:57 AM, Jakub Jelinek  wrote:

On Wed, Aug 10, 2016 at 08:51:32AM +1000, kugan wrote:

I see it now. The problem is we are just looking at (-1) being in the ops
list for passing changed to rewrite_expr_tree in the case of multiplication
by negate.  If we have combined (-1), as in the testcase, we will not have
the (-1) and will pass changed=false to rewrite_expr_tree.

We should set changed based on what happens in try_special_add_to_ops.
Attached patch does this. Bootstrap and regression testing are ongoing. Is
this OK for trunk if there is no regression.


I think the bug is elsewhere.  In particular in
undistribute_ops_list/zero_one_operation/decrement_power.
All those look problematic in this regard, they change RHS of statements
to something that holds a different value, while keeping the LHS.
So, generally you should instead just add a new stmt next to the old one,
and adjust data structures (replace the old SSA_NAME in some ->op with
the new one).  decrement_power might be a problem here, dunno if all the
builtins are const in all cases that DSE would kill the old one,
Richard, any preferences for that?  reset flow sensitive info + reset debug
stmt uses, or something different?  Though, replacing the LHS with a new
anonymous SSA_NAME might be needed too, in case it is before SSA_NAME of a
user var that doesn't yet have any debug stmts.


I'd say replacing the LHS is the way to go, with calling the appropriate helper
on the old stmt to generate a debug stmt for it / its uses (would need
to look it
up here).



Here is an attempt to fix it. The problem arises when in 
undistribute_ops_list, we linearize_expr_tree such that NEGATE_EXPR is 
added (-1) MULT_EXPR (OP). Real problem starts when we handle this in 
zero_one_operation. Unlike what was done earlier, we now change the stmt 
(with propagate_op_to_signle use or by directly) such that the value 
computed by stmt is no longer what it used to be. Because of this, what 
is computed in undistribute_ops_list and rewrite_expr_tree are also changed.


undistribute_ops_list already expects this but rewrite_expr_tree will 
not if we dont pass the changed as an argument.


The way I am fixing this now is, in linearize_expr_tree, I set 
ops_changed  to true if we change NEGATE_EXPR to (-1) MULT_EXPR (OP). 
Then when we call zero_one_operation with ops_changed = true, I replace 
all the LHS in zero_one_operation with the new SSA and replace all the 
uses. I also call the rewrite_expr_tree with changed = false in this case.


Does this make sense? Bootstrapped and regression tested for 
x86_64-linux-gnu without any new regressions.


Thanks,
Kugan


gcc/testsuite/ChangeLog:

2016-08-10  Kugan Vivekanandarajah  

PR tree-optimization/72835
* gcc.dg/tree-ssa/pr72835.c: New test.

gcc/ChangeLog:

2016-08-10  Kugan Vivekanandarajah  

PR tree-optimization/72835
	* tree-ssa-reassoc.c (zero_one_operation): Incase of NEGATE_EXPR create 
and use

 new SSA_NAME.
(try_special_add_to_ops): Return true if we changed the value in 
operands.
	(linearize_expr_tree): Return true if try_special_add_top_ops set 
ops_changed to true.

(undistribute_ops_list): Likewise.
	(reassociate_bb): Pass ops_changed returned by linearlize_expr_tree to 
rewrite_expr_tree.




whil cif we change the operands such that the

/zero_one_operation
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr72835.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr72835.c
index e69de29..049eddc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr72835.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr72835.c
@@ -0,0 +1,36 @@
+/* PR tree-optimization/72835.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct struct_1 {
+unsigned int m1 : 6 ;
+unsigned int m2 : 24 ;
+unsigned int m3 : 6 ;
+};
+
+unsigned short var_32 = 0x2d10;
+
+struct struct_1 s1;
+
+void init ()
+{
+  s1.m1 = 4;
+  s1.m2 = 0x7ca4b8;
+  s1.m3 = 24;
+}
+
+void foo ()
+{
+  unsigned int c
+= ((unsigned int) s1.m2) * (-((unsigned int) s1.m3))
++ (var_32) * (-((unsigned int) (s1.m1)));
+  if (c != 4098873984)
+__builtin_abort ();
+}
+
+int main ()
+{
+init ();
+foo ();
+return 0;
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 7fd7550..038da41 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1039,7 +1039,7 @@ eliminate_using_constants (enum tree_code opcode,
 
 
 static void linearize_expr_tree (vec *, gimple *,
-bool, bool);
+bool, bool, bool *);
 
 /* Structure for tracking and counting operands.  */
 struct oecount {
@@ -1183,7 +1183,7 @@ propagate_op_to_single_use (tree op, gimple *stmt, tree 
*def)
is updated if there is only one operand but no operation left.  */
 
 static void
-zero_one_operation (tree *def, enum tree_code opcode, tree op)
+zero_one_operation (tree *def, enum tree_code opcode, tree op

[v3 PATCH] Implement C++17 make_from_tuple.

2016-08-10 Thread Ville Voutilainen
I was in the middle of doing this, so here's the patch before I commit
the string_view one.

Tested on Linux-x64.

2016-08-11  Ville Voutilainen  

Implement C++17 make_from_tuple.
* include/std/tuple (__make_from_tuple_impl, make_from_tuple): New.
* testsuite/20_util/tuple/make_from_tuple/1.cc: Likewise.
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index b9074cb..3403048 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -1655,6 +1655,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __apply_impl(std::forward<_Fn>(__f), std::forward<_Tuple>(__t),
  _Indices{});
 }
+
+  template 
+constexpr _Tp
+__make_from_tuple_impl(_Tuple&& __t, index_sequence<_Idx...>)
+{ return _Tp(get<_Idx>(std::forward<_Tuple>(__t))...); }
+
+  template 
+constexpr _Tp
+make_from_tuple(_Tuple&& __t)
+{
+  return __make_from_tuple_impl<_Tp>(
+std::forward<_Tuple>(__t),
+   make_index_sequence>>{});
+}
 #endif // C++17
 
   /// @}
diff --git a/libstdc++-v3/testsuite/20_util/tuple/make_from_tuple/1.cc 
b/libstdc++-v3/testsuite/20_util/tuple/make_from_tuple/1.cc
new file mode 100644
index 000..7963ea1
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/make_from_tuple/1.cc
@@ -0,0 +1,55 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+
+#include 
+#include 
+#include 
+
+template 
+struct ThreeParam
+{
+  T t;
+  U u;
+  V v;
+  ThreeParam(const T& t, const U& u, const V& v)
+: t(t),
+  u(u),
+  v(v) {}
+};
+
+void
+test01()
+{
+  auto x = std::make_tuple(1024, 'x', 2048);
+  ThreeParam y
+= std::make_from_tuple>(x);
+  VERIFY(y.t == 1024 && y.u == 'x' && y.v == 2048);
+  auto x2 = std::make_tuple(4096, 'z');
+  std::pair z = std::make_from_tuple>(x2);
+  VERIFY(z.first == 4096 && z.second == 'z');
+  auto x3 = std::make_tuple(8192);
+  int i = std::make_from_tuple(x3);
+  VERIFY(i == 8192);
+}
+
+int
+main()
+{
+  test01();
+}


Re: protected alloca class for malloc fallback

2016-08-10 Thread Trevor Saunders
On Thu, Aug 11, 2016 at 02:03:29AM +0900, Oleg Endo wrote:
> On Tue, 2016-08-09 at 13:41 -0400, Trevor Saunders wrote:
> 
> > If what you want is the ability to put the buffer on the stack
> > instead  of the heap then I think a stack_string class that
> > interoperates with  your string class is the thing you want.
> 
> I'd avoid a separate stack_string class.  Instead use a linear
> allocator with storage allocated on the stack, e.g.:
> 
> typedef std::basic_string,
>   linear_allocator
> my_tmp_string;
> 
> void bleh (int foo)
> {
>   linear_allocator buffer (alloca (1024), 1024);
> 
>   tmp_string aaa (buffer);
>   aaa += "test";
>   aaa += "moon";
>   ...
> 
>   // what happens when "buffer" runs out of space?
>   // throw?  switch to heap alloc? ...
> }
> 
> 
> > 
> >  I don't really see anything wrong with a string class being a really
> >  fancy smart pointer that has a bunch of useful string stuff on it. 
> 
> 
> That's what std::string basically is?

sure, but I'm saying it does that job baddly.

> >  As for operator == I'd be fairly ok with that, other than it hiding 
> > a O(N) operation in ==.
> 
> Yes, it makes it more difficult to carry out algorithm complexity
> analysis using grep...
> 
> 
> > Regretably necessary sure, but I'm not sure its funny.
> 
> Hence the quotes.
> 
> > The first big problem with using the stl is that the subset available 
> > in C++98 isn't that great, you could maybe hack up libstdc++ so that 
> > you can use newer types just without the bits that use newer language 
> > features, but that would take some work.
> 
> Or just wait until people have agreed to switch to C++11 or C++14.  I
> don't think in practice anybody uses an C++11-incapable GCC to build a
> newer GCC these days.

I'm pretty sure they do, I've seen patches for 4.1 and 4.3 go in this
cycle, and rval refs didn't work until 4.4 and that was to an older
version of the spec.  Anyways people apparently use compilers other than
gcc to build gcc, and I'm prepared to believe some of the proprietery
compilers on ancient unix variants don't support C++11.

> > The other big problem is that the stl is often too general, and tries 
> > to be too simple.  std::string is actually a good example of both of 
> > those problems.  There's really no reason it should use size_t 
> > instead of uint32_t for the string length / capacity.
> 
> Yes, some of the things in the STL are collections of compromises for
> general-purpose usage.  If you're doing something extra fancy, most
> likely you can outperform the generic STL implementation.  But very
> often it's actually not needed.

Well, I'd say the compromises made in std::string make it pretty
terrible for general purpose use accept where perf and memory doesn't
matter at all.

std::vector isn't very great size wise either, imho the size / capacity
fields would be much better put in the buffer than in the struct itself,
to save 2 * sizeof (void *) in sizeof std::vector.

I haven't looked at other stl datastructures that much, but those two
examples don't fill me with confidence :/

> >   It would also be a lot better if it had separate types for strings 
> > where you want an internal buffer or don't.
> 
> Using custom allocators is one way.  For example ...
> 
> template  struct linear_allocator_with_buffer;
> 
> template 
> using stack_string =
>   std::basic_string,
> linear_allocator_with_buffer>;

that *is* a different type than std::string with the default template
arguments.

> But then ...
> 
> stack_string<32> a = "a";
> stack_string<64> b = "b";
> 
> b += a;
> 
> ... will not work because they are different types.
> 
> One drawback of using custom allocators for that kind of thing is being
> unable to pass the above stack_string to a function that takes a "const
> std::string&" because they differ in template parameters.  But that's
> where std::string_view comes in.

or just having your stack_string type convert to the normal string type
so that you can pass mutable references.

Trev

> 
> 
> Cheers,
> Oleg


Re: [PATCH] objc: update documetation and add test-case of constructor/destructor attr.

2016-08-10 Thread Sandra Loosemore

On 08/10/2016 03:11 AM, Martin Liška wrote:

Hi.

Following patch clarifies usage of ctor and dtor attributes for Objective C.
Patch survives (on x86_64-linux-gnu):

make -k check-objc RUNTESTFLAGS="execute.exp"

Ready for trunk?


The documentation fix looks fine, but probably an objc maintainer needs 
to confirm that it's not just an accident that the test case works.


-Sandra



Re: [PATCH v2] cpp/c: Add -Wexpansion-to-defined

2016-08-10 Thread Sandra Loosemore

On 08/10/2016 04:06 AM, Paolo Bonzini wrote:


===
--- gcc/doc/invoke.texi (revision 239276)
+++ gcc/doc/invoke.texi (working copy)
@@ -4914,6 +4914,12 @@
  construct, known from C++, was introduced with ISO C99 and is by default
  allowed in GCC@.  It is not supported by ISO C90.  @xref{Mixed Declarations}.

+@item -Wexpansion-to-defined
+@opindex Wexpansion-to-defined
+Warn whenever @samp{defined} is encountered in the expansion of a macro.
+(including the case where the macro is expanded by an @samp{#if} directive).


You've got too many periods in that sentence, and I'd get rid of the 
parentheses, too:


Warn whenever @samp{defined} is encountered in the expansion of a macro,
including the case where the macro is expanded by an @samp{#if} directive.

-Sandra the nit-picky



Re: Fwd: [PATCH, doc/ARM] Remove false affirmation that Thumb cannot use an FPU

2016-08-10 Thread Sandra Loosemore

On 08/10/2016 09:51 AM, Thomas Preudhomme wrote:


diff --git a/gcc/doc/fragments.texi b/gcc/doc/fragments.texi
index 
b6d8541c8ca820fa732363a05221e2cd4d1251c2..a060635c9cee7374d9d187858ac87acdd08860f2
 100644
--- a/gcc/doc/fragments.texi
+++ b/gcc/doc/fragments.texi
@@ -117,12 +117,15 @@ specified, there are combinations that should not be 
built.  In that
 case, set @code{MULTILIB_EXCEPTIONS} to be all of the switch exceptions
 in shell case syntax that should not be built.

-For example the ARM processor cannot execute both hardware floating
-point instructions and the reduced size THUMB instructions at the same
-time, so there is no need to build libraries with both of these
-options enabled.  Therefore @code{MULTILIB_EXCEPTIONS} is set to:
+For example on ARM targets @code{-mfloat-abi=soft} requests to use a
+softfloat implementation for floating-point operations.  Therefore, it


For example, on ARM targets @option{-mfloat-abi=soft} requests use of 
software floating-point operations.  Therefore, it



+does not make sense to find both @code{-mfloat-abi=soft} and an


@option here too


+@code{mfpu} option on the command line so @code{MULTILIB_EXCEPTIONS}


and here @option{-mfpu}


+could contain the following exception (assuming that @code{-mfloat-abi}


@option


+comes after in MULTILIB_OPTIONS and given that soft is the default


@code markup on MULTILIB_OPTIONS?
@samp markup on soft?


+value):
 @smallexample
-*mthumb/*mhard-float*
+*mfpu=*
 @end smallexample

 @findex MULTILIB_REQUIRED


-Sandra



Re: [RFC][PR61839]Convert CST BINOP COND_EXPR to COND_EXPR ? (CST BINOP 1) : (CST BINOP 0)

2016-08-10 Thread kugan

Hi Richard,

On 09/08/16 20:22, Richard Biener wrote:

On Tue, Aug 9, 2016 at 4:51 AM, Kugan Vivekanandarajah
 wrote:

Hi Richard,

Thanks for the review.

On 29 April 2016 at 20:47, Richard Biener  wrote:

On Sun, Apr 17, 2016 at 1:14 AM, kugan
 wrote:

As explained in PR61839,

Following difference results in extra instructions:
-  c = b != 0 ? 486097858 : 972195717;
+  c = a + 972195718 >> (b != 0);

As suggested in PR, attached patch converts CST BINOP COND_EXPR to COND_EXPR
? (CST BINOP 1) : (CST BINOP 0).

Bootstrapped and regression tested for x86-64-linux-gnu with no new
regression. Is this OK for statege-1.


You are missing a testcase.

I think the transform can be generalized to any two-value value-range by
instead of

  lhs = cond_res ? (cst binop 1) : (cst binop 0)

emitting

  lhs = tmp == val1 ? (cst binop val1) : (cst binop val2);

In the PR I asked the transform to be only carried out if cond_res and
tmp have a single use (and thus they'd eventually vanish).

I'm not sure if a general two-value "constant" propagation is profitable
which is why I was originally asking for the pattern to only apply
if the resulting value is used in a comparison which we could then
in turn simplify by substituting COND_RES (or ! COND_RES) for it.
For the general two-value case we'd substitute it with tmp [=!]= val[12]
dependent on which constant is cheaper to test for.

So I think this needs some exploring work on which way to go
and which transform is profitable in the end.  I think the general
two-value case feeding a condition will be always profitable.



Please find a modified version which checks for two-valued variable
and uses this to optimize. In the attached test case (in function
bar), we end up doing the conversion twice.

Bootstrapped and regression tested on x86_64-linux-gnu without no new
regressions. Is this OK for trunk?


+/* Return true if VAR is a two-valued variable.  Set MIN and MAX when it is
+   true.  Return false otherwise.  */
+
+static bool
+two_valued_val_range_p (tree var, tree *min, tree *max)
+{

I'd use A and B, not MIN/MAX given it's two values, not necessarily
a two-valued range (for example for ~[1, UINT_MAX-1] which you
I have changed this. I don't  think this would be the only 
VR_ANTI_RANGE. Others like TYPE_MIN + 1, TYPE_MIN + 2 should come as 
VR_RANGE.



don't handle).  In theory VRP might get a more sophisticated range
representation to also allow a range consisting of just 3 and 7 for example.

I am not sure how this will be represented as value range. Is this for 
enum types where thhere is no valid values between 3 and 7 ?



+  tree tmp
+= int_const_binop (PLUS_EXPR,
+  vr->min,
+  build_int_cst_type (TREE_TYPE (var), 1));
+  if (0 != compare_values (tmp, vr->max))
+return false;

I think simply

   if (wi::sub (vr->max, vr->min) == 1)

I have changed this.



might work as well and avoid building a tree node.

+  /* Convert:
+LHS = CST BINOP VAR
+where VAR is two-valued.
+
+To:
+LHS = VAR == VAL1 ? (CST BINOP VAL1) : (CST BINOP VAL2) */
+
+  if (TREE_CODE_CLASS (rhs_code) == tcc_binary
+ && TREE_CODE (rhs1) == INTEGER_CST
+ && TREE_CODE (rhs2) == SSA_NAME

Note that for all commutative tcc_binary operators the constant will be on the
other operand.  I think you need to handle the constant appearing in both places
(and for division for example watch out for a zero divisor).


I have now canonicalized it in the beginning. I don't think it will 
affect other simplifications that comes after this transformation.




+ && has_single_use (rhs2)
+ && two_valued_val_range_p (rhs2, &min, &max))
+
+   {
+ tree cond = build2 (EQ_EXPR, TREE_TYPE (rhs2), rhs2, min);
+ tree new_rhs1 =  int_const_binop (rhs_code, rhs1, min);
+ tree new_rhs2 =  int_const_binop (rhs_code, rhs1, max);

too many spaces after '='.

Done.



+
+ if (new_rhs1 && new_rhs2)

You didn't address my point about profitability - you test for a single use
but not for the kind of use.  Please instead use
I checked with some simple test-cases. In those cases it either improves 
or no difference.




&& single_imm_use (rhs2, &use_p, &use_stmt)
&& gimple_code (use_stmt) == GIMPLE_COND

Done.



The testcase won't work on targets with small integers thus please
require int32plus.  With the existing scan-dumps it's not obvious

Done.


what transform it is testing for - please add a comment before
the dump scan reflecting the desired transform.  Maybe also scan
"optimized" instead to also verify that followup transforms trigger.


Done.


ASM difference for x86-64 is
@@ -11,11 +11,7 @@
movl$1, 12(%rsp)
movl12(%rsp), %eax
testl   %eax, %eax
-   movl$972195717, %eax
-   setne   %cl
-   sarl%cl, %eax
-   cmpl$486097858, %eax
-   jne .L5
+   je  .L5
xorl%eax, %eax