date:20180319

Re: [PATCH][GCC][ARM] Fix can_change_mode_class for big-endian

2018-03-19 Thread Tamar Christina

Sending to list

Thu 03/19/2018 08:48, Tamar Christina wrote:
> Hi Christophe,
> 
> The 03/16/2018 13:50, Christophe Lyon wrote:
> > On 15 March 2018 at 11:19, Kyrill  Tkachov  
> > wrote:
> > > Hi Tamar,
> > >
> > >
> > >> Regtested on armeb-none-eabi and no regressions.
> > >> Bootstrapped on arm-none-linux-gnueabihf and no issues.
> > >>
> > >> Ok for trunk? and for backport to GCC 7?
> > >>
> > >
> > > Ok for trunk.
> > > Please wait for a few days before backporting.
> > >
> > 
> > Hi Tamar,
> > 
> > Strangely I have noticed regressions on armeb, I have updated bugzilla
> > accordingly.
> 
> Thanks, seems testsuite didn't catch it with our default options.
> 
> This seems to be a combine issue as it's removing subregs it thinks is a
> no-op, causing the vec_select not to be done. It was working before because
> we were saying any subreg is invalid in arm-be, so if the code didn't get
> stuck in an endless loop, it would skip the no-op check.
> 
> I don't see anyway to fix this in the back-end alone as combine gives no way
> to tell it that the instruction is not a no-op. Even if I split the 
> instruction
> early on to explicitly use a new register, combine will combine the mov and 
> vec_select
> again and come to the same conclusion. =
> 
> So I will revert this for GCC 8 and fix it for GCC 9. Better the compiler ICE 
> than
> generate bad code.
> 
> Thanks,
> Tamar
> 
> > 
> > Thanks,
> > 
> > Christophe
> > 
> > > Thanks,
> > > Kyrill
> > >
> > >
> > >> Thanks,
> > >> Tamar
> > >>
> > >> gcc/
> > >> 2018-03-05  Tamar Christina  
> > >>
> > >> PR target/84711
> > >> * config/arm/arm.c (arm_can_change_mode_class): Use
> > >> GET_MODE_UNIT_SIZE
> > >> instead of GET_MODE_SIZE when comparing Units.
> > >>
> > >> gcc/testsuite/
> > >> 2018-03-05  Tamar Christina  
> > >>
> > >> PR target/84711
> > >> * gcc.target/arm/big-endian-subreg.c: New.
> > >>
> > >> --
> > >
> > >
> 
> -- 

--

Re: [PATCH] Fix PR84512

2018-03-19 Thread Richard Biener

On Fri, 16 Mar 2018, Tom de Vries wrote:

> On 03/16/2018 12:55 PM, Richard Biener wrote:
> > On Fri, 16 Mar 2018, Tom de Vries wrote:
> > 
> > > On 02/27/2018 01:42 PM, Richard Biener wrote:
> > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
> > > > ===
> > > > --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (nonexistent)
> > > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (working copy)
> > > > @@ -0,0 +1,15 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > > > +
> > > > +int foo()
> > > > +{
> > > > +  int a[10];
> > > > +  for(int i = 0; i < 10; ++i)
> > > > +a[i] = i*i;
> > > > +  int res = 0;
> > > > +  for(int i = 0; i < 10; ++i)
> > > > +res += a[i];
> > > > +  return res;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
> > > 
> > > This fails for nvptx, because it doesn't have the required vector
> > > operations.
> > > To fix the fail, I've added requiring effective target vect_int_mult.
> > 
> > On targets that do not vectorize you should see the scalar loops unrolled
> > instead.  Or do you have only one loop vectorized?
> 
> Sort of. Loop vectorization has no effect, and the scalar loops are completely
> unrolled. But then slp vectorization vectorizes the stores.
> 
> So at optimized we have:
> ...
>   MEM[(int *)&a] = { 0, 1 };
>   MEM[(int *)&a + 8B] = { 4, 9 };
>   MEM[(int *)&a + 16B] = { 16, 25 };
>   MEM[(int *)&a + 24B] = { 36, 49 };
>   MEM[(int *)&a + 32B] = { 64, 81 };
>   _6 = a[0];
>   _28 = a[1];
>   res_29 = _6 + _28;
>   _35 = a[2];
>   res_36 = res_29 + _35;
>   _42 = a[3];
>   res_43 = res_36 + _42;
>   _49 = a[4];
>   res_50 = res_43 + _49;
>   _56 = a[5];
>   res_57 = res_50 + _56;
>   _63 = a[6];
>   res_64 = res_57 + _63;
>   _70 = a[7];
>   res_71 = res_64 + _70;
>   _77 = a[8];
>   res_78 = res_71 + _77;
>   _2 = a[9];
>   res_11 = _2 + res_78;
>   a ={v} {CLOBBER};
>   return res_11;
> ...
> 
> The stores and loads are eliminated by dse1 in the rtl phase, and in the end
> we have:
> ...
> .visible .func (.param.u32 %value_out) foo
> {
> .reg.u32 %value;
> .local .align 16 .b8 %frame_ar[48];
> .reg.u64 %frame;
> cvta.local.u64 %frame, %frame_ar;
> mov.u32 %value, 285;
> st.param.u32[%value_out], %value;
> ret;
> }
> ...
> 
> > That's precisely
> > what the PR was about...  which means it isn't fixed for nvptx :/
> 
> Indeed the assembly is not optimal, and would be optimal if we'd have optimal
> code at optimized.
> 
> FWIW, using this patch we generate optimal code at optimized:
> ...
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 3ebcfc30349..6b64f600c4a 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_tracer);
>NEXT_PASS (pass_thread_jumps);
>NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> +  NEXT_PASS (pass_fre);
>NEXT_PASS (pass_strlen);
>NEXT_PASS (pass_thread_jumps);
>NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
> ...
> 
> and we get:
> ...
> .visible .func (.param.u32 %value_out) foo
> {
> .reg.u32 %value;
> mov.u32 %value, 285;
> st.param.u32[%value_out], %value;
> ret;
> }
> ...
> 
> I could file a missing optimization PR for nvptx, but I'm not sure where this
> should be fixed.

Ah, yeah... the usual issue then.

Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?

Thanks,
Richard.

C++ PATCH for c++/84925, ICE in enclosing_instantiation_of

2018-03-19 Thread Marek Polacek

Seems like with this testcase we end up in a scenario where, when counting the
lambda count in enclosing_instantiation_of, we arrive at decl_function_context
being null, so checking DECL_TEMPLATE_INFO then crashes.  It appears that just
the following does the right thing, but I'm not too sure about this function.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-03-19  Marek Polacek  

PR c++/84925
* pt.c (enclosing_instantiation_of): Check if fn is null.

* g++.dg/cpp1z/lambda-__func__.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index 745c9acd6ee..066cb627a70 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -12898,7 +12898,7 @@ enclosing_instantiation_of (tree otctx)
   for (; flambda_count < lambda_count && fn && LAMBDA_FUNCTION_P (fn);
   fn = decl_function_context (fn))
++flambda_count;
-  if (DECL_TEMPLATE_INFO (fn)
+  if ((fn && DECL_TEMPLATE_INFO (fn))
  ? most_general_template (fn) != most_general_template (tctx)
  : fn != tctx)
continue;
diff --git gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C 
gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C
index e69de29bb2d..d5d1c1cb7b6 100644
--- gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C
+++ gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C
@@ -0,0 +1,13 @@
+// PR c++/84925
+// { dg-options "-std=c++17" }
+
+template 
+struct A {
+  static const int value = 0;
+  static auto constexpr fn = [] { return __func__; };
+};
+
+template 
+int x = A::value;
+
+auto s = x;

Marek

Re: [PATCH] Fix PR84859

2018-03-19 Thread Jakub Jelinek

On Fri, Mar 16, 2018 at 03:08:17PM +0100, Richard Biener wrote:
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?

Ok, thanks.

Jakub

Re: [Patch AArch64] Turn on -fasynchronous-unwind-tables and -funwind-tables by default.

2018-03-19 Thread James Greenhalgh

On Tue, Mar 13, 2018 at 01:35:56PM +, Ramana Radhakrishnan wrote:
> Jakub commented here that
> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01325.html we don't turn
> on fasynchronous-unwind-tables for AArch64. I note that x86_64 turns on
> funwind-tables as well. Thus this patch turns on both.
> 
> Note that this doesn't increase code size but will likely increase
> binary size because we now have a lot more eh_frame / debug_frame
> information being spat out. We probably need to do get the clang /llvm
> guys to do the same but I'll prod them separately.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu.
> 
> Ok ?

OK.

Thanks,
James

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-03-19 Thread James Greenhalgh

On Wed, Mar 14, 2018 at 05:40:49PM +, Sudi Das wrote:
> Hi
> 
> This patch is another partial fix for PR 84521. This is adding a 
> definition to one of the target hooks used in the SJLJ implemetation so 
> that AArch64 defines the hard_frame_pointer_rtx as the 
> TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is 
> still a lot more work to be done for these builtins in the future.
> 
> Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added 
> new test.
> 
> Is this ok for trunk?

OK.

Thanks,
James

> *** gcc/ChangeLog ***
> 
> 2018-03-14  Sudakshina Das  
> 
>   * builtins.c (expand_builtin_setjmp_receiver): Update condition
>   to restore frame pointer.
>   * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update
>   comment.
>   * config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value):
>   New.
>   (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-03-14  Sudakshina Das  
> 
>   * gcc.c-torture/execute/pr84521.c: New test.

> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 85affa7..640f1a9 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -898,7 +898,8 @@ expand_builtin_setjmp_receiver (rtx receiver_label)
>  
>/* Now put in the code to restore the frame pointer, and argument
>   pointer, if needed.  */
> -  if (! targetm.have_nonlocal_goto ())
> +  if (! targetm.have_nonlocal_goto ()
> +  && targetm.builtin_setjmp_frame_value () != hard_frame_pointer_rtx)
>  {
>/* First adjust our frame pointer to its actual value.  It was
>previously set to the start of the virtual area corresponding to
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index e3c52f6..7a21c14 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version;
>  #define EH_RETURN_STACKADJ_RTX   gen_rtx_REG (Pmode, R4_REGNUM)
>  #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
>  
> -/* Don't use __builtin_setjmp until we've defined it.  */
> +/* Don't use __builtin_setjmp until we've defined it.
> +   CAUTION: This macro is only used during exception unwinding.
> +   Don't fall for its name.  */
>  #undef DONT_USE_BUILTIN_SETJMP
>  #define DONT_USE_BUILTIN_SETJMP 1
>  
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index e1fb87f..e7ac0fe 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12128,6 +12128,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx 
> nextarg ATTRIBUTE_UNUSED)
>expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
>  }
>  
> +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
> +static rtx
> +aarch64_builtin_setjmp_frame_value (void)
> +{
> +  return hard_frame_pointer_rtx;
> +}
> +
>  /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
>  
>  static tree
> @@ -17505,6 +17512,9 @@ aarch64_run_selftests (void)
>  #undef TARGET_FOLD_BUILTIN
>  #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
>  
> +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
> +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
> +
>  #undef TARGET_FUNCTION_ARG
>  #define TARGET_FUNCTION_ARG aarch64_function_arg
>  
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
> new file mode 100644
> index 000..76b10d2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
> @@ -0,0 +1,49 @@
> +/* { dg-require-effective-target indirect_jumps } */
> +
> +#include 
> +
> +jmp_buf buf;
> +
> +int uses_longjmp (void)
> +{
> +  __builtin_longjmp (buf, 1);
> +}
> +
> +int gl;
> +void after_longjmp (void)
> +{
> +  gl = 5;
> +}
> +
> +int
> +test_1 (int n)
> +{
> +  volatile int *p = alloca (n);
> +  if (__builtin_setjmp (buf))
> +{
> +  after_longjmp ();
> +}
> +  else
> +{
> +  uses_longjmp ();
> +}
> +
> +  return 0;
> +}
> +
> +int __attribute__ ((optimize ("no-omit-frame-pointer")))
> +test_2 (int n)
> +{
> +  int i;
> +  int *ptr = (int *)__builtin_alloca (sizeof (int) * n);
> +  for (i = 0; i < n; i++)
> +ptr[i] = i;
> +  test_1 (n);
> +  return 0;
> +}
> +
> +int main (int argc, const char **argv)
> +{
> +  __builtin_memset (&buf, 0xaf, sizeof (buf));
> +  test_2 (100);
> +}

[PATCH] Fix PR84929

2018-03-19 Thread Richard Biener


This fixes PR84929.  analyze_siv_subscript_cst_affine chrec_convert()s
chrec_b but doesn't handle the case of that simply wrapping a NOP
around the CHREC.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-03-19  Richard Biener  

PR tree-optimization/84929
* tree-data-ref.c (analyze_siv_subscript_cst_affine): Guard
chrec_is_positive against non-chrec arg.

* gcc.dg/torture/pr84929.c: New testcase.

Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c (revision 258641)
+++ gcc/tree-data-ref.c (working copy)
@@ -3015,7 +3017,8 @@ analyze_siv_subscript_cst_affine (tree c
 {
   if (value0 == false)
{
- if (!chrec_is_positive (CHREC_RIGHT (chrec_b), &value1))
+ if (TREE_CODE (chrec_b) != POLYNOMIAL_CHREC
+ || !chrec_is_positive (CHREC_RIGHT (chrec_b), &value1))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "siv test failed: chrec not positive.\n");
@@ -3096,7 +3099,8 @@ analyze_siv_subscript_cst_affine (tree c
}
   else
{
- if (!chrec_is_positive (CHREC_RIGHT (chrec_b), &value2))
+ if (TREE_CODE (chrec_b) != POLYNOMIAL_CHREC
+ || !chrec_is_positive (CHREC_RIGHT (chrec_b), &value2))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "siv test failed: chrec not positive.\n");
Index: gcc/testsuite/gcc.dg/torture/pr84929.c
===
--- gcc/testsuite/gcc.dg/torture/pr84929.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr84929.c  (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+
+int a[4];
+void fn1()
+{
+  __UINT64_TYPE__ b = 7818038963515661296;
+  for (;; b++)
+a[0xA699ECD2C348A3A0] = a[b];
+}

Re: [PATCH ] PR 844422 Fix FCTID, FCTIW with -mcpu=power7

2018-03-19 Thread Bill Schmidt

On Mar 16, 2018, at 5:51 PM, Segher Boessenkool  
wrote:
> 
> Hi Carl,
> 
> On Wed, Mar 14, 2018 at 08:27:08AM -0700, Carl Love wrote:
>> The following patch fixes an ICE when compiling the test case
>> 
>>  gcc -mcpu=power7 builtin-fctid-fctiw-runnable.c
>> 
>> The GCC compiler now gives a message 
>> 
>>"error: builtin function ‘__builtin_fctiw’ requires the ‘-mpower8-vector’ 
>> option" 
>> 
>> and exits without generating an internal error.
> 
> This is an improvement over the ICE, for sure.
> 
> But fctiw is an ISA 1.xx instruction already (and fctid is as well,
> but explicitly undefined for 32-bit implementations until some 2.xx,
> I forgot which, 2.02 I think?)
> 
> Requiring power8 for it is weird and surprising.  power8-vector doubly so.
> 
> It does not seem to be a good idea to only enable the builtin for cases
> where the current implementation is not broken.
> 
> (The issue is that pre-power8 we do not allow SImode in FPR registers.
> Which makes the current fctiw implementation fail).
> 
> I think we have two good options:
> 
> 1) Remove these builtins;
> or
> 2) Make them work.
> 
I don't think we can remove them, as for a while they were the only way to
access these instructions.  We now have some vec_* intrinsics that are
the preferred interfaces, but for backward compatibility I think we have to
keep the old ones.

Thanks,
Bill

> 
> Segher
>

Re: [PATCH ] PR 844422 Fix FCTID, FCTIW with -mcpu=power7

2018-03-19 Thread Bill Schmidt


> On Mar 19, 2018, at 8:19 AM, Bill Schmidt  wrote:
> 
> On Mar 16, 2018, at 5:51 PM, Segher Boessenkool  
> wrote:
>> 
>> Hi Carl,
>> 
>> On Wed, Mar 14, 2018 at 08:27:08AM -0700, Carl Love wrote:
>>> The following patch fixes an ICE when compiling the test case
>>> 
>>> gcc -mcpu=power7 builtin-fctid-fctiw-runnable.c
>>> 
>>> The GCC compiler now gives a message 
>>> 
>>>   "error: builtin function ‘__builtin_fctiw’ requires the ‘-mpower8-vector’ 
>>> option" 
>>> 
>>> and exits without generating an internal error.
>> 
>> This is an improvement over the ICE, for sure.
>> 
>> But fctiw is an ISA 1.xx instruction already (and fctid is as well,
>> but explicitly undefined for 32-bit implementations until some 2.xx,
>> I forgot which, 2.02 I think?)
>> 
>> Requiring power8 for it is weird and surprising.  power8-vector doubly so.
>> 
>> It does not seem to be a good idea to only enable the builtin for cases
>> where the current implementation is not broken.
>> 
>> (The issue is that pre-power8 we do not allow SImode in FPR registers.
>> Which makes the current fctiw implementation fail).
>> 
>> I think we have two good options:
>> 
>> 1) Remove these builtins;
>> or
>> 2) Make them work.
>> 
> I don't think we can remove them, as for a while they were the only way to
> access these instructions.  We now have some vec_* intrinsics that are
> the preferred interfaces, but for backward compatibility I think we have to
> keep the old ones.

Sorry, never mind, I was thinking of vector forms.  Please ignore me.

Bill
> 
> Thanks,
> Bill
> 
>> 
>> Segher

Re: [PATCH ] PR 844422 Fix FCTID, FCTIW with -mcpu=power7

2018-03-19 Thread Segher Boessenkool

On Mon, Mar 19, 2018 at 08:19:18AM -0500, Bill Schmidt wrote:
> > Requiring power8 for it is weird and surprising.  power8-vector doubly so.
> > 
> > It does not seem to be a good idea to only enable the builtin for cases
> > where the current implementation is not broken.
> > 
> > (The issue is that pre-power8 we do not allow SImode in FPR registers.
> > Which makes the current fctiw implementation fail).
> > 
> > I think we have two good options:
> > 
> > 1) Remove these builtins;
> > or
> > 2) Make them work.
> > 
> I don't think we can remove them, as for a while they were the only way to
> access these instructions.  We now have some vec_* intrinsics that are
> the preferred interfaces, but for backward compatibility I think we have to
> keep the old ones.

It is new for GCC 8, added in r253238.  We can still remove it for GCC 8
if it needs more work; or we can fix it.  I'd rather not ship a broken
feature (that no one yet uses).


Segher

[PATCH] Fix PR84933

2018-03-19 Thread Richard Biener


The following fixes PR84933 without really tackling the underlying
issue of having out-of-bound ranges in the lattice (out-of-bound
with respect to TYPE_MIN/MAX_VALUE) which is not sth I'd like to
resolve at this point.  It simply avoids the bogus transform
in set_and_canonicalize_value_range that results in this case
(~[0, 5] -> [4, 1] with TYPE_MAX_VALUE == 1).  I realize there
are similar conditions throughout the code so a more conservative
"fix" would be to not use TYPE_MIN/MAX_VALUE at all but IIRC
that will regress some optimization cases - still that's sth
to consider for GCC9.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-03-19  Richard Biener  

PR tree-optimization/84933
* tree-vrp.c (set_and_canonicalize_value_range): Treat out-of-bound
values as -INF/INF when canonicalizing an ANTI_RANGE to a RANGE.

* g++.dg/pr84933.C: New testcase.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 258641)
+++ gcc/tree-vrp.c  (working copy)
@@ -386,8 +386,13 @@ set_and_canonicalize_value_range (value_
   /* Anti-ranges that can be represented as ranges should be so.  */
   if (t == VR_ANTI_RANGE)
 {
-  bool is_min = vrp_val_is_min (min);
-  bool is_max = vrp_val_is_max (max);
+  /* For -fstrict-enums we may receive out-of-range ranges so consider
+ values < -INF and values > INF as -INF/INF as well.  */
+  tree type = TREE_TYPE (min);
+  bool is_min = (INTEGRAL_TYPE_P (type)
+&& tree_int_cst_compare (min, TYPE_MIN_VALUE (type)) <= 0);
+  bool is_max = (INTEGRAL_TYPE_P (type)
+&& tree_int_cst_compare (max, TYPE_MAX_VALUE (type)) >= 0);
 
   if (is_min && is_max)
{
Index: gcc/testsuite/g++.dg/pr84933.C
===
--- gcc/testsuite/g++.dg/pr84933.C  (nonexistent)
+++ gcc/testsuite/g++.dg/pr84933.C  (working copy)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fstrict-enums -fno-inline" } */
+
+enum a {};
+int *d;
+int b, e, f;
+a c, g;
+class h {
+virtual unsigned i();
+};
+class j : h {
+unsigned i() {
+   for (;;) {
+   b = c <= 0;
+   if (b)
+ e = *d;
+   b = g && c;
+   if (b)
+ f = *d;
+   }
+}
+};
+void k() { new j; }

[C++/84812] ICE with local fn decl

2018-03-19 Thread Nathan Sidwell

This bug happens when the machinery to check for an implicit extern "C" 
linkage on a local declaration, meets an ambiguous local lookup.  I was 
misled by a comment that implied TREE_LIST -> overload.  That was true 
way back, but now means ambiguous lookup.


I declined the opportunity to micro-optimize the test to 'TREE_CODE (x) 
== ERROR_MARK || TTEE_CODE (X) == TREE_LIST'.


nathan
--
Nathan Sidwell
2018-03-19  Nathan Sidwell  

	PR c++/84812
	* name-lookup.c (set_local_extern_decl_linkage): Defend against
	ambiguous lookups.

	PR c++/84812
	* g++.dg/lookup/pr84812.C: New.

Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 258642)
+++ cp/name-lookup.c	(working copy)
@@ -2878,7 +2878,9 @@ set_local_extern_decl_linkage (tree decl
 	= find_namespace_value (current_namespace, DECL_NAME (decl));
 	  loc_value = ns_value;
 	}
-  if (loc_value == error_mark_node)
+  if (loc_value == error_mark_node
+	  /* An ambiguous lookup.  */
+	  || (loc_value && TREE_CODE (loc_value) == TREE_LIST))
 	loc_value = NULL_TREE;
 
   for (ovl_iterator iter (loc_value); iter; ++iter)
@@ -2926,7 +2928,8 @@ set_local_extern_decl_linkage (tree decl
   if (ns_value == decl)
 	ns_value = find_namespace_value (current_namespace, DECL_NAME (decl));
 
-  if (ns_value == error_mark_node)
+  if (ns_value == error_mark_node
+	  || (ns_value && TREE_CODE (ns_value) == TREE_LIST))
 	ns_value = NULL_TREE;
 
   for (ovl_iterator iter (ns_value); iter; ++iter)
Index: testsuite/g++.dg/lookup/pr84812.C
===
--- testsuite/g++.dg/lookup/pr84812.C	(revision 0)
+++ testsuite/g++.dg/lookup/pr84812.C	(working copy)
@@ -0,0 +1,18 @@
+// PR 84812.  ICE determining implicit "C" linkage
+
+struct A { void foo(); };
+struct B { void foo(); };
+
+struct C : A, B
+{
+  void X ();
+};
+
+void C::X ()
+{
+  void foo (); // local decl of ::foo
+
+  foo ();
+}
+
+// { dg-final { scan-assembler "_Z3foov" } }

Re: [og7] Update nvptx_fork/join barrier placement

2018-03-19 Thread Tom de Vries


On 03/09/2018 05:55 PM, Cesar Philippidis wrote:

On 03/09/2018 08:21 AM, Tom de Vries wrote:

On 03/09/2018 12:31 AM, Cesar Philippidis wrote:

Nvidia Volta GPUs now support warp-level synchronization.


Well, let's try to make that statement a bit more precise.

All Nvidia architectures have supported synchronization of threads in a
warp on a very basic level: by means of convergence (and unfortunately,
we've seen that this is very error-prone).

What is new in ptx 6.0 combined with sm_70 is the ability to sync
divergent threads without having to converge, f.i. by using new
instructions bar.warp.sync and barrier.sync.


Yes. The major difference sm_70 GPU architectures and earlier GPUs is
that sm_70 allows the user to explicitly synchronize divergent warps. At
least on Maxwell and Pascal, the PTX SASS compiler uses two instructions
to branch, SYNC and BRA. I think, SYNC guarantees that a warp is
convergent at the SYNC point, whereas BRA makes no such guarantees.



If you want to understand the interplay of sync (or .s suffix), branch 
and ssy, please read 
https://people.engr.ncsu.edu/hzhou/ispass_15-poster.pdf .



What's worse, once a warp has become divergent on sm_60 and earlier
GPUs, there's no way to reliably reconverge them. So, to avoid that
problem, it critical that the PTX SASS compiler use SYNC instructions
when possible. Fortunately, bar.warp.sync resolves the divergent warp
problem on sm_70+.


As such, the
semantics of legacy bar.sync instructions have slightly changed on newer
GPUs.


Before in ptx 3.1, we have for bar.sync:
...
Barriers are executed on a per-warp basis as if all the threads in a
warp are active. Thus, if any thread in a warp executes a bar
instruction, it is as if all the threads in the warp have executed
the bar instruction. All threads in the warp are stalled until the
barrier completes, and the arrival count for the barrier is incremented
by the warp size (not the number of active threads in the warp). In
conditionally executed code, a bar instruction should only be used if it
is known that all threads evaluate the condition identically (the warp
does not diverge).
...

But in ptx 6.0, we have:
...
bar.sync is equivalent to barrier.sync.aligned
...
and:
...
Instruction barrier has optional .aligned modifier. When specified, it
indicates that all threads in CTA will execute the same barrier
instruction. In conditionally executed code, an aligned barrier
instruction should only be used if it is known that all threads in
CTA evaluate the condition identically, otherwise behavior is undefined.
...

So, in ptx 3.1 bar.sync should be executed in convergent mode (all the
threads in each warp executing the same). But in ptx 6.0, bar.sync
should be executed in the mode that the whole CTA is executing the same
code.

So going from the description of ptx, it seems indeed that the semantics
of bar.sync has changed. That is however surprising, since it would
break the forward compatibility that AFAIU is the idea behind ptx.

So for now my hope is that this is a documentation error.


I spent a lot of time debugging deadlocks with the vector length changes
and I have see no changes in the SASS code generated in the newer Nvidia
drivers when compared to the older ones, at lease with respect to the
barrier instructions. This isn't the first time I've seen
inconsistencies with thread synchronization in Nvidia's documentation.
For the longest time, the "CUDA Programming Guide" provided slightly
conflicting semantics for the __syncthreads() function, which ultimately
gets implemented as bar.sync in PTX.


The PTX JIT will now, occasionally, emit a warpsync instruction
immediately before a bar.sync for Volta GPUs. That implies that warps
must be convergent on entry to those threads barriers.



That warps must be convergent on entry to bar.sync is already required
by ptx 3.1.

[ And bar.warp.sync does not force convergence, so if the warpsync
instruction you mention is equivalent to bar.warp.sync then your
reasoning is incorrect. ]


I'm under the impression that bar.warp.sync converges all of the
non-exited threads in a warp.


I have not played around with the instruction yet, so I'm not sure, but 
what I read from the docs is that bar.warp.sync converges all of the 
non-exited threads in a warp only and only if it's positioned at a point 
post-dominating a divergent branch.


Consider this case:
...
if (tid.x == 0)
  {
A;
bar.warp.sync 32;
B;
  }
else
  {
C;
bar.warp.sync 32;
D;
  }
...
AFAIU, this allows bar.warp.sync to synchronize the threads in the warp, 
_without_ converging.




You'd still need to use bar.sync or some
variant of the new barrier instruction to converge the entire CTA. But
at the moment, we're still generating code that's backwards compatible
with sm_30.


The problem in og7, and trunk, is that GCC emits barrier instructions at
the wrong spots. E.g., consider the following OpenACC parallel region:

    #pragma acc parallel loop worker

Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md

2018-03-19 Thread James Greenhalgh

On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote:
> Hi
> 
> This patch fixes the inconsistent behavior observed at -O3 for the 
> unordered comparisons. According to the online docs 
> (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html),
>  
> all of the following should not raise an FP exception:
> - UNGE_EXPR
> - UNGT_EXPR
> - UNLE_EXPR
> - UNLT_EXPR
> - UNEQ_EXPR
> Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.
> 
> The aarch64-simd.md handling of these were generating exception raising 
> instructions such as fcmgt. This patch changes the instructions that are 
> emitted to in order to not give out the exceptions. We first check each 
> operand for NaNs and force any elements containing NaN to zero before 
> using them in the compare.
> 
> Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : 
> a, isnan (b) ? 0.0 : b))
> 
> 
> The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
> UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).
> 
> Testing done: Checked for regressions on bootstrapped 
> aarch64-none-linux-gnu and added a new test case.
> 
> Is this ok for trunk? This will probably need a back-port to 
> gcc-7-branch as well.

OK.

Let it soak on trunk for a while before the backport.

Thanks,
James

> ChangeLog Entries:
> 
> *** gcc/ChangeLog ***
> 
> 2017-12-15  Sudakshina Das  
> 
>   PR target/81647
>   * config/aarch64/aarch64-simd.md (vec_cmp): Modify 
> instructions for
>   UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2017-12-15  Sudakshina Das  
> 
>   PR target/81647
>   * gcc.target/aarch64/pr81647.c: New.

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2731,10 +2731,10 @@
> break;
>   }
>/* Fall through.  */
> -case UNGE:
> +case UNLT:
>std::swap (operands[2], operands[3]);
>/* Fall through.  */
> -case UNLE:
> +case UNGT:
>  case GT:
>comparison = gen_aarch64_cmgt;
>break;
> @@ -2745,10 +2745,10 @@
> break;
>   }
>/* Fall through.  */
> -case UNGT:
> +case UNLE:
>std::swap (operands[2], operands[3]);
>/* Fall through.  */
> -case UNLT:
> +case UNGE:
>  case GE:
>comparison = gen_aarch64_cmge;
>break;
> @@ -2771,21 +2771,35 @@
>  case UNGT:
>  case UNLE:
>  case UNLT:
> -case NE:
> -  /* FCM returns false for lanes which are unordered, so if we use
> -  the inverse of the comparison we actually want to emit, then
> -  invert the result, we will end up with the correct result.
> -  Note that a NE NaN and NaN NE b are true for all a, b.
> -
> -  Our transformations are:
> -  a UNGE b -> !(b GT a)
> -  a UNGT b -> !(b GE a)
> -  a UNLE b -> !(a GT b)
> -  a UNLT b -> !(a GE b)
> -  a   NE b -> !(a EQ b)  */
> -  gcc_assert (comparison != NULL);
> -  emit_insn (comparison (operands[0], operands[2], operands[3]));
> -  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
> +  {
> + /* All of the above must not raise any FP exceptions.  Thus we first
> +check each operand for NaNs and force any elements containing NaN to
> +zero before using them in the compare.
> +Example: UN (a, b) -> UNORDERED (a, b) |
> +  (cm (isnan (a) ? 0.0 : a,
> +   isnan (b) ? 0.0 : b))
> +We use the following transformations for doing the comparisions:
> +a UNGE b -> a GE b
> +a UNGT b -> a GT b
> +a UNLE b -> b GE a
> +a UNLT b -> b GT a.  */
> +
> + rtx tmp0 = gen_reg_rtx (mode);
> + rtx tmp1 = gen_reg_rtx (mode);
> + rtx tmp2 = gen_reg_rtx (mode);
> + emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2]));
> + emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3]));
> + emit_insn (gen_and3 (tmp2, tmp0, tmp1));
> + emit_insn (gen_and3 (tmp0, tmp0,
> + lowpart_subreg (mode, operands[2], 
> mode)));
> + emit_insn (gen_and3 (tmp1, tmp1,
> + lowpart_subreg (mode, operands[3], 
> mode)));
> + gcc_assert (comparison != NULL);
> + emit_insn (comparison (operands[0],
> +lowpart_subreg (mode, tmp0, 
> mode),
> +lowpart_subreg (mode, tmp1, 
> mode)));
> + emit_insn (gen_orn3 (operands[0], tmp2, operands[0]));
> +  }
>break;
>  
>  case LT:
> @@ -2793,25 +2807,19 @@
>  case GT:
>  case GE:
>  case EQ:
> +case NE:
>/* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ

New Swedish PO file for 'gcc' (version 8.1-b20180128)

2018-03-19 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-8.1-b20180128.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: Use SCEV information when aligning for vectorisation (PR 84005)

2018-03-19 Thread Richard Biener

On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford
 wrote:
> This PR is another regression caused by the removal of the simple_iv
> check in dr_analyze_innermost for BB analysis.  Without splitting out
> the step, we weren't able to find an underlying object whose alignment
> could be increased.
>
> As with PR81635, I think the simple_iv was only handling one special
> case of something that ought to be more general.  The more general
> thing here is that if the address can be analysed as a scalar
> evolution, and if all updates preserve alignment N, it's possible
> to align the address to N by increasing the alignment of the base
> object to N.  That applies also to outer loops, and to both loop
> and BB analysis.
>
> I wasn't sure where the new functions ought to live, but tree-data-ref.c
> seemed OK since (a) that already does scev analysis on addresses and
> (b) you'd want to use dr_analyze_innermost first if you were analysing
> a reference.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2018-03-17  Richard Sandiford  
>
> gcc/
> PR tree-optimization/84005
> * tree-data-ref.h (get_base_for_alignment): Declare.
> * tree-data-ref.c (get_base_for_alignment_1): New function.
> (get_base_for_alignment): Likewise.
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
> get_base_for_alignment to find a suitable base object, instead
> of always using drb->base_address.
>
> gcc/testsuite/
> PR tree-optimization/84005
> * gcc.dg/vect/bb-slp-1.c: Make sure there is no message about
> failing to force the alignment.
>
> Index: gcc/tree-data-ref.h
> ===
> --- gcc/tree-data-ref.h 2018-01-13 18:02:00.948360274 +
> +++ gcc/tree-data-ref.h 2018-03-17 10:43:42.544669549 +
> @@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec
>  extern tree find_data_references_in_bb (struct loop *, basic_block,
>  vec *);
>  extern unsigned int dr_alignment (innermost_loop_behavior *);
> +extern tree get_base_for_alignment (tree, unsigned int *);
>
>  /* Return the alignment in bytes that DR is guaranteed to have at all
> times.  */
> Index: gcc/tree-data-ref.c
> ===
> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +
> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +
> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d
>return alignment;
>  }
>
> +/* If BASE is a pointer-typed SSA name, try to find the object that it
> +   is based on.  Return this object X on success and store the alignment
> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */
> +
> +static tree
> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)
> +{
> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))
> +return NULL_TREE;
> +
> +  gimple *def = SSA_NAME_DEF_STMT (base);
> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);

I think it is an error to use the def stmt of base here.  Instead you
need to pass down
the point/loop of the use.  For the same reason you also want to instantiate
parameters after analyzing the evolution.

In the end, if the BB to be vectorized is contained in a loop nest we want to
instantiate up to the point where (eventually) a DECL we can re-align appears,
but using the containing loop for now looks OK.

> +  /* Peel chrecs and record the minimum alignment preserved by
> + all steps.  */
> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)
> +{
> +  unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base));
> +  alignment = MIN (alignment, step_alignment);
> +  base = CHREC_LEFT (base);
> +}
> +
> +  /* Punt if the expression is too complicated to handle.  */
> +  if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE 
> (base)))
> +return NULL_TREE;
> +
> +  /* Analyze the base to which the steps we peeled were applied.  */
> +  poly_int64 bitsize, bitpos, bytepos;
> +  machine_mode mode;
> +  int unsignedp, reversep, volatilep;
> +  tree offset;
> +  base = get_inner_reference (build_fold_indirect_ref (base),

ick :/

what happens if you simply use get_pointer_alignment here and
strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically
what get_pointer_alignment_1 does)  After all get_base_for_alignment_1
itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4].

Thanks,
Richard.

> + &bitsize, &bitpos, &offset, &mode,
> + &unsignedp, &reversep, &volatilep);
> +  if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos))
> +return NULL_TREE;
> +
> +  /* Restrict the alignment to that guaranteed by the offsets.  */
> +  unsig

Re: [og7] Update nvptx_fork/join barrier placement

2018-03-19 Thread Cesar Philippidis

On 03/19/2018 07:04 AM, Tom de Vries wrote:
> On 03/09/2018 05:55 PM, Cesar Philippidis wrote:
>> On 03/09/2018 08:21 AM, Tom de Vries wrote:
>>> On 03/09/2018 12:31 AM, Cesar Philippidis wrote:
 Nvidia Volta GPUs now support warp-level synchronization.
>>>
>>> Well, let's try to make that statement a bit more precise.
>>>
>>> All Nvidia architectures have supported synchronization of threads in a
>>> warp on a very basic level: by means of convergence (and unfortunately,
>>> we've seen that this is very error-prone).
>>>
>>> What is new in ptx 6.0 combined with sm_70 is the ability to sync
>>> divergent threads without having to converge, f.i. by using new
>>> instructions bar.warp.sync and barrier.sync.
>>
>> Yes. The major difference sm_70 GPU architectures and earlier GPUs is
>> that sm_70 allows the user to explicitly synchronize divergent warps. At
>> least on Maxwell and Pascal, the PTX SASS compiler uses two instructions
>> to branch, SYNC and BRA. I think, SYNC guarantees that a warp is
>> convergent at the SYNC point, whereas BRA makes no such guarantees.
>>
> 
> If you want to understand the interplay of sync (or .s suffix), branch
> and ssy, please read
> https://people.engr.ncsu.edu/hzhou/ispass_15-poster.pdf .

Interesting, thanks!

>> What's worse, once a warp has become divergent on sm_60 and earlier
>> GPUs, there's no way to reliably reconverge them. So, to avoid that
>> problem, it critical that the PTX SASS compiler use SYNC instructions
>> when possible. Fortunately, bar.warp.sync resolves the divergent warp
>> problem on sm_70+.
>>
 As such, the
 semantics of legacy bar.sync instructions have slightly changed on
 newer
 GPUs.
>>>
>>> Before in ptx 3.1, we have for bar.sync:
>>> ...
>>> Barriers are executed on a per-warp basis as if all the threads in a
>>> warp are active. Thus, if any thread in a warp executes a bar
>>> instruction, it is as if all the threads in the warp have executed
>>> the bar instruction. All threads in the warp are stalled until the
>>> barrier completes, and the arrival count for the barrier is incremented
>>> by the warp size (not the number of active threads in the warp). In
>>> conditionally executed code, a bar instruction should only be used if it
>>> is known that all threads evaluate the condition identically (the warp
>>> does not diverge).
>>> ...
>>>
>>> But in ptx 6.0, we have:
>>> ...
>>> bar.sync is equivalent to barrier.sync.aligned
>>> ...
>>> and:
>>> ...
>>> Instruction barrier has optional .aligned modifier. When specified, it
>>> indicates that all threads in CTA will execute the same barrier
>>> instruction. In conditionally executed code, an aligned barrier
>>> instruction should only be used if it is known that all threads in
>>> CTA evaluate the condition identically, otherwise behavior is undefined.
>>> ...
>>>
>>> So, in ptx 3.1 bar.sync should be executed in convergent mode (all the
>>> threads in each warp executing the same). But in ptx 6.0, bar.sync
>>> should be executed in the mode that the whole CTA is executing the same
>>> code.
>>>
>>> So going from the description of ptx, it seems indeed that the semantics
>>> of bar.sync has changed. That is however surprising, since it would
>>> break the forward compatibility that AFAIU is the idea behind ptx.
>>>
>>> So for now my hope is that this is a documentation error.
>>
>> I spent a lot of time debugging deadlocks with the vector length changes
>> and I have see no changes in the SASS code generated in the newer Nvidia
>> drivers when compared to the older ones, at lease with respect to the
>> barrier instructions. This isn't the first time I've seen
>> inconsistencies with thread synchronization in Nvidia's documentation.
>> For the longest time, the "CUDA Programming Guide" provided slightly
>> conflicting semantics for the __syncthreads() function, which ultimately
>> gets implemented as bar.sync in PTX.
>>
 The PTX JIT will now, occasionally, emit a warpsync instruction
 immediately before a bar.sync for Volta GPUs. That implies that warps
 must be convergent on entry to those threads barriers.

>>>
>>> That warps must be convergent on entry to bar.sync is already required
>>> by ptx 3.1.
>>>
>>> [ And bar.warp.sync does not force convergence, so if the warpsync
>>> instruction you mention is equivalent to bar.warp.sync then your
>>> reasoning is incorrect. ]
>>
>> I'm under the impression that bar.warp.sync converges all of the
>> non-exited threads in a warp.
> 
> I have not played around with the instruction yet, so I'm not sure, but
> what I read from the docs is that bar.warp.sync converges all of the
> non-exited threads in a warp only and only if it's positioned at a point
> post-dominating a divergent branch.
> 
> Consider this case:
> ...
> if (tid.x == 0)
>   {
>     A;
>     bar.warp.sync 32;
>     B;
>   }
> else
>   {
>     C;
>     bar.warp.sync 32;
>     D;
>   }
> ...
> AFAIU, this allows bar.warp.sync to synchronize the thr

Re: [og7] Update nvptx_fork/join barrier placement

2018-03-19 Thread Tom de Vries


On 03/19/2018 03:55 PM, Cesar Philippidis wrote:

Is your patch purely for debugging, or are you planning on committing it
to og7 and trunk?


I plan to commit it.

We have no test-cases testing the neutering code order explicitly. So 
this check is the only thing that allows us to detect regressions, other 
than execution failures on newer archs.


Thanks,
- Tom

[PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active

2018-03-19 Thread Maxim Ostapenko


Hi,


as noted in bugzilla, ASan inserts redzones for  `.LDFCM*' variables and 
breaks internal ABI between GCC and libstdc++ because libstdc++ tries to 
obtain a pointer to `typeinfo for (anonymous namespace)::SomeRandomType' 
from a constant offset from `.LDFCM*' labels and hits these redzones. 
This can be trivially fixed by not sanitizing `.LDFCM*' variables (and 
other debug variables) at all.


Attached patch is tested/bootstrapped on x86_64-unknown-linux-gnu and 
ppc64le-redhat-linux targets.


Is it OK for trunk now, or should I wait for stage 1?


-Maxim

gcc/

	PR sanitizer/78651
	* dwarf2asm.c (dw2_output_indirect_constant_1): Call
	assemble_variable with new parameter.
	* output.h (assemble_variable): Add new parameter.
	* varasm.c (assemble_variable): Add new parameter. Do not
	protect variable with ASan if is_debug_var is true.

gcc/testsuite/ChangeLog:

	PR sanitizer/78651
	* g++.dg/asan/pr78651.C: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b5b5559..a26f05e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2018-03-19  Maxim Ostapenko  
+
+	PR sanitizer/78651
+	* dwarf2asm.c (dw2_output_indirect_constant_1): Call assemble_variable
+	with new parameter.
+	* output.h (assemble_variable): Add new parameter.
+	* varasm.c (assemble_variable): Add new parameter. Do not protect
+	variable with ASan if is_debug_var is true.
+
 2018-03-19  Richard Biener  
 
 	PR tree-optimization/84929
diff --git a/gcc/dwarf2asm.c b/gcc/dwarf2asm.c
index e9b18b8..5c90994 100644
--- a/gcc/dwarf2asm.c
+++ b/gcc/dwarf2asm.c
@@ -967,7 +967,7 @@ dw2_output_indirect_constant_1 (const char *sym, tree id)
 }
 
   sym_ref = gen_rtx_SYMBOL_REF (Pmode, sym);
-  assemble_variable (decl, 1, 1, 1);
+  assemble_variable (decl, 1, 1, 1, 1);
   assemble_integer (sym_ref, POINTER_SIZE_UNITS, POINTER_SIZE, 1);
 
   return 0;
diff --git a/gcc/output.h b/gcc/output.h
index f708cc7..678c370 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -200,7 +200,7 @@ extern void assemble_end_function (tree, const char *);
to define things that have had only tentative definitions.
DONT_OUTPUT_DATA if nonzero means don't actually output the
initial value (that will be done by the caller).  */
-extern void assemble_variable (tree, int, int, int);
+extern void assemble_variable (tree, int, int, int, int is_debug_var = 0);
 
 /* Put the vtable verification constructor initialization function
into the preinit array.  */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 868d8e8..6ff9217 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2018-03-19  Maxim Ostapenko  
+
+	PR sanitizer/78651
+	* g++.dg/asan/pr78651.C: New test.
+
 2018-03-19  Richard Biener  
 
 	PR tree-optimization/84929
diff --git a/gcc/testsuite/g++.dg/asan/pr78651.C b/gcc/testsuite/g++.dg/asan/pr78651.C
new file mode 100644
index 000..3f14be7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/pr78651.C
@@ -0,0 +1,24 @@
+// { dg-do run { target fpic } }
+
+struct A { };
+
+namespace {
+
+void thisThrows () {
+  throw A();
+}
+
+struct SomeRandomType {};
+}
+
+int main() {
+  try {
+thisThrows();
+  }
+  catch (SomeRandomType) {
+throw;
+  }
+  catch (A) {
+  }
+  return 0;
+}
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 2b5c70c..b18d011 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -2158,11 +2158,14 @@ assemble_undefined_decl (tree decl)
AT_END is nonzero if this is the special handling, at end of compilation,
to define things that have had only tentative definitions.
DONT_OUTPUT_DATA if nonzero means don't actually output the
-   initial value (that will be done by the caller).  */
+   initial value (that will be done by the caller).
+   IS_DEBUG_VAR if nonzero that we are assembling debug variable.
+   This information is useful for ASan, see pr78651.  */
 
 void
 assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
-		   int at_end ATTRIBUTE_UNUSED, int dont_output_data)
+		   int at_end ATTRIBUTE_UNUSED, int dont_output_data,
+		   int is_debug_var)
 {
   const char *name;
   rtx decl_rtl, symbol;
@@ -2258,6 +2261,7 @@ assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
   align_variable (decl, dont_output_data);
 
   if ((flag_sanitize & SANITIZE_ADDRESS)
+  && !is_debug_var
   && asan_protect_global (decl))
 {
   asan_protected = true;

Re: [PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active

2018-03-19 Thread Maxim Ostapenko


On 03/19/2018 06:55 PM, Jakub Jelinek wrote:

On Mon, Mar 19, 2018 at 06:44:39PM +0300, Maxim Ostapenko wrote:

as noted in bugzilla, ASan inserts redzones for  `.LDFCM*' variables and
breaks internal ABI between GCC and libstdc++ because libstdc++ tries to
obtain a pointer to `typeinfo for (anonymous namespace)::SomeRandomType'
from a constant offset from `.LDFCM*' labels and hits these redzones. This
can be trivially fixed by not sanitizing `.LDFCM*' variables (and other
debug variables) at all.

I don't like very much adding an extra argument for such so frequently used
function to handle a corner case.
Wouldn't just:
   /* Don't instrument this decl with -fsanitize=*address.  */
   unsigned int save_flag_sanitize = flag_sanitize;
   flag_sanitize &= ~(SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS
 | SANITIZE_KERNEL_ADDRESS);
   assemble_variable (decl, 1, 1, 1);
   flag_sanitize = save_flag_sanitize;
DTRT?




Yes, it works, attaching the patch.

-Maxim
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b5b5559..356e68c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2018-03-19  Maxim Ostapenko  
+
+	PR sanitizer/78651
+	* dwarf2asm.c (dw2_output_indirect_constant_1): Disable ASan before
+	calling assemble_variable.
+
 2018-03-19  Richard Biener  
 
 	PR tree-optimization/84929
diff --git a/gcc/dwarf2asm.c b/gcc/dwarf2asm.c
index e9b18b8..065406b 100644
--- a/gcc/dwarf2asm.c
+++ b/gcc/dwarf2asm.c
@@ -967,7 +967,11 @@ dw2_output_indirect_constant_1 (const char *sym, tree id)
 }
 
   sym_ref = gen_rtx_SYMBOL_REF (Pmode, sym);
+  unsigned int save_flag_sanitize = flag_sanitize;
+  flag_sanitize &= ~(SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS
+		 | SANITIZE_KERNEL_ADDRESS);
   assemble_variable (decl, 1, 1, 1);
+  flag_sanitize = save_flag_sanitize;
   assemble_integer (sym_ref, POINTER_SIZE_UNITS, POINTER_SIZE, 1);
 
   return 0;
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 868d8e8..6ff9217 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2018-03-19  Maxim Ostapenko  
+
+	PR sanitizer/78651
+	* g++.dg/asan/pr78651.C: New test.
+
 2018-03-19  Richard Biener  
 
 	PR tree-optimization/84929
diff --git a/gcc/testsuite/g++.dg/asan/pr78651.C b/gcc/testsuite/g++.dg/asan/pr78651.C
new file mode 100644
index 000..3f14be7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/pr78651.C
@@ -0,0 +1,24 @@
+// { dg-do run { target fpic } }
+
+struct A { };
+
+namespace {
+
+void thisThrows () {
+  throw A();
+}
+
+struct SomeRandomType {};
+}
+
+int main() {
+  try {
+thisThrows();
+  }
+  catch (SomeRandomType) {
+throw;
+  }
+  catch (A) {
+  }
+  return 0;
+}

Re: [PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active

2018-03-19 Thread Jakub Jelinek

On Mon, Mar 19, 2018 at 06:44:39PM +0300, Maxim Ostapenko wrote:
> as noted in bugzilla, ASan inserts redzones for  `.LDFCM*' variables and
> breaks internal ABI between GCC and libstdc++ because libstdc++ tries to
> obtain a pointer to `typeinfo for (anonymous namespace)::SomeRandomType'
> from a constant offset from `.LDFCM*' labels and hits these redzones. This
> can be trivially fixed by not sanitizing `.LDFCM*' variables (and other
> debug variables) at all.

I don't like very much adding an extra argument for such so frequently used
function to handle a corner case.
Wouldn't just:
  /* Don't instrument this decl with -fsanitize=*address.  */
  unsigned int save_flag_sanitize = flag_sanitize;
  flag_sanitize &= ~(SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS
 | SANITIZE_KERNEL_ADDRESS);
  assemble_variable (decl, 1, 1, 1);
  flag_sanitize = save_flag_sanitize;
DTRT?

Jakub

Re: [PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active

2018-03-19 Thread Maxim Ostapenko


On 03/19/2018 07:35 PM, Jakub Jelinek wrote:

On Mon, Mar 19, 2018 at 07:28:11PM +0300, Maxim Ostapenko wrote:

Yes, it works, attaching the patch.

Can you please add a short comment why we do this?  Also, the testcase
needs work, see below.  Ok for trunk with those changes and after a
while for affected release branches as well.


Ok, thanks, attaching the patch I'm going to apply.

-Maxim
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b5b5559..356e68c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2018-03-19  Maxim Ostapenko  
+
+	PR sanitizer/78651
+	* dwarf2asm.c (dw2_output_indirect_constant_1): Disable ASan before
+	calling assemble_variable.
+
 2018-03-19  Richard Biener  
 
 	PR tree-optimization/84929
diff --git a/gcc/dwarf2asm.c b/gcc/dwarf2asm.c
index e9b18b8..2e108ac 100644
--- a/gcc/dwarf2asm.c
+++ b/gcc/dwarf2asm.c
@@ -967,7 +967,13 @@ dw2_output_indirect_constant_1 (const char *sym, tree id)
 }
 
   sym_ref = gen_rtx_SYMBOL_REF (Pmode, sym);
+  /* Disable ASan for decl because redzones cause ABI breakage between GCC and
+ libstdc++ for `.LDFCM*' variables.  See PR 78651 for details.  */
+  unsigned int save_flag_sanitize = flag_sanitize;
+  flag_sanitize &= ~(SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS
+		 | SANITIZE_KERNEL_ADDRESS);
   assemble_variable (decl, 1, 1, 1);
+  flag_sanitize = save_flag_sanitize;
   assemble_integer (sym_ref, POINTER_SIZE_UNITS, POINTER_SIZE, 1);
 
   return 0;
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 868d8e8..6ff9217 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2018-03-19  Maxim Ostapenko  
+
+	PR sanitizer/78651
+	* g++.dg/asan/pr78651.C: New test.
+
 2018-03-19  Richard Biener  
 
 	PR tree-optimization/84929
diff --git a/gcc/testsuite/g++.dg/asan/pr78651.C b/gcc/testsuite/g++.dg/asan/pr78651.C
new file mode 100644
index 000..09f1be5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/pr78651.C
@@ -0,0 +1,26 @@
+// PR sanitizer/78651
+// { dg-do run }
+// { dg-additional-options "-fpic" { target fpic } }
+
+struct A { };
+
+namespace {
+
+void thisThrows () {
+  throw A();
+}
+
+struct SomeRandomType {};
+}
+
+int main() {
+  try {
+thisThrows();
+  }
+  catch (SomeRandomType) {
+throw;
+  }
+  catch (A) {
+  }
+  return 0;
+}

Re: [og7] Update nvptx_fork/join barrier placement

2018-03-19 Thread Tom de Vries


On 03/19/2018 03:55 PM, Cesar Philippidis wrote:

Note that this changes ordering of the vector-neutering jump and
worker-neutering jump at the end. In principle, this should not be
harmful, but it violates the invariant that vector-neutering
branch-around code should be as short-lived as possible. So, this needs
to be fixed.

I've found this issue by adding verification of the neutering, as
attached below.

ACK, thanks. I'll take a closer look at this.


I've got a tentative patch at 
https://gcc.gnu.org/bugzilla/attachment.cgi?id=43707 ( PR84952 - 
"[nvptx] bar.sync generated in divergent code" ).


Thanks,
- Tom

Re: [GCC 6] PATCH: Backport -mindirect-branch= patches

2018-03-19 Thread H.J. Lu

On Mon, Mar 19, 2018 at 10:00 AM, H.J. Lu  wrote:
> On Thu, Mar 15, 2018 at 1:00 PM, H.J. Lu  wrote:
>> On Thu, Mar 15, 2018 at 12:54 PM, Jan Hubicka  wrote:
 On Thu, Mar 15, 2018 at 10:47 AM, Jan Hubicka  wrote:
 >> On Thu, Mar 15, 2018 at 9:03 AM, Jan Hubicka  wrote:
 >> >> > What is the reason for using different names for return and 
 >> >> > indirect thunks at first place?
 >> >> >
 >> >>
 >> >> These 2 thunks are identical.  But one may want to provide an
 >> >> alternate thunk only for
 >> >> indirect branch and leaves return thunk alone.  You can't do that if
 >> >> both have the same
 >> >> name.
 >> >
 >> > Hmm, OK, what is the benefit to have two different thunks? It is just
 >> > safety precaution so we could adjust one without adjusting the other 
 >> > in
 >> > future?
 >> >
 >>
 >> That is correct.
 >
 > Hmm, I guess the patch is OK. Things are slightly more flexible this way 
 > and
 > duplicating thunk is not terribly expensive. One can always link with
 > non-comdat+ alias.
 >

 That is true.  OK to backport to GCC 7 after a few days?
>>> OK.  I suppose you are testing return thunks on some real environment, like 
>>> GCC bootstrap :)
>>
>> Yes.
>>
>
> I checked my backport into GCC 7 branch.
>
> Here are GCC 6 patches to backport all -mindirect-branch= patches.
> OK for GCC 6.
>

Resend the patches compressed with xz.  Sorry if you get the patches twice.


-- 
H.J.


0013-i386-Don-t-generate-alias-for-function-return-thunk.patch.xz
Description: application/xz


0001-i386-Move-struct-ix86_frame-to-machine_function.patch.xz
Description: application/xz


0012-i386-Add-TARGET_INDIRECT_BRANCH_REGISTER.patch.xz
Description: application/xz


0011-i386-Update-mfunction-return-for-return-with-pop.patch.xz
Description: application/xz


0010-i386-Pass-INVALID_REGNUM-as-invalid-register-number.patch.xz
Description: application/xz


0009-Use-INVALID_REGNUM-in-indirect-thunk-processing.patch.xz
Description: application/xz


0008-x86-Disallow-mindirect-branch-mfunction-return-with-.patch.xz
Description: application/xz


0007-x86-Add-V-register-operand-modifier.patch.xz
Description: application/xz


0006-x86-Add-mindirect-branch-register.patch.xz
Description: application/xz


0005-x86-Add-mfunction-return.patch.xz
Description: application/xz


0004-x86-Add-mindirect-branch.patch.xz
Description: application/xz


0003-i386-Use-const-reference-of-struct-ix86_frame-to-avo.patch.xz
Description: application/xz


0002-i386-Use-reference-of-struct-ix86_frame-to-avoid-cop.patch.xz
Description: application/xz

[PR middle-end/70359] uncoalesce IVs outside of loops

2018-03-19 Thread Aldy Hernandez


Hi Richard.

As discussed in the PR, the problem here is that we have two different 
iterations of an IV live outside of a loop.  This inhibits us from using 
autoinc/dec addressing on ARM, and causes extra lea's on x86.


An abbreviated example is this:

loop:
  # p_9 = PHI 
  p_20 = p_9 + 18446744073709551615;
goto loop
  p_24 = p_9 + 18446744073709551614;
  MEM[(char *)p_20 + -1B] = 45;

Here we have both the previous IV (p_9) and the current IV (p_20) used 
outside of the loop.  On Arm this keeps us from using auto-dec 
addressing, because one use is -2 and the other one is -1.


With the attached patch we attempt to rewrite out-of-loop uses of the IV 
in terms of the current/last IV (p_20 in the case above).  With it, we 
end up with:


  p_24 = p_20 + 18446744073709551615;
  *p_24 = 45;

...which helps both x86 and Arm.

As you have suggested in comment 38 on the PR, I handle specially 
out-of-loop IV uses of the form IV+CST and propagate those accordingly 
(along with the MEM_REF above).  Otherwise, in less specific cases, we 
un-do the IV increment, and use that value in all out-of-loop uses.  For 
instance, in the attached testcase, we rewrite:


  george (p_9);

into

  _26 = p_20 + 1;
  ...
  george (_26);

The attached testcase tests the IV+CST specific case, as well as the 
more generic case with george().


Although the original PR was for ARM, this behavior can be noticed on 
x86, so I tested on x86 with a full bootstrap + tests.  I also ran the 
specific test on an x86 cross ARM build and made sure we had 2 auto-dec 
with the test.  For the original test (slightly different than the 
testcase in this patch), with this patch we are at 104 bytes versus 116 
without it.  There is still the issue of a division optimization which 
would further reduce the code size.  I will discuss this separately as 
it is independent from this patch.


Oh yeah, we could make this more generic, and maybe handle any multiple 
of the constant, or perhaps *= and /=.  Perhaps something for next stage1...


OK for trunk?
Aldy
gcc/

	PR middle-end/70359
	* tree-outof-ssa.c (uncoalesce_ivs_outside_loop): New.
	(is_iv_plus_constant): New.
	(maybe_optimize_iv_plus_constant): New.
	(undo_iv_increment): New.

diff --git a/gcc/testsuite/gcc.dg/pr70359.c b/gcc/testsuite/gcc.dg/pr70359.c
new file mode 100644
index 000..b903548e45d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr70359.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-rtl-expand-details" } */
+
+/* Test uncoalesce_ivs_outside_loop().  */
+
+void george (const char *);
+
+char* inttostr(int i, char* buf, int len)
+{
+  unsigned int ui = (i > 0) ? i : -i;
+  char *p = buf + len - 1;
+  *p = '\0';
+  do {
+*--p = '0' + (ui % 10);
+  } while ((ui /= 10) != 0);
+  if (i < 0) {
+/* p+1 is the IV in the loop.  This line should trigger un-doing
+   the increment and cause this: */
+/* { dg-final { scan-rtl-dump-times "Adjusted one out of loop IV" 1 "expand" } } */
+george (p+1);
+
+*--p = '-';
+  }
+  return p;
+}
+
+/* At the last gimple dump we have:
+
+ p_22 = p_8 + 4294967294;
+ MEM[(char *)p_19 + 4294967295B] = 45;
+
+   The above should be converted by RTL expansion time into:
+
+ p_22 = p_19 + 4294967295;
+ *p_22 = 45;
+*/
+
+/* { dg-final { scan-rtl-dump "p_\[0-9\]+ = p_\[0-9\]+ \\+ \[0-9\]+;\[\n\r \]+\\*p_\[0-9\]+ = 45;" "expand" } } */
+
+/* On ARM we should get two post-increment stores.  */
+/* { dg-final { scan-assembler-times "str\[^\n\r]+!" 2 { target arm-*-* } } }  */
+
+/* On x86 we should not get more than two lea's.  */
+/* { dg-final { scan-assembler-times "lea.\t" 2 { target i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/tree-outof-ssa.c b/gcc/tree-outof-ssa.c
index 59bdcd6fadd..42cc53d7052 100644
--- a/gcc/tree-outof-ssa.c
+++ b/gcc/tree-outof-ssa.c
@@ -43,6 +43,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-coalesce.h"
 #include "tree-outof-ssa.h"
 #include "dojump.h"
+#include "fold-const.h"
+#include "tree-ssa-loop-niter.h"
+#include "cfgloop.h"
 
 /* FIXME: A lot of code here deals with expanding to RTL.  All that code
should be in cfgexpand.c.  */
@@ -1035,6 +1038,221 @@ trivially_conflicts_p (basic_block bb, tree result, tree arg)
   return false;
 }
 
+/* Return TRUE if STMT is in the form: x_99 = SSA +p INT.  */
+static bool
+is_iv_plus_constant (gimple *stmt, tree ssa)
+{
+  return is_gimple_assign (stmt)
+&& gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
+&& gimple_assign_rhs1 (stmt) == ssa
+&& TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST;
+}
+
+/* Helper function for uncoalesce_ivs_outside_loop.  Optimize out of
+   loop uses of the IV when they are of the form IV+CST.
+
+   Return TRUE if we were able to rewrite:
+  iv_incr = iv + CST
+  goto LOOP;
+  iv_use = iv + 2*CST
+into:
+  iv_use = iv_incr + CST
+
+   IV is the induction variable for the current iteration.
+   IV_INCR is the IV value fo

Re: [og7] Update nvptx_fork/join barrier placement

2018-03-19 Thread Cesar Philippidis

On 03/19/2018 10:02 AM, Tom de Vries wrote:
> On 03/19/2018 03:55 PM, Cesar Philippidis wrote:
>>> Note that this changes ordering of the vector-neutering jump and
>>> worker-neutering jump at the end. In principle, this should not be
>>> harmful, but it violates the invariant that vector-neutering
>>> branch-around code should be as short-lived as possible. So, this needs
>>> to be fixed.
>>>
>>> I've found this issue by adding verification of the neutering, as
>>> attached below.
>> ACK, thanks. I'll take a closer look at this.
> 
> I've got a tentative patch at
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43707 ( PR84952 -
> "[nvptx] bar.sync generated in divergent code" ).

I attached my WIP patch. But, given that you've spent a lot of time on
this, I'll let you continue working on it. Just remember to backport any
fix to og7.

Thanks,
Cesar
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a6f444340fd..0d288cb81ba 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4037,6 +4037,22 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	return;
 }
 
+  /* NVPTX_BARSYNC barriers are placed immediately before NVPTX_JOIN
+ in order to ensure that all of the threads in a CTA reach the
+ barrier.  Don't nueter BLOCK if head is NVPTX_BARSYNC and tail is
+ NVPTX_JOIN.  */
+  if (from == to
+  && recog_memoized (head) == CODE_FOR_nvptx_barsync
+  && recog_memoized (tail) == CODE_FOR_nvptx_join)
+return;
+
+  /* Adjust HEAD to point to the NVPTX_JOIN instruction after a
+ NVPTX_BARSYNC, so that any successive state neutering code does
+ not get placed before the dummy JOIN comment. */
+  if (recog_memoized (head) == CODE_FOR_nvptx_barsync
+  && recog_memoized (NEXT_INSN (head)) == CODE_FOR_nvptx_join)
+head = NEXT_INSN (head);
+
   /* Insert the vector test inside the worker test.  */
   unsigned mode;
   rtx_insn *before = tail;
@@ -4057,7 +4073,23 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	  br = gen_br_true (pred, label);
 	else
 	  br = gen_br_true_uni (pred, label);
-	emit_insn_before (br, head);
+
+	if (recog_memoized (head) == CODE_FOR_nvptx_forked
+	&& recog_memoized (NEXT_INSN (head)) == CODE_FOR_nvptx_barsync)
+	  {
+	head = NEXT_INSN (head);
+	emit_insn_after (br, head);
+	  }
+	else if (recog_memoized (head) == CODE_FOR_nvptx_join)
+	  {
+	if (recog_memoized (NEXT_INSN (head)) == CODE_FOR_br_true_uni
+		&& mode == GOMP_DIM_VECTOR)
+	  emit_insn_after (br, NEXT_INSN (head));
+	else
+	  emit_insn_after (br, head);
+	  }
+	else
+	  emit_insn_before (br, head);
 
 	LABEL_NUSES (label)++;
 	if (tail_branch)
@@ -4276,7 +4308,7 @@ nvptx_process_pars (parallel *par)
   nvptx_wpropagate (true, par->forked_block, par->fork_insn);
   /* Insert begin and end synchronizations.  */
   emit_insn_after (nvptx_wsync (false), par->forked_insn);
-  emit_insn_before (nvptx_wsync (true), par->joining_insn);
+  emit_insn_before (nvptx_wsync (true), par->join_insn);
 }
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
 nvptx_vpropagate (par->forked_block, par->forked_insn);

[PATCH] RISC-V: Fix bootstrap failure.

2018-03-19 Thread Jim Wilson

This fixes the RISC-V bootstrap failure, based on a suggestion/question from
Serge Belyshev asking why the patch wasn't using PREFERRED_STACK_BOUNDARY,
which turns out to be a good solution.  So the patch renames STACK_BOUNDARY to
PREFERRED_STACK_BOUNDARY, and renames MIN_STACK_BOUNDARY to STACK_BOUNDARY.

This was tested with a cross compiler build with extra warning option to
force the failure, and a cross make check.  Tested with a native build.
Tested by dissembling code with and without the patch to verify that I
didn't accidentally change the ABI.  And hand testing to verify that the
option -mpreferred-stack-boundary was still working.

Committed.

Jim

gcc/
PR bootstrap/84856
* config/riscv/riscv.c (riscv_function_arg_boundary): Use
PREFERRED_STACK_BOUNDARY instead of STACK_BOUNDARY.
(riscv_first_stack_step): Likewise.
(riscv_option_override): Use STACK_BOUNDARY instead of
MIN_STACK_BOUNDARY.
* config/riscv/riscv.h (STACK_BOUNDARY): Renamed from
MIN_STACK_BOUNDARY.
(BIGGEST_ALIGNMENT): Set to 128.
(PREFERRED_STACK_BOUNDARY): Renamed from STACK_BOUNDARY.
(RISCV_STACK_ALIGN): Use PREFERRED_STACK_BOUNDARY instead of
STACK_BOUNDARY.
---
 gcc/config/riscv/riscv.c | 8 
 gcc/config/riscv/riscv.h | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index c38f6c394d5..0a75c8ae17f 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2202,7 +2202,7 @@ riscv_expand_conditional_branch (rtx label, rtx_code 
code, rtx op0, rtx op1)
 
 /* Implement TARGET_FUNCTION_ARG_BOUNDARY.  Every parameter gets at
least PARM_BOUNDARY bits of alignment, but will be given anything up
-   to STACK_BOUNDARY bits if the type requires it.  */
+   to PREFERRED_STACK_BOUNDARY bits if the type requires it.  */
 
 static unsigned int
 riscv_function_arg_boundary (machine_mode mode, const_tree type)
@@ -2215,7 +2215,7 @@ riscv_function_arg_boundary (machine_mode mode, 
const_tree type)
   else
 alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode);
 
-  return MIN (STACK_BOUNDARY, MAX (PARM_BOUNDARY, alignment));
+  return MIN (PREFERRED_STACK_BOUNDARY, MAX (PARM_BOUNDARY, alignment));
 }
 
 /* If MODE represents an argument that can be passed or returned in
@@ -3506,7 +3506,7 @@ riscv_first_stack_step (struct riscv_frame_info *frame)
 return frame->total_size;
 
   HOST_WIDE_INT min_first_step = frame->total_size - frame->fp_sp_offset;
-  HOST_WIDE_INT max_first_step = IMM_REACH / 2 - STACK_BOUNDARY / 8;
+  HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
   HOST_WIDE_INT min_second_step = frame->total_size - max_first_step;
   gcc_assert (min_first_step <= max_first_step);
 
@@ -4137,7 +4137,7 @@ riscv_option_override (void)
   riscv_stack_boundary = ABI_STACK_BOUNDARY;
   if (riscv_preferred_stack_boundary_arg)
 {
-  int min = ctz_hwi (MIN_STACK_BOUNDARY / 8);
+  int min = ctz_hwi (STACK_BOUNDARY / 8);
   int max = 8;
 
   if (!IN_RANGE (riscv_preferred_stack_boundary_arg, min, max))
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6144e267727..ebd80c0a5f2 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -124,13 +124,13 @@ along with GCC; see the file COPYING3.  If not see
 #define FUNCTION_BOUNDARY (TARGET_RVC ? 16 : 32)
 
 /* The smallest supported stack boundary the calling convention supports.  */
-#define MIN_STACK_BOUNDARY (2 * BITS_PER_WORD)
+#define STACK_BOUNDARY (2 * BITS_PER_WORD)
 
 /* The ABI stack alignment.  */
 #define ABI_STACK_BOUNDARY 128
 
 /* There is no point aligning anything to a rounder boundary than this.  */
-#define BIGGEST_ALIGNMENT STACK_BOUNDARY
+#define BIGGEST_ALIGNMENT 128
 
 /* The user-level ISA permits unaligned accesses, but they are not required
of the privileged architecture.  */
@@ -482,7 +482,7 @@ enum reg_class
`crtl->outgoing_args_size'.  */
 #define OUTGOING_REG_PARM_STACK_SPACE(FNTYPE) 1
 
-#define STACK_BOUNDARY riscv_stack_boundary
+#define PREFERRED_STACK_BOUNDARY riscv_stack_boundary
 
 /* Symbolic macros for the registers used to return integer and floating
point values.  */
@@ -540,7 +540,7 @@ typedef struct {
 
 /* Align based on stack boundary, which might have been set by the user.  */
 #define RISCV_STACK_ALIGN(LOC) \
-  (((LOC) + ((STACK_BOUNDARY/8)-1)) & -(STACK_BOUNDARY/8))
+  (((LOC) + ((PREFERRED_STACK_BOUNDARY/8)-1)) & -(PREFERRED_STACK_BOUNDARY/8))
 
 /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,
the stack pointer does not matter.  The value is tested only in
-- 
2.14.1

C++ PATCH for c++/84927, ICE with NSDMI and reference

2018-03-19 Thread Marek Polacek

This nice test crashes in verify_constructor_flags because while the
constructor as a whole is TREE_CONSTANT, one of its elements is not.

We wound up in this situation because when we first get to
cxx_eval_bare_aggregate, ctx->ctor is "{}", which is TREE_CONSTANT, while T is
"{.r=(int &) &j, .i=*(&)->r}".  We evaluate the
first element of T and then append it to the empty ctx->ctor, so now ctx->ctor
is "{.r=(int &) &j}", a constant ctor with a non-constant element.  Then we get
onto evaluating the second element of T.  The second elt has a
PLACEHOLDER_EXPR, so we get to
4695 case PLACEHOLDER_EXPR:
4696   /* Use of the value or address of the current object.  */
4697   if (tree ctor = lookup_placeholder (ctx, lval, TREE_TYPE (t)))
4698 return cxx_eval_constant_expression (ctx, ctor, lval,
4699  non_constant_p, overflow_p);
where lookup_placeholder returns "{.r=(int &) &j}", which we try to
evaluate, and crash in verify_constructor_flags in the CONSTRUCTOR case.

I think let's update the flags of the constructor as we evaluate elements in
cxx_eval_bare_aggregate; this should be cheaper than calling
recompute_constructor_flags someplace else.

I took the liberty to remove a stale FIXME; testing didn't reveal any new
issues.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-03-19  Marek Polacek  

PR c++/84927
* constexpr.c (cxx_eval_bare_aggregate): Update constructor's flags
as we evaluate the elements.
(cxx_eval_constant_expression): Verify constructor's flags
unconditionally.

* g++.dg/cpp1y/nsdmi-aggr9.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 05a1cb64d61..894bcd0bb3e 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -2880,7 +2880,12 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree 
t,
  (*p)->last().value = elt;
}
   else
-   CONSTRUCTOR_APPEND_ELT (*p, index, elt);
+   {
+ CONSTRUCTOR_APPEND_ELT (*p, index, elt);
+ /* Adding an element might change the ctor's flags.  */
+ TREE_CONSTANT (ctx->ctor) = constant_p;
+ TREE_SIDE_EFFECTS (ctx->ctor) = side_effects_p;
+   }
 }
   if (*non_constant_p || !changed)
 return t;
@@ -4530,11 +4535,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
{
  /* Don't re-process a constant CONSTRUCTOR, but do fold it to
 VECTOR_CST if applicable.  */
- /* FIXME after GCC 6 branches, make the verify unconditional.  */
- if (CHECKING_P)
-   verify_constructor_flags (t);
- else
-   recompute_constructor_flags (t);
+ verify_constructor_flags (t);
  if (TREE_CONSTANT (t))
return fold (t);
}
diff --git gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C 
gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C
index e69de29bb2d..4e13fc5c9d8 100644
--- gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C
+++ gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C
@@ -0,0 +1,14 @@
+// PR c++/84927 - ICE with NSDMI and reference
+// { dg-do compile { target c++14 } }
+
+struct A
+{
+  int& r;
+  int i = r;
+};
+
+void foo()
+{
+  int j;
+  A a = A{j};
+}

Marek

Re: C++ PATCH for c++/84927, ICE with NSDMI and reference

2018-03-19 Thread Jason Merrill

OK.

On Mon, Mar 19, 2018 at 2:22 PM, Marek Polacek  wrote:
> This nice test crashes in verify_constructor_flags because while the
> constructor as a whole is TREE_CONSTANT, one of its elements is not.
>
> We wound up in this situation because when we first get to
> cxx_eval_bare_aggregate, ctx->ctor is "{}", which is TREE_CONSTANT, while T is
> "{.r=(int &) &j, .i=*(&)->r}".  We evaluate the
> first element of T and then append it to the empty ctx->ctor, so now ctx->ctor
> is "{.r=(int &) &j}", a constant ctor with a non-constant element.  Then we 
> get
> onto evaluating the second element of T.  The second elt has a
> PLACEHOLDER_EXPR, so we get to
> 4695 case PLACEHOLDER_EXPR:
> 4696   /* Use of the value or address of the current object.  */
> 4697   if (tree ctor = lookup_placeholder (ctx, lval, TREE_TYPE (t)))
> 4698 return cxx_eval_constant_expression (ctx, ctor, lval,
> 4699  non_constant_p, overflow_p);
> where lookup_placeholder returns "{.r=(int &) &j}", which we try to
> evaluate, and crash in verify_constructor_flags in the CONSTRUCTOR case.
>
> I think let's update the flags of the constructor as we evaluate elements in
> cxx_eval_bare_aggregate; this should be cheaper than calling
> recompute_constructor_flags someplace else.
>
> I took the liberty to remove a stale FIXME; testing didn't reveal any new
> issues.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2018-03-19  Marek Polacek  
>
> PR c++/84927
> * constexpr.c (cxx_eval_bare_aggregate): Update constructor's flags
> as we evaluate the elements.
> (cxx_eval_constant_expression): Verify constructor's flags
> unconditionally.
>
> * g++.dg/cpp1y/nsdmi-aggr9.C: New test.
>
> diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
> index 05a1cb64d61..894bcd0bb3e 100644
> --- gcc/cp/constexpr.c
> +++ gcc/cp/constexpr.c
> @@ -2880,7 +2880,12 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, 
> tree t,
>   (*p)->last().value = elt;
> }
>else
> -   CONSTRUCTOR_APPEND_ELT (*p, index, elt);
> +   {
> + CONSTRUCTOR_APPEND_ELT (*p, index, elt);
> + /* Adding an element might change the ctor's flags.  */
> + TREE_CONSTANT (ctx->ctor) = constant_p;
> + TREE_SIDE_EFFECTS (ctx->ctor) = side_effects_p;
> +   }
>  }
>if (*non_constant_p || !changed)
>  return t;
> @@ -4530,11 +4535,7 @@ cxx_eval_constant_expression (const constexpr_ctx 
> *ctx, tree t,
> {
>   /* Don't re-process a constant CONSTRUCTOR, but do fold it to
>  VECTOR_CST if applicable.  */
> - /* FIXME after GCC 6 branches, make the verify unconditional.  */
> - if (CHECKING_P)
> -   verify_constructor_flags (t);
> - else
> -   recompute_constructor_flags (t);
> + verify_constructor_flags (t);
>   if (TREE_CONSTANT (t))
> return fold (t);
> }
> diff --git gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C 
> gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C
> index e69de29bb2d..4e13fc5c9d8 100644
> --- gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C
> +++ gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C
> @@ -0,0 +1,14 @@
> +// PR c++/84927 - ICE with NSDMI and reference
> +// { dg-do compile { target c++14 } }
> +
> +struct A
> +{
> +  int& r;
> +  int i = r;
> +};
> +
> +void foo()
> +{
> +  int j;
> +  A a = A{j};
> +}
>
> Marek

[PR c++/84835] ICE with generic lambda in extern "C"

2018-03-19 Thread Nathan Sidwell

In an extern "C" region the lambda member fns were getting C linkage. 
and triggering an assert I added.  I'm not sure why just the generic 
case was being triggered, but the fix is to just force the language to 
be C++ -- this is what we do in finish_member_declaration.


We were also not copying the linkage into the template header.  Mostly 
that was ok, because we then ignored it.  But sometimes, like this, 
things went wrong.


applying to trunk.

nathan
--
Nathan Sidwell
2018-03-19  Nathan Sidwell  

	PR c++/84835
	* lambda.c (maybe_add_lambda_conv_op): Force C++ linkage.
	* pt.c (build_template_decl): Propagate language linkage.

	PR c++/84835
	* g++.dg/cpp1y/pr84835.C: New.

Index: cp/lambda.c
===
--- cp/lambda.c	(revision 258642)
+++ cp/lambda.c	(working copy)
@@ -1176,6 +1176,7 @@ maybe_add_lambda_conv_op (tree type)
   tree thistype = cp_build_qualified_type (type, TYPE_QUAL_CONST);
   tree fntype = build_method_type_directly (thistype, rettype, void_list_node);
   tree convfn = build_lang_decl (FUNCTION_DECL, name, fntype);
+  SET_DECL_LANGUAGE (convfn, lang_cplusplus);
   tree fn = convfn;
   DECL_SOURCE_LOCATION (fn) = DECL_SOURCE_LOCATION (callop);
   SET_DECL_ALIGN (fn, MINIMUM_METHOD_BOUNDARY);
@@ -1208,6 +1209,7 @@ maybe_add_lambda_conv_op (tree type)
 
   name = get_identifier ("_FUN");
   tree statfn = build_lang_decl (FUNCTION_DECL, name, stattype);
+  SET_DECL_LANGUAGE (statfn, lang_cplusplus);
   fn = statfn;
   DECL_SOURCE_LOCATION (fn) = DECL_SOURCE_LOCATION (callop);
   grokclassfn (type, fn, NO_SPECIAL);
Index: cp/pt.c
===
--- cp/pt.c	(revision 258642)
+++ cp/pt.c	(working copy)
@@ -4677,6 +4677,7 @@ tree
 build_template_decl (tree decl, tree parms, bool member_template_p)
 {
   tree tmpl = build_lang_decl (TEMPLATE_DECL, DECL_NAME (decl), NULL_TREE);
+  SET_DECL_LANGUAGE (tmpl, DECL_LANGUAGE (decl));
   DECL_TEMPLATE_PARMS (tmpl) = parms;
   DECL_CONTEXT (tmpl) = DECL_CONTEXT (decl);
   DECL_SOURCE_LOCATION (tmpl) = DECL_SOURCE_LOCATION (decl);
Index: testsuite/g++.dg/cpp1y/pr84835.C
===
--- testsuite/g++.dg/cpp1y/pr84835.C	(revision 0)
+++ testsuite/g++.dg/cpp1y/pr84835.C	(working copy)
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++14 } }
+// PR c++/84835
+// ICE with generic lambda inside extern "C"
+
+extern "C" 
+{
+  auto r = [] (auto x) 
+  {
+void baz (); // extern "C"
+baz ();
+  };
+  
+}
+
+void g ()
+{
+  r (0);
+}
+
+//  { dg-final { scan-assembler "\[^0-9\]baz" } }

Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md

2018-03-19 Thread Sudakshina Das


Hi

On 19/03/18 14:29, James Greenhalgh wrote:

On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote:

Hi

This patch fixes the inconsistent behavior observed at -O3 for the
unordered comparisons. According to the online docs
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html),
all of the following should not raise an FP exception:
- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these were generating exception raising
instructions such as fcmgt. This patch changes the instructions that are
emitted to in order to not give out the exceptions. We first check each
operand for NaNs and force any elements containing NaN to zero before
using them in the compare.

Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 :
a, isnan (b) ? 0.0 : b))


The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new test case.

Is this ok for trunk? This will probably need a back-port to
gcc-7-branch as well.


OK.

Let it soak on trunk for a while before the backport.


Thanks. Committed to trunk as r258653. Will wait a week before backport.

Sudi



Thanks,
James


ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

PR target/81647
* config/aarch64/aarch64-simd.md (vec_cmp): Modify
instructions for
UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

PR target/81647
* gcc.target/aarch64/pr81647.c: New.



diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2731,10 +2731,10 @@
  break;
}
/* Fall through.  */
-case UNGE:
+case UNLT:
std::swap (operands[2], operands[3]);
/* Fall through.  */
-case UNLE:
+case UNGT:
  case GT:
comparison = gen_aarch64_cmgt;
break;
@@ -2745,10 +2745,10 @@
  break;
}
/* Fall through.  */
-case UNGT:
+case UNLE:
std::swap (operands[2], operands[3]);
/* Fall through.  */
-case UNLT:
+case UNGE:
  case GE:
comparison = gen_aarch64_cmge;
break;
@@ -2771,21 +2771,35 @@
  case UNGT:
  case UNLE:
  case UNLT:
-case NE:
-  /* FCM returns false for lanes which are unordered, so if we use
-the inverse of the comparison we actually want to emit, then
-invert the result, we will end up with the correct result.
-Note that a NE NaN and NaN NE b are true for all a, b.
-
-Our transformations are:
-a UNGE b -> !(b GT a)
-a UNGT b -> !(b GE a)
-a UNLE b -> !(a GT b)
-a UNLT b -> !(a GE b)
-a   NE b -> !(a EQ b)  */
-  gcc_assert (comparison != NULL);
-  emit_insn (comparison (operands[0], operands[2], operands[3]));
-  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
+  {
+   /* All of the above must not raise any FP exceptions.  Thus we first
+  check each operand for NaNs and force any elements containing NaN to
+  zero before using them in the compare.
+  Example: UN (a, b) -> UNORDERED (a, b) |
+(cm (isnan (a) ? 0.0 : a,
+ isnan (b) ? 0.0 : b))
+  We use the following transformations for doing the comparisions:
+  a UNGE b -> a GE b
+  a UNGT b -> a GT b
+  a UNLE b -> b GE a
+  a UNLT b -> b GT a.  */
+
+   rtx tmp0 = gen_reg_rtx (mode);
+   rtx tmp1 = gen_reg_rtx (mode);
+   rtx tmp2 = gen_reg_rtx (mode);
+   emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2]));
+   emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3]));
+   emit_insn (gen_and3 (tmp2, tmp0, tmp1));
+   emit_insn (gen_and3 (tmp0, tmp0,
+   lowpart_subreg (mode, operands[2], 
mode)));
+   emit_insn (gen_and3 (tmp1, tmp1,
+   lowpart_subreg (mode, operands[3], 
mode)));
+   gcc_assert (comparison != NULL);
+   emit_insn (comparison (operands[0],
+  lowpart_subreg (mode, tmp0, 
mode),
+  lowpart_subreg (mode, tmp1, 
mode)));
+   emit_insn (gen_orn3 (operands[0], tmp2, operands[0]));
+  }
break;
  
  case LT:

@@ -2793,25 +2807,19 @@
  case GT:
  case GE:
  case EQ:
+case NE:
/* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
 As a LT b <=> b GE a && a LE b <=> b GT

Re: C++ PATCH for c++/84925, ICE in enclosing_instantiation_of

2018-03-19 Thread Jason Merrill

OK.

On Mon, Mar 19, 2018 at 7:55 AM, Marek Polacek  wrote:
> Seems like with this testcase we end up in a scenario where, when counting the
> lambda count in enclosing_instantiation_of, we arrive at decl_function_context
> being null, so checking DECL_TEMPLATE_INFO then crashes.  It appears that just
> the following does the right thing, but I'm not too sure about this function.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2018-03-19  Marek Polacek  
>
> PR c++/84925
> * pt.c (enclosing_instantiation_of): Check if fn is null.
>
> * g++.dg/cpp1z/lambda-__func__.C: New test.
>
> diff --git gcc/cp/pt.c gcc/cp/pt.c
> index 745c9acd6ee..066cb627a70 100644
> --- gcc/cp/pt.c
> +++ gcc/cp/pt.c
> @@ -12898,7 +12898,7 @@ enclosing_instantiation_of (tree otctx)
>for (; flambda_count < lambda_count && fn && LAMBDA_FUNCTION_P (fn);
>fn = decl_function_context (fn))
> ++flambda_count;
> -  if (DECL_TEMPLATE_INFO (fn)
> +  if ((fn && DECL_TEMPLATE_INFO (fn))
>   ? most_general_template (fn) != most_general_template (tctx)
>   : fn != tctx)
> continue;
> diff --git gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C 
> gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C
> index e69de29bb2d..d5d1c1cb7b6 100644
> --- gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C
> +++ gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C
> @@ -0,0 +1,13 @@
> +// PR c++/84925
> +// { dg-options "-std=c++17" }
> +
> +template 
> +struct A {
> +  static const int value = 0;
> +  static auto constexpr fn = [] { return __func__; };
> +};
> +
> +template 
> +int x = A::value;
> +
> +auto s = x;
>
> Marek

[PATCH] Fix __builtin_cpu_supports (PR target/84945)

2018-03-19 Thread Jakub Jelinek

Hi!

As mentioned in the PR, this is another case where we've run into too many
separate feature bits issues on x86. unfortunately in this case it affects
ABI.  __cpu_model variable contains just 32 bits for the features, and we
now need 4 extra features that won't fit in there.  The current code just
invokes in the compiler as well as runtime initialization.

While on {x86_64,i?86}-linux __cpu_model is exported from libgcc_s.so.1
only for backwards compatibility and new code is always using a hidden
copy in each DSO or binary from libgcc.a, that is not the case on other
x86 targets.  For a Linux only solution, we could just grow up __cpu_model
except for the one in libgcc_s.so.1 (i.e. #ifdef SHARED) and as long as GCC
8+ libgcc.a would be used for linking of __builtin_cpu_supports code from
both old and new gcc things would just work.  For other targets, as
__cpu_model is exported data object, binaries could have copy relocations
against it, so we really can't change the size.

As the preferred way is to put it into libgcc.a only, this patch just puts
the overflow features into a new separate variable which is not put into
libgcc_s.so.1 at all, which will force use of the libgcc.a copy for
at least __builtin_cpu_supports calls that ask for the new features.

Bootstrapped/regtested on x86_64-linux and i686-linux (on Haswell-E, don't
have access to any of the CPUs with the new features), ok for trunk?

2018-03-19  Jakub Jelinek  

PR target/84945
* config/i386/i386.c (fold_builtin_cpu): For features above 31
use __cpu_features2 variable instead of __cpu_model.__cpu_features[0].
Use 1U instead of 1.  Formatting fixes.

* gcc.target/i386/pr84945.c: New test.

* config/i386/cpuinfo.h (__cpu_features2): Declare.
* config/i386/cpuinfo.c (__cpu_features2): New variable for
ifndef SHARED only.
(set_feature): Define.
(get_available_features): Use set_feature macro.  Set __cpu_features2
to the second word of features ifndef SHARED.

--- gcc/config/i386/i386.c.jj   2018-03-17 12:11:39.954436461 +0100
+++ gcc/config/i386/i386.c  2018-03-19 16:36:55.565231809 +0100
@@ -33265,8 +33264,8 @@ fold_builtin_cpu (tree fndecl, tree *arg
}
 
   /* Get the appropriate field in __cpu_model.  */
-  ref =  build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var,
-field, NULL_TREE);
+  ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var,
+   field, NULL_TREE);
 
   /* Check the value.  */
   final = build2 (EQ_EXPR, unsigned_type_node, ref,
@@ -33296,20 +33295,34 @@ fold_builtin_cpu (tree fndecl, tree *arg
  return integer_zero_node;
}
 
+  if (isa_names_table[i].feature >= 32)
+   {
+ tree __cpu_features2_var = make_var_decl (unsigned_type_node,
+   "__cpu_features2");
+
+ varpool_node::add (__cpu_features2_var);
+ field_val = (1U << (isa_names_table[i].feature - 32));
+ /* Return __cpu_features2 & field_val  */
+ final = build2 (BIT_AND_EXPR, unsigned_type_node,
+ __cpu_features2_var,
+ build_int_cstu (unsigned_type_node, field_val));
+ return build1 (CONVERT_EXPR, integer_type_node, final);
+   }
+
   field = TYPE_FIELDS (__processor_model_type);
   /* Get the last field, which is __cpu_features.  */
   while (DECL_CHAIN (field))
 field = DECL_CHAIN (field);
 
   /* Get the appropriate field: __cpu_model.__cpu_features  */
-  ref =  build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var,
-field, NULL_TREE);
+  ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var,
+   field, NULL_TREE);
 
   /* Access the 0th element of __cpu_features array.  */
   array_elt = build4 (ARRAY_REF, unsigned_type_node, ref,
  integer_zero_node, NULL_TREE, NULL_TREE);
 
-  field_val = (1 << isa_names_table[i].feature);
+  field_val = (1U << (isa_names_table[i].feature));
   /* Return __cpu_model.__cpu_features[0] & field_val  */
   final = build2 (BIT_AND_EXPR, unsigned_type_node, array_elt,
  build_int_cstu (unsigned_type_node, field_val));
--- gcc/testsuite/gcc.target/i386/pr84945.c.jj  2018-03-19 16:37:14.667228396 
+0100
+++ gcc/testsuite/gcc.target/i386/pr84945.c 2018-03-19 16:36:42.057234231 
+0100
@@ -0,0 +1,16 @@
+/* PR target/84945 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int
+main ()
+{
+  /* AVX512_VNNI instructions are all EVEX encoded, so if
+ __builtin_cpu_supports says avx512vnni is available and avx512f is not,
+ this is a GCC bug.  Ditto for AVX512_BITALG  */
+  if (!__builtin_cpu_supports ("avx512f")
+  && (__builtin_cpu_supports ("avx512vnni")
+ || __builtin_cpu_supports ("avx512bitalg")))
+__builtin_abort ();

[PATCH] Avoid UBSAN -fsanitize=enum errors on GCC memmodel enum (PR rtl-optimization/84643)

2018-03-19 Thread Jakub Jelinek

Hi!

The enum memmodel doesn't declare the target specific enumerators,
which are just defines in i386.h, but as those are above any
enumerators in the generic enum, it is strictly speaking UB when
values with that enum memmodel type have the higher target dependent bits
set.  This patch just makes sure that the enumerator is full signed 32-bit.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-19  Jakub Jelinek  

PR rtl-optimization/84643
* memmodel.h (enum memmodel): Add MEMMODEL_MAX enumerator.

--- gcc/memmodel.h.jj   2018-01-03 10:19:55.495534012 +0100
+++ gcc/memmodel.h  2018-03-19 10:31:42.980812157 +0100
@@ -45,7 +45,9 @@ enum memmodel
   MEMMODEL_LAST = 6,
   MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC,
   MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC,
-  MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC
+  MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC,
+  /* Say that all the higher bits are valid target extensions.  */
+  MEMMODEL_MAX = INTTYPE_MAXIMUM (int)
 };
 
 /* Return the memory model from a host integer.  */

Jakub

[C++ PATCH] Fix FIX_TRUNC_EXPR instantiation (PR c++/84942)

2018-03-19 Thread Jakub Jelinek

Hi!

We instantiate FIX_TRUNC_EXPR using cp_build_unary_op, but that function
doesn't look like a good match for this tree, what it really does for it is:
1) handle error_operand_p
2) on ia64 only diagnose __fpreg uses (I believe a cast of __fpreg to int is
   fine, so we shouldn't warn)
3) and then it does:
argtype = TREE_TYPE (arg);
  return build1 (code, argtype, arg);
   which creates FIX_TRUNC_EXPR with the type of the argument rather than
   the integral type it should have.
The following patch thus bypasses that function and just builds the
FIX_TRUNC_EXPR with the right type.  I think we don't need to tsubst the
type, because if it is type dependent, we wouldn't be creating
FIX_TRUNC_EXPR, but record it just as a some kind of conversion.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-19  Jakub Jelinek  

PR c++/84942
* pt.c (tsubst_copy_and_build) : Don't call
cp_build_unary_op, just use build1.

* g++.dg/cpp1y/pr84942.C: New test.

--- gcc/cp/pt.c.jj  2018-03-16 21:11:04.440773108 +0100
+++ gcc/cp/pt.c 2018-03-19 10:58:39.803657613 +0100
@@ -17495,8 +17495,10 @@ tsubst_copy_and_build (tree t,
complain|decltype_flag));
 
 case FIX_TRUNC_EXPR:
-  RETURN (cp_build_unary_op (FIX_TRUNC_EXPR, RECUR (TREE_OPERAND (t, 0)),
-false, complain));
+  op1 = RECUR (TREE_OPERAND (t, 0));
+  if (error_operand_p (op1))
+RETURN (error_mark_node);
+  RETURN (build1 (FIX_TRUNC_EXPR, TREE_TYPE (t), op1));
 
 case ADDR_EXPR:
   op1 = TREE_OPERAND (t, 0);
--- gcc/testsuite/g++.dg/cpp1y/pr84942.C.jj 2018-03-19 11:04:16.201588211 
+0100
+++ gcc/testsuite/g++.dg/cpp1y/pr84942.C2018-03-19 11:02:37.950609950 
+0100
@@ -0,0 +1,7 @@
+// PR c++/84942
+// { dg-do compile { target c++14 } }
+// { dg-options "-w" }
+
+int a(__attribute__((b((int)__builtin_inf() * 1ULL / auto;
+// { dg-error "expected primary-expression before" "" { target *-*-* } .-1 }
+// { dg-error "declared as implicit template" "" { target *-*-* } .-2 }

Jakub

C++ PATCH for c++/71834, template-id with too few arguments accepted

2018-03-19 Thread Jason Merrill

Here we set "lost" and then returned error_mark_node without ever
actually giving an error, because the comment that !arg only occurred
with a previous error was wrong (and lacked an assert (seen_error())).

We already had code for dealing with the case of fixed packs and too
many arguments, but not for too few.  With this patch we now calculate
the actual number of arguments expected as we go along and adjust
them; if we end up with the wrong number, we can give an error at the
end.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit f7f58d8bee895207d10a776a72f8c33da3e648cb
Author: Jason Merrill 
Date:   Fri Mar 16 16:15:45 2018 -0400

PR c++/71834 - template-id with too few arguments.

* pt.c (coerce_template_parms): Check fixed_parameter_pack_p.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 745c9acd6ee..abdb77f826a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8234,6 +8234,11 @@ coerce_template_parms (tree parms,
   int variadic_args_p = 0;
   int post_variadic_parms = 0;
 
+  /* Adjustment to nparms for fixed parameter packs.  */
+  int fixed_pack_adjust = 0;
+  int fixed_packs = 0;
+  int missing = 0;
+
   /* Likewise for parameters with default arguments.  */
   int default_p = 0;
 
@@ -8278,6 +8283,7 @@ coerce_template_parms (tree parms,
 	  || (TREE_VEC_ELT (parms, nargs) != error_mark_node
   && !TREE_PURPOSE (TREE_VEC_ELT (parms, nargs))
 {
+bad_nargs:
   if (complain & tf_error)
 	{
   if (variadic_p || default_p)
@@ -8389,11 +8395,17 @@ coerce_template_parms (tree parms,
 	  lost++;
 	  /* We are done with all of the arguments.  */
 	  arg_idx = nargs;
+	  break;
 	}
 	  else
 	{
 	  pack_adjust = TREE_VEC_LENGTH (ARGUMENT_PACK_ARGS (arg)) - 1;
 	  arg_idx += pack_adjust;
+	  if (fixed_parameter_pack_p (TREE_VALUE (parm)))
+		{
+		  ++fixed_packs;
+		  fixed_pack_adjust += pack_adjust;
+		}
 	}
   
   continue;
@@ -8451,12 +8463,18 @@ coerce_template_parms (tree parms,
 	error ("template argument %d is invalid", arg_idx + 1);
 	}
   else if (!arg)
-/* This only occurs if there was an error in the template
-   parameter list itself (which we would already have
-   reported) that we are trying to recover from, e.g., a class
-   template with a parameter list such as
-   template.  */
-	++lost;
+	{
+	  /* This can occur if there was an error in the template
+	 parameter list itself (which we would already have
+	 reported) that we are trying to recover from, e.g., a class
+	 template with a parameter list such as
+	 template (cpp0x/variadic150.C).  */
+	  ++lost;
+
+	  /* This can also happen with a fixed parameter pack (71834).  */
+	  if (arg_idx >= nargs)
+	++missing;
+	}
   else
 	arg = convert_template_argument (TREE_VALUE (parm),
 	 arg, new_args, complain, 
@@ -8469,20 +8487,20 @@ coerce_template_parms (tree parms,
   cp_unevaluated_operand = saved_unevaluated_operand;
   c_inhibit_evaluation_warnings = saved_inhibit_evaluation_warnings;
 
-  if (variadic_p && arg_idx < nargs)
+  if (missing || arg_idx < nargs - variadic_args_p)
 {
-  if (complain & tf_error)
-	{
-	  error ("wrong number of template arguments "
-		 "(%d, should be %d)", nargs, arg_idx);
-	  if (in_decl)
-	error ("provided for %q+D", in_decl);
-	}
-  return error_mark_node;
+  /* If we had fixed parameter packs, we didn't know how many arguments we
+	 actually needed earlier; now we do.  */
+  nparms += fixed_pack_adjust;
+  variadic_p -= fixed_packs;
+  goto bad_nargs;
 }
 
   if (lost)
-return error_mark_node;
+{
+  gcc_assert (!(complain & tf_error) || seen_error ());
+  return error_mark_node;
+}
 
   if (CHECKING_P && !NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args))
 SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-nested3.C b/gcc/testsuite/g++.dg/cpp0x/variadic-nested3.C
new file mode 100644
index 000..381ff731c09
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-nested3.C
@@ -0,0 +1,10 @@
+// PR c++/71834
+// { dg-do compile { target c++11 } }
+
+template < typename ... Ts > struct A 
+{
+  template < Ts ..., typename U > struct B {};
+};
+
+// should be, e.g.: A < int >::B < 0, int > e; 
+A < int >::B < 0 > e;	   // { dg-error "wrong number of template arguments" }

[PATCH] Security improvement for dwarf indirect constants (PR sanitizer/78651)

2018-03-19 Thread Jakub Jelinek

Hi!

When Maxim posted his patch today, I had a look and found that we emit
these indirect constants into .data sections, even when they always contain
just some pointer that needs relocation, but shouldn't be changed
afterwards; after all, the indirect constants have TREE_READONLY set.

The problem is that we use DECL_INITIAL (decl) = decl; and that isn't
something the section selection code recognizes as valid or reasonable
initializer.  This patch uses ADDR_EXPR of the decl instead, i.e. something
that needs a relocation if -fpic/-fpie, but can be relro, and in position
dependent binaries can be .rodata from the beginning.  This will make it
harder from somebody trying to change these.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-19  Jakub Jelinek  

PR sanitizer/78651
* dwarf2asm.c: Include fold-const.c.
(dw2_output_indirect_constant_1): Set DECL_INITIAL (decl) to ADDR_EXPR
of decl rather than decl itself.

--- gcc/dwarf2asm.c.jj  2018-02-09 06:44:29.952809199 +0100
+++ gcc/dwarf2asm.c 2018-03-19 17:18:09.82448 +0100
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.
 #include "dwarf2.h"
 #include "function.h"
 #include "emit-rtl.h"
+#include "fold-const.h"
 
 #ifndef XCOFF_DEBUGGING_INFO
 #define XCOFF_DEBUGGING_INFO 0
@@ -954,7 +955,7 @@ dw2_output_indirect_constant_1 (const ch
   SET_DECL_ASSEMBLER_NAME (decl, id);
   DECL_ARTIFICIAL (decl) = 1;
   DECL_IGNORED_P (decl) = 1;
-  DECL_INITIAL (decl) = decl;
+  DECL_INITIAL (decl) = build_fold_addr_expr (decl);
   TREE_READONLY (decl) = 1;
   TREE_STATIC (decl) = 1;
 

Jakub

Re: [PATCH] [Microblaze]: PIC Data Text Relative

2018-03-19 Thread Michael Eager


Hi Andrew --

Good work.

Please submit your updated patch.  Check that you follow
GNU coding standards.  Also make sure that the new options
are documented in gcc/doc/invoke.texi.

On 03/18/18 03:27, Andrew Sadek wrote:

Hello Michael,

I have run the test using the new PIC options.
Actually, I have discovered 2 unhandled cases in 
'microblaze_expand_move' + missing conditions in linker relax

leading some test cases execution to fail.
After fixing them, I made a re-run for the whole regression, and the 
results analogy below:


Original, without my patch:
         === gcc Summary ===

# of expected passes        90776
# of unexpected failures    1317
# of unexpected successes    3
# of expected failures        207
# of unresolved testcases    115
# of unsupported tests        2828

With my patch, calling '-fPIE - mpic-data-text-rel'
         === gcc Summary ===

# of expected passes        90843
# of unexpected failures    1256
# of unexpected successes    3
# of expected failures        207
# of unresolved testcases    115
# of unsupported tests        2853

After running the 'dg-cmp-results.sh' in contrib folder, the PASS->FAIL 
are below:


PASS->FAIL: gcc.dg/uninit-19.c (test for excess errors)
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O0   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O1   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O3 -g   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -Os   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O0   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O1   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O3 -g   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -Os   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O0   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O1   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O3 -g   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -Os   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O0   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O1   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O2   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O3 -g   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -Os   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var4.c   -O0   
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var4.c   -O1   
scan-assembler lwi\tr([0-9]|[

Re: [PATCH] Fix ICE in match_asm_constraints_1 (PR inline-asm/84941)

2018-03-19 Thread Michael Matz

Hi,

On Mon, 19 Mar 2018, Jakub Jelinek wrote:

> The inline-asm here has "1p" constraint and we end up with
> (plus (frame) (const_int ...))

Blaeh.  Note that "1p" is actually invalid:

--
`0', `1', `2', ... `9'
 An operand that matches the specified operand number is allowed.
 If a digit is used together with letters within the same
 alternative, the digit should come last.
--

Changing the order to "p1" would disable the transformation as well, 
because match_asm_constraints_1() uses strtoul on the constraint start.

But let's say we don't want to go there and reject this form for good 
(though I think we should), so ...

> as input; even when the matching output is a REG, I'm not really
> sure emit_move_insn can handle arbitrary expressions (consider e.g.
> "1X" constraint), and calling reg_overlap_mentioned_p on something other
> than REG/SUBREG/MEM/constant (and a couple of selected other cases)
> will ICE too.  My understanding is that the match_asm_constraints mini-pass
> is an optimization, so we can try to handle cases which make sense, but if
> there is something too difficult to handle we can just punt and let reload
> do its job or error_for_asm if it can't reload it.  Especially when I
> believe such input expressions can be there only when the numeric constraint
> is mixed with some other one, otherwise it should have been a REG or MEM
> or something similar.

... this makes sense.  But I think you're too generous in supporting 
strange inputs:

>if (! REG_P (output)
> || rtx_equal_p (output, input)
> || (GET_MODE (input) != VOIDmode
> -   && GET_MODE (input) != GET_MODE (output)))
> +   && GET_MODE (input) != GET_MODE (output))
> +   || !(REG_P (input) || SUBREG_P (input)
> +|| MEM_P (input) || CONSTANT_P (input)))

I'd only allow REG_P (input) as well, not any of the other forms.


Ciao,
Michael.

Re: [PATCH] Security improvement for dwarf indirect constants (PR sanitizer/78651)

2018-03-19 Thread Richard Biener

On March 19, 2018 9:01:53 PM GMT+01:00, Jakub Jelinek  wrote:
>Hi!
>
>When Maxim posted his patch today, I had a look and found that we emit
>these indirect constants into .data sections, even when they always
>contain
>just some pointer that needs relocation, but shouldn't be changed
>afterwards; after all, the indirect constants have TREE_READONLY set.
>
>The problem is that we use DECL_INITIAL (decl) = decl; and that isn't
>something the section selection code recognizes as valid or reasonable
>initializer.  This patch uses ADDR_EXPR of the decl instead, i.e.
>something
>that needs a relocation if -fpic/-fpie, but can be relro, and in
>position
>dependent binaries can be .rodata from the beginning.  This will make
>it
>harder from somebody trying to change these.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 

>2018-03-19  Jakub Jelinek  
>
>   PR sanitizer/78651
>   * dwarf2asm.c: Include fold-const.c.
>   (dw2_output_indirect_constant_1): Set DECL_INITIAL (decl) to ADDR_EXPR
>   of decl rather than decl itself.
>
>--- gcc/dwarf2asm.c.jj 2018-02-09 06:44:29.952809199 +0100
>+++ gcc/dwarf2asm.c2018-03-19 17:18:09.82448 +0100
>@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.
> #include "dwarf2.h"
> #include "function.h"
> #include "emit-rtl.h"
>+#include "fold-const.h"
> 
> #ifndef XCOFF_DEBUGGING_INFO
> #define XCOFF_DEBUGGING_INFO 0
>@@ -954,7 +955,7 @@ dw2_output_indirect_constant_1 (const ch
>   SET_DECL_ASSEMBLER_NAME (decl, id);
>   DECL_ARTIFICIAL (decl) = 1;
>   DECL_IGNORED_P (decl) = 1;
>-  DECL_INITIAL (decl) = decl;
>+  DECL_INITIAL (decl) = build_fold_addr_expr (decl);
>   TREE_READONLY (decl) = 1;
>   TREE_STATIC (decl) = 1;
> 
>
>   Jakub

Re: [PATCH] Fix libsanitizer on i?86-linux with glibc 2.27+ (PR sanitizer/84761)

2018-03-19 Thread Richard Biener

On March 19, 2018 8:57:22 PM GMT+01:00, Jakub Jelinek  wrote:
>Hi!
>
>As mentioned in the PR, libsanitizer cheats and attempts to call
>@GLIBC_PRIVATE functions which are reserved to glibc and can change
>anytime.  They have changed on i?86 in particular in glibc 2.27, where
>internal_function attribute has been dropped, so the functions for some
>reason sadly aren't regparm(3) anymore.
>
>This patch changes libsanitizer, so that when built against glibc 2.26
>and
>earlier, it checks at runtime if glibc 2.27+ is used using a symbol
>that has
>been added ~ a month after the internal_function change, and when built
>against glibc 2.27+, it just assumes it will never be regparm(3).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 

>I've submitted it to upstream in https://reviews.llvm.org/D44623 too,
>but
>it might take some time to get it through there.
>
>2018-03-19  Jakub Jelinek  
>
>   PR sanitizer/84761
>   * sanitizer_common/sanitizer_linux_libcdep.cc (__GLIBC_PREREQ):
>   Define if not defined.
>   (DL_INTERNAL_FUNCTION): Don't define.
>   (InitTlsSize): For __i386__ if not compiled against glibc 2.27+
>   determine at runtime whether to use regparm(3), stdcall calling
>   convention for older glibcs or normal calling convention for
>   newer glibcs for call to _dl_get_tls_static_info.
>
>---
>libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc.jj2017-10-19
>13:20:58.972958379 +0200
>+++ libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc   2018-03-19
>14:05:56.178952563 +0100
>@@ -147,29 +147,44 @@ bool SanitizerGetThreadName(char *name,
> #endif
> }
> 
>+#ifndef __GLIBC_PREREQ
>+#define __GLIBC_PREREQ(x, y) 0
>+#endif
>+
> #if !SANITIZER_FREEBSD && !SANITIZER_ANDROID && !SANITIZER_GO && \
> !SANITIZER_NETBSD
> static uptr g_tls_size;
> 
>-#ifdef __i386__
>-# define DL_INTERNAL_FUNCTION __attribute__((regparm(3), stdcall))
>-#else
>-# define DL_INTERNAL_FUNCTION
>-#endif
>-
> void InitTlsSize() {
> // all current supported platforms have 16 bytes stack alignment
>   const size_t kStackAlign = 16;
>-  typedef void (*get_tls_func)(size_t*, size_t*) DL_INTERNAL_FUNCTION;
>-  get_tls_func get_tls;
>-  void *get_tls_static_info_ptr = dlsym(RTLD_NEXT,
>"_dl_get_tls_static_info");
>-  CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr));
>-  internal_memcpy(&get_tls, &get_tls_static_info_ptr,
>-  sizeof(get_tls_static_info_ptr));
>-  CHECK_NE(get_tls, 0);
>   size_t tls_size = 0;
>   size_t tls_align = 0;
>-  get_tls(&tls_size, &tls_align);
>+  void *get_tls_static_info_ptr = dlsym(RTLD_NEXT,
>"_dl_get_tls_static_info");
>+#if defined(__i386__) && !__GLIBC_PREREQ(2, 27)
>+  /* On i?86, _dl_get_tls_static_info used to be internal_function,
>i.e.
>+ __attribute__((regparm(3), stdcall)) before glibc 2.27 and is
>normal
>+ function in 2.27 and later.  */
>+  if (!dlvsym(RTLD_NEXT, "glob", "GLIBC_2.27")) {
>+typedef void (*get_tls_func)(size_t*, size_t*)
>+  __attribute__((regparm(3), stdcall));
>+get_tls_func get_tls;
>+CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr));
>+internal_memcpy(&get_tls, &get_tls_static_info_ptr,
>+sizeof(get_tls_static_info_ptr));
>+CHECK_NE(get_tls, 0);
>+get_tls(&tls_size, &tls_align);
>+  } else
>+#endif
>+  {
>+typedef void (*get_tls_func)(size_t*, size_t*);
>+get_tls_func get_tls;
>+CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr));
>+internal_memcpy(&get_tls, &get_tls_static_info_ptr,
>+sizeof(get_tls_static_info_ptr));
>+CHECK_NE(get_tls, 0);
>+get_tls(&tls_size, &tls_align);
>+  }
>   if (tls_align < kStackAlign)
> tls_align = kStackAlign;
>   g_tls_size = RoundUpTo(tls_size, tls_align);
>
>   Jakub

Re: [PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active

2018-03-19 Thread Jakub Jelinek

On Mon, Mar 19, 2018 at 07:28:11PM +0300, Maxim Ostapenko wrote:
> Yes, it works, attaching the patch.

Can you please add a short comment why we do this?  Also, the testcase
needs work, see below.  Ok for trunk with those changes and after a
while for affected release branches as well.

Thanks.

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/asan/pr78651.C
> @@ -0,0 +1,24 @@
> +// { dg-do run { target fpic } }

Effective target fpic just means that -fpic or -fPIC is supported, nothing
else.  So, you want instead:
// PR sanitizer/78651
// { dg-do run }
// { dg-additional-options "-fpic" { target fpic } }

and verify make check-c++-all RUNTESTFLAGS=asan.exp=pr78651.C fails without
the dwarf2asm.c patch and succeeds with it.

Jakub

Re: [PATCH] Fix ICE in match_asm_constraints_1 (PR inline-asm/84941)

2018-03-19 Thread Jakub Jelinek

On Mon, Mar 19, 2018 at 08:09:00PM +, Michael Matz wrote:
> Hi,
> 
> On Mon, 19 Mar 2018, Jakub Jelinek wrote:
> 
> > The inline-asm here has "1p" constraint and we end up with
> > (plus (frame) (const_int ...))
> 
> Blaeh.  Note that "1p" is actually invalid:
> 
> --
> `0', `1', `2', ... `9'
>  An operand that matches the specified operand number is allowed.
>  If a digit is used together with letters within the same
>  alternative, the digit should come last.
> --
> 
> Changing the order to "p1" would disable the transformation as well, 
> because match_asm_constraints_1() uses strtoul on the constraint start.

Ah, ok, but
  asm volatile ("" : "=m,m" (b), "=r,r" (b) : "1,p" (b));
ICEs the same way, and that should be valid even according to the above
description.

> ... this makes sense.  But I think you're too generous in supporting 
> strange inputs:
> 
> >if (! REG_P (output)
> >   || rtx_equal_p (output, input)
> >   || (GET_MODE (input) != VOIDmode
> > - && GET_MODE (input) != GET_MODE (output)))
> > + && GET_MODE (input) != GET_MODE (output))
> > + || !(REG_P (input) || SUBREG_P (input)
> > +  || MEM_P (input) || CONSTANT_P (input)))
> 
> I'd only allow REG_P (input) as well, not any of the other forms.

I'll try to gather some statistics on what kind of inputs appear there
during bootstrap/regtest and will try to write a few testcases to see
if just || ! REG_P (output) is sufficient or not.

Jakub

[PATCH] Fix ICE in match_asm_constraints_1 (PR inline-asm/84941)

2018-03-19 Thread Jakub Jelinek

Hi!

The inline-asm here has "1p" constraint and we end up with
(plus (frame) (const_int ...))
as input; even when the matching output is a REG, I'm not really
sure emit_move_insn can handle arbitrary expressions (consider e.g.
"1X" constraint), and calling reg_overlap_mentioned_p on something other
than REG/SUBREG/MEM/constant (and a couple of selected other cases)
will ICE too.  My understanding is that the match_asm_constraints mini-pass
is an optimization, so we can try to handle cases which make sense, but if
there is something too difficult to handle we can just punt and let reload
do its job or error_for_asm if it can't reload it.  Especially when I
believe such input expressions can be there only when the numeric constraint
is mixed with some other one, otherwise it should have been a REG or MEM
or something similar.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-19  Jakub Jelinek  

PR inline-asm/84941
* function.c (match_asm_constraints_1): Don't do the optimization
if input isn't a REG, SUBREG, MEM or constant.

* gcc.dg/pr84941.c: New test.

--- gcc/function.c.jj   2018-02-22 12:37:02.621387697 +0100
+++ gcc/function.c  2018-03-19 11:51:51.429738406 +0100
@@ -6662,7 +6662,9 @@ match_asm_constraints_1 (rtx_insn *insn,
   if (! REG_P (output)
  || rtx_equal_p (output, input)
  || (GET_MODE (input) != VOIDmode
- && GET_MODE (input) != GET_MODE (output)))
+ && GET_MODE (input) != GET_MODE (output))
+ || !(REG_P (input) || SUBREG_P (input)
+  || MEM_P (input) || CONSTANT_P (input)))
continue;
 
   /* We can't do anything if the output is also used as input,
--- gcc/testsuite/gcc.dg/pr84941.c.jj   2018-03-19 11:56:34.086713406 +0100
+++ gcc/testsuite/gcc.dg/pr84941.c  2018-03-19 11:55:47.301717544 +0100
@@ -0,0 +1,10 @@
+/* PR inline-asm/84941 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void
+foo (void)
+{
+  short *b[1] = { 0 };
+  asm volatile ("" : "=m" (b), "=r" (b) : "1p" (b));
+}

Jakub

Re: [PATCH] Avoid UBSAN -fsanitize=enum errors on GCC memmodel enum (PR rtl-optimization/84643)

2018-03-19 Thread Richard Biener

On March 19, 2018 8:39:20 PM GMT+01:00, Jakub Jelinek  wrote:
>Hi!
>
>The enum memmodel doesn't declare the target specific enumerators,
>which are just defines in i386.h, but as those are above any
>enumerators in the generic enum, it is strictly speaking UB when
>values with that enum memmodel type have the higher target dependent
>bits
>set.  This patch just makes sure that the enumerator is full signed
>32-bit.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 

>2018-03-19  Jakub Jelinek  
>
>   PR rtl-optimization/84643
>   * memmodel.h (enum memmodel): Add MEMMODEL_MAX enumerator.
>
>--- gcc/memmodel.h.jj  2018-01-03 10:19:55.495534012 +0100
>+++ gcc/memmodel.h 2018-03-19 10:31:42.980812157 +0100
>@@ -45,7 +45,9 @@ enum memmodel
>   MEMMODEL_LAST = 6,
>   MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC,
>   MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC,
>-  MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC
>+  MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC,
>+  /* Say that all the higher bits are valid target extensions.  */
>+  MEMMODEL_MAX = INTTYPE_MAXIMUM (int)
> };
> 
> /* Return the memory model from a host integer.  */
>
>   Jakub

[PATCH] Fix libsanitizer on i?86-linux with glibc 2.27+ (PR sanitizer/84761)

2018-03-19 Thread Jakub Jelinek

Hi!

As mentioned in the PR, libsanitizer cheats and attempts to call
@GLIBC_PRIVATE functions which are reserved to glibc and can change
anytime.  They have changed on i?86 in particular in glibc 2.27, where
internal_function attribute has been dropped, so the functions for some
reason sadly aren't regparm(3) anymore.

This patch changes libsanitizer, so that when built against glibc 2.26 and
earlier, it checks at runtime if glibc 2.27+ is used using a symbol that has
been added ~ a month after the internal_function change, and when built
against glibc 2.27+, it just assumes it will never be regparm(3).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I've submitted it to upstream in https://reviews.llvm.org/D44623 too, but
it might take some time to get it through there.

2018-03-19  Jakub Jelinek  

PR sanitizer/84761
* sanitizer_common/sanitizer_linux_libcdep.cc (__GLIBC_PREREQ):
Define if not defined.
(DL_INTERNAL_FUNCTION): Don't define.
(InitTlsSize): For __i386__ if not compiled against glibc 2.27+
determine at runtime whether to use regparm(3), stdcall calling
convention for older glibcs or normal calling convention for
newer glibcs for call to _dl_get_tls_static_info.

--- libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc.jj 2017-10-19 
13:20:58.972958379 +0200
+++ libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc2018-03-19 
14:05:56.178952563 +0100
@@ -147,29 +147,44 @@ bool SanitizerGetThreadName(char *name,
 #endif
 }
 
+#ifndef __GLIBC_PREREQ
+#define __GLIBC_PREREQ(x, y) 0
+#endif
+
 #if !SANITIZER_FREEBSD && !SANITIZER_ANDROID && !SANITIZER_GO && \
 !SANITIZER_NETBSD
 static uptr g_tls_size;
 
-#ifdef __i386__
-# define DL_INTERNAL_FUNCTION __attribute__((regparm(3), stdcall))
-#else
-# define DL_INTERNAL_FUNCTION
-#endif
-
 void InitTlsSize() {
 // all current supported platforms have 16 bytes stack alignment
   const size_t kStackAlign = 16;
-  typedef void (*get_tls_func)(size_t*, size_t*) DL_INTERNAL_FUNCTION;
-  get_tls_func get_tls;
-  void *get_tls_static_info_ptr = dlsym(RTLD_NEXT, "_dl_get_tls_static_info");
-  CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr));
-  internal_memcpy(&get_tls, &get_tls_static_info_ptr,
-  sizeof(get_tls_static_info_ptr));
-  CHECK_NE(get_tls, 0);
   size_t tls_size = 0;
   size_t tls_align = 0;
-  get_tls(&tls_size, &tls_align);
+  void *get_tls_static_info_ptr = dlsym(RTLD_NEXT, "_dl_get_tls_static_info");
+#if defined(__i386__) && !__GLIBC_PREREQ(2, 27)
+  /* On i?86, _dl_get_tls_static_info used to be internal_function, i.e.
+ __attribute__((regparm(3), stdcall)) before glibc 2.27 and is normal
+ function in 2.27 and later.  */
+  if (!dlvsym(RTLD_NEXT, "glob", "GLIBC_2.27")) {
+typedef void (*get_tls_func)(size_t*, size_t*)
+  __attribute__((regparm(3), stdcall));
+get_tls_func get_tls;
+CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr));
+internal_memcpy(&get_tls, &get_tls_static_info_ptr,
+sizeof(get_tls_static_info_ptr));
+CHECK_NE(get_tls, 0);
+get_tls(&tls_size, &tls_align);
+  } else
+#endif
+  {
+typedef void (*get_tls_func)(size_t*, size_t*);
+get_tls_func get_tls;
+CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr));
+internal_memcpy(&get_tls, &get_tls_static_info_ptr,
+sizeof(get_tls_static_info_ptr));
+CHECK_NE(get_tls, 0);
+get_tls(&tls_size, &tls_align);
+  }
   if (tls_align < kStackAlign)
 tls_align = kStackAlign;
   g_tls_size = RoundUpTo(tls_size, tls_align);

Jakub

Re: [PATCH] Fix ICE in match_asm_constraints_1 (PR inline-asm/84941)

2018-03-19 Thread Michael Matz

Hi,

On Mon, 19 Mar 2018, Jakub Jelinek wrote:

> > Blaeh.  Note that "1p" is actually invalid:
> > 
> > --
> > `0', `1', `2', ... `9'
> >  An operand that matches the specified operand number is allowed.
> >  If a digit is used together with letters within the same
> >  alternative, the digit should come last.
> > --
> > 
> > Changing the order to "p1" would disable the transformation as well, 
> > because match_asm_constraints_1() uses strtoul on the constraint start.
> 
> Ah, ok, but
>   asm volatile ("" : "=m,m" (b), "=r,r" (b) : "1,p" (b));
> ICEs the same way, and that should be valid even according to the above
> description.

Yes that's valid and shouldn't ICE.

> > ... this makes sense.  But I think you're too generous in supporting 
> > strange inputs:
> > 
> > >if (! REG_P (output)
> > > || rtx_equal_p (output, input)
> > > || (GET_MODE (input) != VOIDmode
> > > -   && GET_MODE (input) != GET_MODE (output)))
> > > +   && GET_MODE (input) != GET_MODE (output))
> > > +   || !(REG_P (input) || SUBREG_P (input)
> > > +|| MEM_P (input) || CONSTANT_P (input)))
> > 
> > I'd only allow REG_P (input) as well, not any of the other forms.
> 
> I'll try to gather some statistics on what kind of inputs appear there
> during bootstrap/regtest and will try to write a few testcases to see
> if just || ! REG_P (output) is sufficient or not.

I think you don't need to try too hard.  The function is designed to 
handle the situation where the matching constraint is a result from an 
input-output constraint, not for improving arbitrary matching constraints.
As such the expected situation is that input and output are lvalues, and 
hence (when their type is anything sensible) gimple registers, and 
therefore pseudos at RTL time.

You could basically reject any constraint that isn't just a single integer 
(i.e. anything with not only digits after the optional '%') and still 
handle the sitatuations for which this function was invented.  I.e. like 
this:

Index: function.c
===
--- function.c  (revision 254008)
+++ function.c  (working copy)
@@ -6605,7 +6605,7 @@ match_asm_constraints_1 (rtx_insn *insn,
constraint++;

   match = strtoul (constraint, &end, 10);
-  if (end == constraint)
+  if (end == constraint || *end)
continue;

   gcc_assert (match < noutputs);

Ciao,
Michael.

Re: [PATCH] Avoid compiler UB in store-merging (PR tree-optimization/84946)

2018-03-19 Thread Richard Biener

On March 19, 2018 9:05:23 PM GMT+01:00, Jakub Jelinek  wrote:
>Hi!
>
>In this code, both bitpos and bitsize are poly_int64, but we know that
>bitpos is >= 0 and bitsize.  On the testcase mentioned in the PR (but
>even
>without -mavx512f, so we already invoke it during testing) we would
>overflow
>the computation, 0x7ffsomething + 0x20 (and then cast
>it
>to poly_uint64).  This patch fixes it by casting to poly_uint64
>earlier.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 

>2018-03-19  Jakub Jelinek  
>
>   PR tree-optimization/84946
>   * gimple-ssa-store-merging.c (mem_valid_for_store_merging): Compute
>   bitsize + bitsize in poly_uint64 rather than poly_int64.
>
>--- gcc/gimple-ssa-store-merging.c.jj  2018-02-22 09:28:07.0
>+0100
>+++ gcc/gimple-ssa-store-merging.c 2018-03-19 17:55:22.486472527 +0100
>@@ -3948,7 +3948,8 @@ mem_valid_for_store_merging (tree mem, p
>   if (known_eq (bitregion_end, 0U))
> {
>   bitregion_start = round_down_to_byte_boundary (bitpos);
>-  bitregion_end = round_up_to_byte_boundary (bitpos + bitsize);
>+  bitregion_end = bitpos;
>+  bitregion_end = round_up_to_byte_boundary (bitregion_end +
>bitsize);
> }
> 
>   if (offset != NULL_TREE)
>
>   Jakub

[wwwdocs] Update gcc-8/changes.html for some IPA and x86 canges

2018-03-19 Thread Jan Hubicka

Hi,
this patch adds some documentation for what is new in IPA and x86.
For lto we should mention early-debug.  Richard, perhaps you can suggest
wording?

Honza

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.41
diff -u -r1.41 changes.html
--- changes.html12 Mar 2018 17:26:20 -  1.41
+++ changes.html19 Mar 2018 21:07:07 -
@@ -42,9 +42,32 @@
 
 General Improvements
 
-  The ipa-pure-const pass is extended to propagate malloc attribute, and 
the
-  corresponding warning option Wsuggest-attribute=malloc emits a
-  diagnostic for a function, which can be annotated with malloc attribute.
+  Inter-procedural optimization improvements:
+  
+Reworked runtime estimation metrics leading to more realistic guesses
+   driving inliner and clonning heuristics.
+The ipa-pure-const pass is extended to propagate malloc attribute, and 
the
+   corresponding warning option Wsuggest-attribute=malloc 
emits a
+   diagnostic for a function, which can be annotated with malloc 
attribute.
+  
+  Profile driven optimization improvements:
+  
+New infrastructure for representing profile (both statically guessed
+   and profile feedback) which allows propagation of furhter information
+   about reliablility of the profile.
+Number of improvements in profile updating code solving problems
+   found by new verification code.
+Static detection of code which is not executed in valid run of the
+   program. This includes paths which triggers undefined behaviour
+   as well as call to functions declared with cold attribute.
+   Newly noreturn attribute does not imply all effects of
+   cold to make difference between exit (which
+   is noreturn/code> and abort (which is in addition
+   not executed in valid runs).
+-freorder-blocks-and-partition, a pass splitting function
+   bodies into hot and cold regions, is now enabled by default at 
-O2
+   and higher for i386.
+  
   
 A new option -fcf-protection=[full|branch|return|none] is
 introduced to perform code instrumentation to increase program security by
@@ -435,6 +458,11 @@
 
   
 The x86 port now supports the naked function attribute.
+  
+Better tuning for znver1 and Intel Core based CPUs.
+  
+Vectorization costs metrics has been reworked leading to significant 
improvments
+on some benchmarks
   GCC now supports the Intel CPU named Cannonlake through
 -march=cannonlake. The switch enables the AVX512VBMI,
 AVX512IFMA and SHA ISA extensions.

Re: Use SCEV information when aligning for vectorisation (PR 84005)

2018-03-19 Thread Richard Sandiford

Richard Biener  writes:
> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford
>  wrote:
>> Index: gcc/tree-data-ref.c
>> ===
>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +
>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +
>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d
>>return alignment;
>>  }
>>
>> +/* If BASE is a pointer-typed SSA name, try to find the object that it
>> +   is based on.  Return this object X on success and store the alignment
>> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */
>> +
>> +static tree
>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)
>> +{
>> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))
>> +return NULL_TREE;
>> +
>> +  gimple *def = SSA_NAME_DEF_STMT (base);
>> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);
>
> I think it is an error to use the def stmt of base here.  Instead you
> need to pass down the point/loop of the use.  For the same reason you
> also want to instantiate parameters after analyzing the evolution.
>
> In the end, if the BB to be vectorized is contained in a loop nest we want to
> instantiate up to the point where (eventually) a DECL we can re-align appears,
> but using the containing loop for now looks OK.

Why's that necessary/better though?  We're not interested in the
evolution of the value at the point it*s used by the potential vector
code, but in how we get from the ultimate base (if there is one) to the
definition of the SSA name.

I don't think instantiating the SCEV would give any new information,
but it could lose some.  E.g. if we have:

  x = &foo;
  do
x += 8;
  while (*y++ < 10)
  ...potential vector use of *x...

we wouldn't be able to express the address of x after the do-while
loop, because there's nothing that counts the number of iterations.
But the uninstantiated SCEV could tell us that x = &foo + N * 8 for
some (unknown) N.

(OK, so that doesn't actually work unless we take the effort
to look through loop-closed SSA form, but still...)

>> +  /* Peel chrecs and record the minimum alignment preserved by
>> + all steps.  */
>> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
>> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)
>> +{
>> +  unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT 
>> (base));
>> +  alignment = MIN (alignment, step_alignment);
>> +  base = CHREC_LEFT (base);
>> +}
>> +
>> +  /* Punt if the expression is too complicated to handle.  */
>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE
>> (base)))
>> +return NULL_TREE;
>> +
>> +  /* Analyze the base to which the steps we peeled were applied.  */
>> +  poly_int64 bitsize, bitpos, bytepos;
>> +  machine_mode mode;
>> +  int unsignedp, reversep, volatilep;
>> +  tree offset;
>> +  base = get_inner_reference (build_fold_indirect_ref (base),
>
> ick :/
>
> what happens if you simply use get_pointer_alignment here and
> strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically
> what get_pointer_alignment_1 does)  After all get_base_for_alignment_1
> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4].

Yeah, but those have (hopefully) been handled by dr_analyze_innermost
already, which should have stripped off both the constant and variable
parts of the offset.  So I think SSA names are the only interesting
input.  The problem is that once we follow the definitions of an SSA
name through CHREC_LEFTs, we get a general address again.

get_pointer_alignment and get_pointer_alignment_1 don't do what we want
because they take the alignment of the existing object into account,
whereas here we explicitly want to ignore that and only calculate the
alignment of the offset.

I guess we could pass a flag down to ignore the current alignment though?

Thanks,
Richard

Re: [wwwdocs] Update gcc-8/changes.html for some IPA and x86 canges

2018-03-19 Thread Gerald Pfeifer

Hi Honza,

thanks for putting this together.  Below you'll find a number of
simple suggestions.  This patch is fine if you make those changes;
no need to review again, though posting the final patch would be
good.


On Mon, 19 Mar 2018, Jan Hubicka wrote:
> +Reworked runtime estimation metrics leading to more realistic guesses
> + driving inliner and clonning heuristics.

"run-time" (since it's an adjective here)

"cloning"

> +The ipa-pure-const pass is extended to propagate malloc attribute, 
> and the

"the malloc attribute", i.e., add "the" and markup.

> + corresponding warning option Wsuggest-attribute=malloc 
> emits a
> + diagnostic for a function, which can be annotated with malloc 
> attribute.

"with the malloc"

> +New infrastructure for representing profile (both statically guessed
> + and profile feedback) which allows propagation of furhter information
> + about reliablility of the profile.

"representing profiles"?

"further" (typo)

"about the reliability"

> +Number of improvements in profile updating code solving problems
> + found by new verification code.

"A number of..."

"in the profile updating code"

"> +Static detection of code which is not executed in valid run of the

"in a valid run"

> + program. This includes paths which triggers undefined behaviour

"trigger" (plural)

And "behavior" (American English)

> + as well as call to functions declared with cold attribute.

"calls" (plural)

"declared with the" (add article)

> + Newly noreturn attribute does not imply all effects of
> + cold to make difference between exit (which
> + is noreturn/code> and abort (which is in addition
> + not executed in valid runs).

"The new...attribute", or did you mean "Newly the ...attribute"?

"to make a difference" or better "differentiate"?

And closing ")" in the description of exit.

> +  
> +Vectorization costs metrics has been reworked leading to significant 
> improvments
> +on some benchmarks

"cost metrics"

And "." at the very end.

Cheers,
Gerald

[PATCH] PR fortran/42651 -- Fix accepts-invalid

2018-03-19 Thread Steve Kargl

The attached patch fixes an accepts-invalid while enforcing

  C1560 (R1530) If RESULT appears, the function-name shall not appear
  in any specification statement in the scoping unit of the function
  subprogram.

Regression tested on x86_64-*-freebsd.  OK to commit?

2018-03-19  Steven G. Kargl  

PR fortran/42651
* decl.c (check_function_name): Improved error message
(gfc_match_volatile, gfc_match_asynchronous) Use check_function_name.

2018-03-19  Steven G. Kargl  

PR fortran/42651
* gfortran.dg/pr42651.f90: New test.
* gfortran.df/func_result_7.f90: Update for new error message.
-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 258646)
+++ gcc/fortran/decl.c	(working copy)
@@ -2242,7 +2242,9 @@ check_function_name (char *name)
 	  && strcmp (block->result->name, "ppr@") != 0
 	  && strcmp (block->name, name) == 0)
 	{
-	  gfc_error ("Function name %qs not allowed at %C", name);
+	  gfc_error ("RESULT variable %qs at %L prohibits FUNCTION name %qs at %C "
+		 "from appearing in a specification statement",
+		 block->result->name, &block->result->declared_at, name);
 	  return false;
 	}
 }
@@ -9091,6 +9093,7 @@ match
 gfc_match_volatile (void)
 {
   gfc_symbol *sym;
+  char *name;
   match m;
 
   if (!gfc_notify_std (GFC_STD_F2003, "VOLATILE statement at %C"))
@@ -9112,6 +9115,10 @@ gfc_match_volatile (void)
   switch (m)
 	{
 	case MATCH_YES:
+	  name = XCNEWVAR (char, strlen (sym->name) + 1);
+	  strcpy (name, sym->name);
+	  if (!check_function_name (name))
+	return MATCH_ERROR;
 	  /* F2008, C560+C561. VOLATILE for host-/use-associated variable or
 	 for variable in a BLOCK which is defined outside of the BLOCK.  */
 	  if (sym->ns != gfc_current_ns && sym->attr.codimension)
@@ -9150,6 +9157,7 @@ match
 gfc_match_asynchronous (void)
 {
   gfc_symbol *sym;
+  char *name;
   match m;
 
   if (!gfc_notify_std (GFC_STD_F2003, "ASYNCHRONOUS statement at %C"))
@@ -9171,6 +9179,10 @@ gfc_match_asynchronous (void)
   switch (m)
 	{
 	case MATCH_YES:
+	  name = XCNEWVAR (char, strlen (sym->name) + 1);
+	  strcpy (name, sym->name);
+	  if (!check_function_name (name))
+	return MATCH_ERROR;
 	  if (!gfc_add_asynchronous (&sym->attr, sym->name, &gfc_current_locus))
 	return MATCH_ERROR;
 	  goto next_item;
Index: gcc/testsuite/gfortran.dg/pr42651.f90
===
--- gcc/testsuite/gfortran.dg/pr42651.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr42651.f90	(working copy)
@@ -0,0 +1,24 @@
+! { dg-do compile }
+! PR fortran/42651
+integer function func()
+  asynchronous :: func
+  integer, asynchronous:: b
+  allocatable :: c
+  volatile :: func
+  type t
+sequence
+integer :: i = 5
+  end type t
+end function func
+
+function func2() result(res) ! { dg-error " RESULT variable" }
+  volatile res
+  asynchronous res
+  target func2! { dg-error " RESULT variable" }
+  volatile func2  ! { dg-error " RESULT variable" }
+  asynchronous func2  ! { dg-error " RESULT variable" }
+  allocatable func2   ! { dg-error " RESULT variable" }
+  dimension func2(2)  ! { dg-error " RESULT variable" }
+  codimension func2[*]! { dg-error " RESULT variable" }
+  contiguous func2! { dg-error " RESULT variable" }
+end function func2 
Index: gcc/testsuite/gfortran.dg/func_result_7.f90
===
--- gcc/testsuite/gfortran.dg/func_result_7.f90	(revision 258646)
+++ gcc/testsuite/gfortran.dg/func_result_7.f90	(working copy)
@@ -4,8 +4,8 @@
 !
 ! Contributed by Vittorio Zecca 
 
-function fun() result(f)
-  pointer fun   ! { dg-error "not allowed" }
-  dimension fun(1)  ! { dg-error "not allowed" }
+function fun() result(f)  ! { dg-error "RESULT variable" } 
+  pointer fun ! { dg-error "RESULT variable" }
+  dimension fun(1)! { dg-error "RESULT variable" }
   f=0
 end

Re: [PATCH] [Microblaze]: PIC Data Text Relative

2018-03-19 Thread Michael Eager


Hi Andrew --

Please take a look at the test case description:
https://gcc.gnu.org/wiki/HowToPrepareATestcase
and see if you can do one of the following:
  - Modify the regex expression in the scan-assembler to accept
either format of generated output
or
  - Add { dg-option } directives to turn off your new options
if specified.  (You should be able to specify -mno-pic)
or
  - Duplicate the test cases and add { dg-option } directives
to specify the correct options, with and without your
new options.
or
  - Add test cases with a { dg-option } directive with your
new options.

This is not required -- your patch appears to work OK.  Normally,
the new PIC Data options would not be used when running the test
suite, so the tests would not fail.  It's just nice to have the
test suite updated when new options are added.

On 03/19/2018 01:07 PM, Michael Eager wrote:

Hi Andrew --

Good work.

Please submit your updated patch.  Check that you follow
GNU coding standards.  Also make sure that the new options
are documented in gcc/doc/invoke.texi.

On 03/18/18 03:27, Andrew Sadek wrote:

Hello Michael,

I have run the test using the new PIC options.
Actually, I have discovered 2 unhandled cases in 
'microblaze_expand_move' + missing conditions in linker relax

leading some test cases execution to fail.
After fixing them, I made a re-run for the whole regression, and the 
results analogy below:


Original, without my patch:
     === gcc Summary ===

# of expected passes        90776
# of unexpected failures    1317
# of unexpected successes    3
# of expected failures        207
# of unresolved testcases    115
# of unsupported tests        2828

With my patch, calling '-fPIE - mpic-data-text-rel'
     === gcc Summary ===

# of expected passes        90843
# of unexpected failures    1256
# of unexpected successes    3
# of expected failures        207
# of unresolved testcases    115
# of unsupported tests        2853

After running the 'dg-cmp-results.sh' in contrib folder, the 
PASS->FAIL are below:


PASS->FAIL: gcc.dg/uninit-19.c (test for excess errors)
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O0 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O1 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O3 -g 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -Os 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O0 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O1 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O3 -g 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -Os 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O0 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O1 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O3 -g 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -Os 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O0 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O1 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.t

Re: [PATCH] [Microblaze]: PIC Data Text Relative

2018-03-19 Thread Michael Eager


Also check the { dg-skip-if } directive.
https://gcc.gnu.org/onlinedocs/gccint/Directives.html

On 03/19/2018 06:14 PM, Michael Eager wrote:

Hi Andrew --

Please take a look at the test case description:
https://gcc.gnu.org/wiki/HowToPrepareATestcase
and see if you can do one of the following:
   - Modify the regex expression in the scan-assembler to accept
     either format of generated output
or
   - Add { dg-option } directives to turn off your new options
     if specified.  (You should be able to specify -mno-pic)
or
   - Duplicate the test cases and add { dg-option } directives
     to specify the correct options, with and without your
     new options.
or
   - Add test cases with a { dg-option } directive with your
     new options.

This is not required -- your patch appears to work OK.  Normally,
the new PIC Data options would not be used when running the test
suite, so the tests would not fail.  It's just nice to have the
test suite updated when new options are added.

On 03/19/2018 01:07 PM, Michael Eager wrote:

Hi Andrew --

Good work.

Please submit your updated patch.  Check that you follow
GNU coding standards.  Also make sure that the new options
are documented in gcc/doc/invoke.texi.

On 03/18/18 03:27, Andrew Sadek wrote:

Hello Michael,

I have run the test using the new PIC options.
Actually, I have discovered 2 unhandled cases in 
'microblaze_expand_move' + missing conditions in linker relax

leading some test cases execution to fail.
After fixing them, I made a re-run for the whole regression, and the 
results analogy below:


Original, without my patch:
     === gcc Summary ===

# of expected passes        90776
# of unexpected failures    1317
# of unexpected successes    3
# of expected failures        207
# of unresolved testcases    115
# of unsupported tests        2828

With my patch, calling '-fPIE - mpic-data-text-rel'
     === gcc Summary ===

# of expected passes        90843
# of unexpected failures    1256
# of unexpected successes    3
# of expected failures        207
# of unresolved testcases    115
# of unsupported tests        2853

After running the 'dg-cmp-results.sh' in contrib folder, the 
PASS->FAIL are below:


PASS->FAIL: gcc.dg/uninit-19.c (test for excess errors)
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O0 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O1 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O2 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -O3 -g 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var1.c   -Os 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O0 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O1 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O2 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -O3 -g 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/data_var2.c   -Os 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O0 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O1 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects   scan-assembler 
lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O2 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -O3 -g 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c   -Os 
scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13
PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c   -O0 
scan-assembler lwi\tr([0-9]|[

[PATCH] relax -Wclass-memaccess for class members (PR 84850)

2018-03-19 Thread Martin Sebor


The -Wclass-memaccess warning makes an exception for ctors and
dtors of non-trivial classes with no bases to avoid triggering
for uses of raw memory functions with this as the destination.
All other members as well as non-members trigger the warning.

In bug 84850 the reporter shows that some code (squid in this
case) calls memset(this, 0. ...) in both the copy assignment
operator and the copy ctor as a short-hand to clear all trivial
members without having to explicitly enumerate them.  A similar
idiom has been seen in Firefox (some of the bugs linked from
https://bugzilla.mozilla.org/show_bug.cgi?id=1411029).

Since calling memset(this, 0, sizeof *this) from any non-static
member of a non-trivial class containing no non-trivial members
is safe, the attached patch relaxes the warning to allow this
idiom.

I also noticed that the exemption for ctors and dtors is overly
permissive and causes the warning not to trigger for classes with
no bases or virtual functions but containing non-trivial members.
This is bug 84851.  Jakub suggested to fix just 84850 for GCC 8
and defer 84851 to GCC 9, so the patch follows that suggestion.
Fixing the latter is a matter of removing the block in
warn_for_restrict() with the FIXME comment above it.

Martin
PR c++/84850 - -Wclass-memaccess on a memcpy in a copy assignment operator with no nontrivial bases or members

gcc/cp/ChangeLog:

	PR c++/84850
	* call.c (first_non_public_field): New template and function.
	(first_non_trivial_field): New function.
	(maybe_warn_class_memaccess): Call them.

gcc/testsuite/ChangeLog:

	PR c++/84850
	* g++.dg/Wclass-memaccess-3.C: New test.
	* g++.dg/Wclass-memaccess-4.C: New test.

Index: gcc/cp/call.c
===
--- gcc/cp/call.c	(revision 258646)
+++ gcc/cp/call.c	(working copy)
@@ -8261,13 +8261,17 @@ build_over_call (struct z_candidate *cand, int fla
   return call;
 }
 
-/* Return the DECL of the first non-public data member of class TYPE
-   or null if none can be found.  */
+namespace
+{
 
-static tree
-first_non_public_field (tree type)
+/* Return the DECL of the first non-static subobject of class TYPE
+   that satisfies the predicate PRED or null if none can be found.  */
+
+template 
+tree
+first_non_static_field (tree type, Predicate pred)
 {
-  if (!CLASS_TYPE_P (type))
+  if (!type || !CLASS_TYPE_P (type))
 return NULL_TREE;
 
   for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
@@ -8276,7 +8280,7 @@ build_over_call (struct z_candidate *cand, int fla
 	continue;
   if (TREE_STATIC (field))
 	continue;
-  if (TREE_PRIVATE (field) || TREE_PROTECTED (field))
+  if (pred (field))
 	return field;
 }
 
@@ -8286,8 +8290,9 @@ build_over_call (struct z_candidate *cand, int fla
BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
 {
   tree base = TREE_TYPE (base_binfo);
-
-  if (tree field = first_non_public_field (base))
+  if (pred (base))
+	return base;
+  if (tree field = first_non_static_field (base, pred))
 	return field;
 }
 
@@ -8294,6 +8299,42 @@ build_over_call (struct z_candidate *cand, int fla
   return NULL_TREE;
 }
 
+struct NonPublicField
+{
+  bool operator() (const_tree t)
+  {
+return DECL_P (t) && (TREE_PRIVATE (t) || TREE_PROTECTED (t));
+  }
+};
+
+/* Return the DECL of the first non-public subobject of class TYPE
+   or null if none can be found.  */
+
+static inline tree
+first_non_public_field (tree type)
+{
+  return first_non_static_field (type, NonPublicField ());
+}
+
+struct NonTrivialField
+{
+  bool operator() (const_tree t)
+  {
+return !trivial_type_p (DECL_P (t) ? TREE_TYPE (t) : t);
+  }
+};
+
+/* Return the DECL of the first non-trivial subobject of class TYPE
+   or null if none can be found.  */
+
+static inline tree
+first_non_trivial_field (tree type)
+{
+  return first_non_static_field (type, NonTrivialField ());
+}
+
+}   /* unnamed namespace */
+
 /* Return true if all copy and move assignment operator overloads for
class TYPE are trivial and at least one of them is not deleted and,
when ACCESS is set, accessible.  Return false otherwise.  Set
@@ -8419,23 +8460,31 @@ maybe_warn_class_memaccess (location_t loc, tree f
   if (!desttype || !COMPLETE_TYPE_P (desttype) || !CLASS_TYPE_P (desttype))
 return;
 
-  /* Check to see if the raw memory call is made by a ctor or dtor
- with this as the destination argument for the destination type.
- If so, be more permissive.  */
+  /* Check to see if the raw memory call is made by a non-static member
+ function with THIS as the destination argument for the destination
+ type.  If so, and if the class has no non-trivial bases or members,
+ be more permissive.  */
   if (current_function_decl
-  && (DECL_CONSTRUCTOR_P (current_function_decl)
-	  || DECL_DESTRUCTOR_P (current_function_decl))
+  && DECL_NONSTATIC_MEMBER_FUNCTION_P (current_function_decl)
   && is_this_parameter (tre

Re: [PATCH, rs6000] Fix PR83789: __builtin_altivec_lvx fails for powerpc for altivec-4.c

2018-03-19 Thread Peter Bergner

On 3/12/18 1:55 PM, Segher Boessenkool wrote:
> On Sun, Mar 11, 2018 at 10:23:02AM -0500, Peter Bergner wrote:
>> +; The following patterns embody what lvx should usually look like.
>> +(define_expand "altivec_lvx_"
>> +  [(set (match_operand:VM2 0 "register_operand" "")
>> +(match_operand:VM2 1 "altivec_indexed_or_indirect_operand" ""))]
> 
> No "" please.  Expanders do not have constraints.

Cut & paste error on my part I believe.  Will fix.

>>  #ifdef HAVE_V8HFmode
>> -  else if (mode == V8HFmode)
>> -stvx = TARGET_64BIT
>> -  ? gen_altivec_stvx_v8hf_1op (src_exp, memory_address)
>> -  : gen_altivec_stvx_v8hf_1op_si (src_exp, memory_address);
>> +  else if (mode == V8HFmode)
>> +stvx = gen_altivec_stvx_v8hf (src_exp, dest_exp);
>>  #endif
> 
> Btw, don't we always have V8HFmode these days?

I have no idea, I was just working around code that was already there.
Looking at the history, Kelvin seems to have added the tests.
Kelvin, what is the above trying to protect?

Looking at mu build dirs insn-modes.h, I don't see HAVE_V8HFmode being
defined on either my LE or BE builds.  What am I missing?

Peter

[PATCH] Fix strpbrk (x, "") folding (PR c/84953)

2018-03-19 Thread Jakub Jelinek

Hi!

We use incorrect type for the NULL return value, the builtin is
char *strpbrk (const char *, const char *), so the first argument
is cast to const char * and we return (const char *) 0, while we
really should return (char *) 0.  fold_builtin_n then adds:
  ret = build1 (NOP_EXPR, TREE_TYPE (ret), ret);
  SET_EXPR_LOCATION (ret, loc);
  TREE_NO_WARNING (ret) = 1;
and thus we in the end have (char *) (_Literal (const char *) 0).
This bug then results in a -Wdiscarded-qualifiers warning.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-03-20  Jakub Jelinek  

PR c/84953
* builtins.c (fold_builtin_strpbrk): For strpbrk(x, "") use type
instead of TREE_TYPE (s1) for the return value.

* gcc.dg/pr84953.c: New test.

--- gcc/builtins.c.jj   2018-03-07 22:51:58.871478732 +0100
+++ gcc/builtins.c  2018-03-19 18:49:45.313898848 +0100
@@ -9573,7 +9573,7 @@ fold_builtin_strpbrk (location_t loc, tr
   if (p2[0] == '\0')
/* strpbrk(x, "") == NULL.
   Evaluate and ignore s1 in case it had side-effects.  */
-   return omit_one_operand_loc (loc, TREE_TYPE (s1), integer_zero_node, 
s1);
+   return omit_one_operand_loc (loc, type, integer_zero_node, s1);
 
   if (p2[1] != '\0')
return NULL_TREE;  /* Really call strpbrk.  */
--- gcc/testsuite/gcc.dg/pr84953.c.jj   2018-03-19 18:52:48.295893571 +0100
+++ gcc/testsuite/gcc.dg/pr84953.c  2018-03-19 18:52:31.935894043 +0100
@@ -0,0 +1,11 @@
+/* PR c/84953 */
+/* { dg-do compile } */
+
+char *strpbrk (const char *, const char *);
+
+char *
+test (char *p)
+{
+  p = strpbrk (p, ""); /* { dg-bogus "assignment discards 'const' qualifier 
from pointer target type" } */
+  return p;
+}

Jakub

[PATCH] Avoid compiler UB in store-merging (PR tree-optimization/84946)

2018-03-19 Thread Jakub Jelinek

Hi!

In this code, both bitpos and bitsize are poly_int64, but we know that
bitpos is >= 0 and bitsize.  On the testcase mentioned in the PR (but even
without -mavx512f, so we already invoke it during testing) we would overflow
the computation, 0x7ffsomething + 0x20 (and then cast it
to poly_uint64).  This patch fixes it by casting to poly_uint64 earlier.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-03-19  Jakub Jelinek  

PR tree-optimization/84946
* gimple-ssa-store-merging.c (mem_valid_for_store_merging): Compute
bitsize + bitsize in poly_uint64 rather than poly_int64.

--- gcc/gimple-ssa-store-merging.c.jj   2018-02-22 09:28:07.0 +0100
+++ gcc/gimple-ssa-store-merging.c  2018-03-19 17:55:22.486472527 +0100
@@ -3948,7 +3948,8 @@ mem_valid_for_store_merging (tree mem, p
   if (known_eq (bitregion_end, 0U))
 {
   bitregion_start = round_down_to_byte_boundary (bitpos);
-  bitregion_end = round_up_to_byte_boundary (bitpos + bitsize);
+  bitregion_end = bitpos;
+  bitregion_end = round_up_to_byte_boundary (bitregion_end + bitsize);
 }
 
   if (offset != NULL_TREE)

Jakub

Re: [PATCH] Fix strpbrk (x, "") folding (PR c/84953)

2018-03-19 Thread Richard Biener

On March 20, 2018 7:32:46 AM GMT+01:00, Jakub Jelinek  wrote:
>Hi!
>
>We use incorrect type for the NULL return value, the builtin is
>char *strpbrk (const char *, const char *), so the first argument
>is cast to const char * and we return (const char *) 0, while we
>really should return (char *) 0.  fold_builtin_n then adds:
>  ret = build1 (NOP_EXPR, TREE_TYPE (ret), ret);
>  SET_EXPR_LOCATION (ret, loc);
>  TREE_NO_WARNING (ret) = 1;
>and thus we in the end have (char *) (_Literal (const char *) 0).
>This bug then results in a -Wdiscarded-qualifiers warning.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

OK. 

Richard. 

>2018-03-20  Jakub Jelinek  
>
>   PR c/84953
>   * builtins.c (fold_builtin_strpbrk): For strpbrk(x, "") use type
>   instead of TREE_TYPE (s1) for the return value.
>
>   * gcc.dg/pr84953.c: New test.
>
>--- gcc/builtins.c.jj  2018-03-07 22:51:58.871478732 +0100
>+++ gcc/builtins.c 2018-03-19 18:49:45.313898848 +0100
>@@ -9573,7 +9573,7 @@ fold_builtin_strpbrk (location_t loc, tr
>   if (p2[0] == '\0')
>   /* strpbrk(x, "") == NULL.
>  Evaluate and ignore s1 in case it had side-effects.  */
>-  return omit_one_operand_loc (loc, TREE_TYPE (s1), integer_zero_node,
>s1);
>+  return omit_one_operand_loc (loc, type, integer_zero_node, s1);
> 
>   if (p2[1] != '\0')
>   return NULL_TREE;  /* Really call strpbrk.  */
>--- gcc/testsuite/gcc.dg/pr84953.c.jj  2018-03-19 18:52:48.295893571
>+0100
>+++ gcc/testsuite/gcc.dg/pr84953.c 2018-03-19 18:52:31.935894043 +0100
>@@ -0,0 +1,11 @@
>+/* PR c/84953 */
>+/* { dg-do compile } */
>+
>+char *strpbrk (const char *, const char *);
>+
>+char *
>+test (char *p)
>+{
>+  p = strpbrk (p, "");/* { dg-bogus "assignment discards 'const'
>qualifier from pointer target type" } */
>+  return p;
>+}
>
>   Jakub

59 matches

Mail list logo