Re: [patch, fortran] Make ABI ready for BACK argument of MINLOC and MAXLOC

2018-01-09 Thread Janne Blomqvist
On Mon, Jan 8, 2018 at 11:48 PM, Thomas Koenig  wrote:
> Hi Janne,
>
>> If I understand it correctly, in the library the back argument is
>> always a LOGICAL(kind=4). But in the frontend, the back argument is
>> coerced to gfc_default_logical_kind. So this doesn't work if one uses
>> the -fdefault-integer-8 option, because then gfc_default_logical_kind
>> will be 8.
>>
>> I suggest you create a constant "gfc_logical4_kind" and use that in
>> the frontend.
>
>
> Implemented in the version below. (I discussed briefly with myself
> if it would make sense to create a global variable to hold the value
> of 4, then decided against that, hence the macro). I also added
> a new test case with -fdefault-integer-8.

Thanks, good enough for me.

> OK for trunk?

In iresolve.c your patch adds code like

+  if (back->ts.kind != gfc_logical_4_kind)
+{
+  gfc_typespec ts;
+  gfc_clear_ts (&ts);
+  ts.type = BT_LOGICAL;
+  ts.kind = gfc_default_logical_kind;
+  gfc_convert_type_warn (back, &ts, 2, 0);
+}

in two places. Presumably the ts.kind assignment should be
gfc_logical_4_kind, no?

Secondly, since the back argument is always present (you always set it
to .false., as there is no need in the library to distinguish between
a non-present and a .false. argument in this case), I think it would
make more sense to pass it by value rather than as a pointer. Sorry for
missing this the first time around.
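Purely for illustration, the two calling conventions would look roughly like
this on the library side (placeholder names and descriptor types, not the
real libgfortran entry points):

typedef int GFC_LOGICAL_4;   /* placeholder for the libgfortran type */

/* Current patch: BACK passed by reference.  */
void minloc_example_ref (void *retarray, void *array, GFC_LOGICAL_4 *back);

/* Suggested: BACK passed by value, since the caller always passes an
   explicit .true./.false. value anyway.  */
void minloc_example_val (void *retarray, void *array, GFC_LOGICAL_4 back);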


-- 
Janne Blomqvist


Re: [PATCH] MicroBlaze use default ident output generation

2018-01-09 Thread Nathan Rossi
On 18 November 2017 at 22:13, Nathan Rossi  wrote:
> On 18 November 2017 at 04:25, Jeff Law  wrote:
>> On 11/15/2017 11:58 PM, Nathan Rossi wrote:
>>> Remove the MicroBlaze specific TARGET_ASM_OUTPUT_IDENT definition, and
>>> use the default.
>>>
> >>> This resolves issues associated with the use of the .sdata2 operation in
> >>> cases where assembly emitted after the ident output incorrectly ends up in
> >>> the .sdata2 section instead of .text or any other expected section,
> >>> which results in assembly failures, including operations with symbols
> >>> across different segments.
>>>
>>> gcc/ChangeLog
>>>
>>> 2017-11-16  Nathan Rossi  
>>>
>>>   PR target/83013
>>>   * config/microblaze/microblaze-protos.h
>>>   (microblaze_asm_output_ident): Delete
>>>   * config/microblaze/microblaze.c (microblaze_asm_output_ident): Delete
>>>   * config/microblaze/microblaze.h (TARGET_ASM_OUTPUT_IDENT): Default
>> But isn't the purpose of the override to force certain small-enough
>> objects into the .sdata2 section and by removing the override aren't you
>> losing that capability?
>>
>> It does seem like the override is broken in that it changes the current
>> section behind the back of the generic code.
>>
>> Wouldn't a better fix be to ensure that the override arranges to switch
>> back to whatever the current section is?  Perhaps using .pushsection and
>> .popsection would help here?
>>
>
> That would be a better fix; however, I sent this change first as it
> seemed it might be preferred to remove the target-specific behavior
> instead of fixing it, since MicroBlaze is the only target that actually
> uses TARGET_ASM_OUTPUT_IDENT to change the output asm content (others
> either define the default or have a function that calls the default).
>
> But I can sort out a patch that fixes the behavior instead if that is 
> preferred?

Ping. Should I sort out a patch which uses the push/pop of the section
or is this patch preferred?
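For concreteness, a push/pop variant along the lines Jeff suggested could look
roughly like the sketch below (illustrative only; the exact section flags and
any cgraph-state check the real hook needs are omitted):

static void
microblaze_asm_output_ident (const char *string)
{
  /* Emit the ident string into .sdata2, then restore whatever section
     the generic code believes is current, instead of leaving .sdata2
     as the active section.  */
  fputs ("\t.pushsection\t.sdata2\n", asm_out_file);
  fprintf (asm_out_file, "\t.ident\t\"%s\"\n", string);
  fputs ("\t.popsection\n", asm_out_file);
}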

Regards,
Nathan


Fix ICE on inlined thunks

2018-01-09 Thread Jan Hubicka
Hi,
this patch fixes an ICE in a somewhat rare situation where a thunk is inlined
and later promoted into a comdat group.  The reduced testcase in the PR no
longer reproduces as it relies on quite fragile inlining decisions, but it
would be nice to have a re-reduced one.

Bootstrapped/regtested x86_64-linux, committed to trunk so far.

Honza

PR ipa/80763
* ipa-comdats.c (set_comdat_group): Only set comdat group of real
symbols; not inline clones.

Index: ipa-comdats.c
===
--- ipa-comdats.c   (revision 256332)
+++ ipa-comdats.c   (working copy)
@@ -211,8 +211,11 @@ set_comdat_group (symtab_node *symbol,
   symtab_node *head = (symtab_node *)head_p;
 
   gcc_assert (!symbol->get_comdat_group ());
-  symbol->set_comdat_group (head->get_comdat_group ());
-  symbol->add_to_same_comdat_group (head);
+  if (symbol->real_symbol_p ())
+{
+  symbol->set_comdat_group (head->get_comdat_group ());
+  symbol->add_to_same_comdat_group (head);
+}
   return false;
 }
 


Revert accidental checkin to ipa-inline.

2018-01-09 Thread Jan Hubicka
Hi,
as noticed by Richi, I have accidentally committed a hack I was experimenting
with.
Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-inline.c (edge_badness): Revert accidental checkin.

Index: ipa-inline.c
===
--- ipa-inline.c(revision 256329)
+++ ipa-inline.c(working copy)
@@ -1122,7 +1122,7 @@ edge_badness (struct cgraph_edge *edge,
overall_growth += 256 * 256 - 256;
  denominator *= overall_growth;
 }
-  /*denominator *= inlined_time;*/
+  denominator *= inlined_time;
 
   badness = - numerator / denominator;
 


[PATCH] Fix PR83572

2018-01-09 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-01-09  Richard Biener  

PR tree-optimization/83572
* graphite.c: Include cfganal.h.
(graphite_transform_loops): Connect infinite loops to exit
and remove fake edges at the end.

* gcc.dg/graphite/pr83572.c: New testcase.

Index: gcc/graphite.c
===
--- gcc/graphite.c  (revision 256343)
+++ gcc/graphite.c  (working copy)
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "params.h"
 #include "pretty-print.h"
+#include "cfganal.h"
 
 #ifdef HAVE_isl
 #include "cfghooks.h"
@@ -350,6 +351,10 @@ graphite_transform_loops (void)
 
   calculate_dominance_info (CDI_DOMINATORS);
 
+  /* We rely on post-dominators during merging of SESE regions so those
+ have to be meaningful.  */
+  connect_infinite_loops_to_exit ();
+
   ctx = isl_ctx_alloc ();
   isl_options_set_on_error (ctx, ISL_ON_ERROR_ABORT);
   the_isl_ctx = ctx;
@@ -368,6 +373,10 @@ graphite_transform_loops (void)
   build_scops (&scops);
   free_dominance_info (CDI_POST_DOMINATORS);
 
+  /* Remove the fake exits before transform given they are not reflected
+ in loop structures we end up verifying.  */
+  remove_fake_exit_edges ();
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   print_graphite_statistics (dump_file, scops);
@@ -428,7 +437,6 @@ graphite_transform_loops (void)
   release_recorded_exits (cfun);
   tree_estimate_probability (false);
 }
-
 }
 
 #else /* If isl is not available: #ifndef HAVE_isl.  */
Index: gcc/testsuite/gcc.dg/graphite/pr83572.c
===
--- gcc/testsuite/gcc.dg/graphite/pr83572.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr83572.c (working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O -floop-nest-optimize -fno-tree-loop-im" } */
+
+int u0, l1;
+
+void
+u3 (int s1)
+{
+  for (;;)
+{
+  for (u0 = 0; u0 < 2; ++u0)
+   {
+   }
+
+  if (s1 != 0)
+   for (l1 = 0; l1 < 2; ++l1)
+ {
+ }
+
+  l1 = 0;
+}
+}


RE: [PATCH][GCC][ARM] Fix fragile arm fpu attribute tests.

2018-01-09 Thread Tamar Christina
Ping,

Is the fix ok for trunk?

Thanks,
Tamar

> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Thursday, December 21, 2017 21:39
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: Re: [PATCH][GCC][ARM] Fix fragile arm fpu attribute tests.
> 
> On 21 December 2017 at 15:24, Tamar Christina 
> wrote:
> > The 12/14/2017 20:46, Christophe Lyon wrote:
> >> On 14 December 2017 at 11:56, Tamar Christina
>  wrote:
> >> > The 12/13/2017 08:49, Christophe Lyon wrote:
> >> >> On 12 December 2017 at 18:29, Tamar Christina
>  wrote:
> >> >> > Hi All,
> >> >> >
> >> >> > The previous test made use of arm_neon.h which made the whole
> >> >> > test rather fragile and only applicable to some of the arm targets.
> >> >> >
> >> >> > So instead I make use of different fpus now to test the
> >> >> > generation of fmla instructions. The actual instruction itself
> >> >> > is not tested as all we care about is that the proper .fpu
> >> >> > directives are generated.
> >> >> >
> >> >> > Regtested on arm-none-eabi and arm-none-linux-gnueabihf with no
> >> >> > regressions.
> >> >> >
> >> >> > Ok for trunk?
> >> >> >
> >> >> >
> >> >> > gcc/testsuite/
> >> >> > 2017-12-12  Tamar Christina  
> >> >> >
> >> >> > PR target/82641
> >> >> > * gcc.target/arm/pragma_fpu_attribute.c: New.
> >> >> > * gcc.target/arm/pragma_fpu_attribute_2.c: New.
> >> >
> >> > Hi Christophe,
> >> >
> >> > My apologies, I have rebased the patch.
> >> > New Changelog:
> >> >
> >> > gcc/testsuite/
> >> > 2017-12-14  Tamar Christina  
> >> >
> >> > PR target/82641
> >> > * gcc.target/arm/pragma_fpu_attribute.c: Rewrite to use
> >> > no NEON.
> >> > * gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
> >> >
> >>
> >> Hi,
> >>
> >> Sorry I think there is still something wrong with this patch.
> >> In pragma_fpu_attribute_2.c, you are not removing #include
> >> <arm_neon.h> as the ChangeLog seems to imply?
> >>
> >
> > Sorry that was extremely sloppy of me. I noticed the changelog after
> > sending but hadn't noticed the #include being left in.
> >
> >> So, with this patch, there are problems on arm-none-linux-gnueabi and
> >> arm-none-eabi:
> >> FAIL:gcc.target/arm/pragma_fpu_attribute.c scan-assembler-times
> >> \\.fpu\\s+vfpv3-d16 1 (found 0 times)
> >> FAIL:gcc.target/arm/pragma_fpu_attribute.c scan-assembler-times
> >> \\.fpu\\s+vfpv4 1 (found 0 times)
> >>
> >> and pragma_fpu_attribute_2.c still fails to compile:
> >> In file included from
> >> /gcc/testsuite/gcc.target/arm/pragma_fpu_attribute_2.c:6:
> >> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/gcc/include/arm_neon.h:31:2:
> >> error: #error "NEON intrinsics not available with the soft-float ABI.
> >> Please use -mfloat-abi=softfp or -mfloat-abi=hard"
> >>
> >> I'm not sure why you don't see this when testing on arm-none-eabi?
> >
> > It's because I don't have a compiler configured with only -mfloat-abi=soft.
> > So when I run the tests it's always able to just change the ABI. I resorted
> > to manually testing it.
> >
> > I've now prevented the tests from running at all on soft float only targets.
> > This should fix the problem once and for all.
> >
> > Regtested on arm-none-eabi, arm-none-linux-gnueabi and
> > arm-none-linux-gnueabihf.
> >
> > Thanks and sorry for the noise,
> > Tamar
> >
> > Ok for trunk?
> >
> > gcc/testsuite/
> > 2017-12-21  Tamar Christina  
> >
> > PR target/82641
> > * gcc.target/arm/pragma_fpu_attribute.c: Rewrite to use
> > no NEON and require softfp or hard float-abi.
> > * gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
> >
> FWIW, this version passes validation on my side.
> Thanks
> 
> >>
> >> If you want to see more details:
> >> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/255624-rb8655.patch-2/report-build-info.html
> >> (ignore the lines with "interrupted", this means there was a problem
> >> on the host during the build)
> >>
> >> Christophe
> >>
> >>
> >> >> >
> >> >> Sorry, it seems your patch does not apply against ToT, and
> >> >> the ChangeLog looks incorrect (these are not new files)
> >> >>
> >> >> Christophe
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > --
> >
> > --


Re: [PATCH][GCC][ARM] Fix fragile arm fpu attribute tests.

2018-01-09 Thread Kyrill Tkachov


On 09/01/18 10:16, Tamar Christina wrote:

Ping,

Is the fix ok for trunk?


Hi Tamar,

Yes, thanks for pinging this.
I had reviewed it before the break but had forgotten to send an ok out.
Please commit.

Kyrill



Thanks,
Tamar





Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-09 Thread Richard Earnshaw (lists)
On 08/01/18 16:01, Bill Schmidt wrote:
> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) 
>  wrote:
>>
>> On 08/01/18 02:20, Bill Schmidt wrote:
>>> Hi Richard,
>>>
>>> Unfortunately, I don't see any way that this will be useful for the ppc 
>>> targets.  We don't
>>> have a way to force resolution of a condition prior to continuing 
>>> speculation, so this
>>> will just introduce another comparison that we would speculate past.  For 
>>> our mitigation
>>> we will have to introduce an instruction that halts all speculation at that 
>>> point, and place
>>> it in front of all dangerous loads.  I wish it were otherwise.
>>
>> So can't you make the builtin expand to (in pseudo code):
>>
>>   if (bounds_check)
>>     {
>>       __asm ("barrier");
>>       result = *ptr;
>>     }
>>   else
>>     result = failval;
> 
> Could, but this just generates unnecessary code for Power.  We would instead 
> generate
> 
>   __asm ("barrier");
>   result = *ptr;
> 
> without any checks.  We would ignore everything but the first argument.

You can't do that with the builtin as it is currently specified, since it
also has a defined behaviour for the result it returns.  You can,
however, expand the code as normal RTL and let the optimizers remove any
redundant code if they can make that deduction and you don't need the
additional behaviour.

R.

> 
> Thanks,
> Bill
> 
>>
>> R.
>>
>>>
>>> Thanks,
>>> Bill
>>>
 On Jan 4, 2018, at 7:58 AM, Richard Earnshaw  
 wrote:


 This patch adds generic support for the new builtin
 __builtin_load_no_speculate.  It provides the overloading of the
 different access sizes and a default fall-back expansion for targets
 that do not support a mechanism for inhibiting speculation.

* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
New builtin type signature.
(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
* target.def (inhibit_load_speculation): New hook.
* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
documentation.
* doc/tm.texi: Regenerated.
* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
* doc/extend.texi: Document __builtin_load_no_speculate.
* c-family/c-common.c (load_no_speculate_resolve_size): New function.
(load_no_speculate_resolve_params): New function.
(load_no_speculate_resolve_return): New function.
(resolve_overloaded_builtin): Handle overloading
__builtin_load_no_speculate.
* builtins.c (expand_load_no_speculate): New function.
(expand_builtin): Handle new no-speculation builtins.
* targhooks.h (default_inhibit_load_speculation): Declare.
* targhooks.c (default_inhibit_load_speculation): New function.
 ---
 gcc/builtin-types.def       |  16 +
 gcc/builtins.c              |  99 ++
 gcc/builtins.def            |  22 ++
 gcc/c-family/c-common.c     | 164
 gcc/c-family/c-cppbuiltin.c |   5 +-
 gcc/doc/cpp.texi            |   4 ++
 gcc/doc/extend.texi         |  53 ++
 gcc/doc/tm.texi             |   6 ++
 gcc/doc/tm.texi.in          |   2 +
 gcc/target.def              |  20 ++
 gcc/targhooks.c             |  69 +++
 gcc/targhooks.h             |   3 +
 12 files changed, 462 insertions(+), 1 deletion(-)

 <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>
>>>
>>
> 



Disable autogeneration of gather instructions on Ryzen and generic

2018-01-09 Thread Jan Hubicka
Hi,
gather instructions are rather hard to implement in hardware and, except on
skylake+ chips, they seem to be rather slow (i.e. on Haswell and Zen); to the
degree that I did not find a real world loop where gather would help on Zen.
This patch simply adds a knob to disable its autogeneration (the builtin still
works). I have considered two alternatives
 1) tune this with x86-tune-costs, because gather is still more profitable than
scalar code if we do very expensive operations (such as a sequence of divides)
on the values gathered/scattered
 2) implement expansion of gathers into primitive instructions.  This is faster
as the semantics of gather are a bit weird and not fully needed for our
vectorizer

I did not have any luck getting good results out of 1 alone as the cost model
is not very realistic.  1+2 probably makes sense, but we can do this
incrementally as that would make most sense to be implemented generically in
the vectorizer on top of this change.

Given that gather is problematic even on skylake+, as shown in the PR (which
has the most optimized implementation of it), it is good to have a knob to
control its codegen in the first place.
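For local experiments the new tune flag should also be switchable from the
command line via the x86 debugging option -mtune-ctrl=, if I read its syntax
right (the use_gather name comes from the new x86-tune.def entry):

    gcc -O3 -mavx2 -mtune=znver1  -mtune-ctrl=use_gather  ...  # re-enable autogeneration
    gcc -O3 -mavx2 -mtune=skylake -mtune-ctrl=^use_gather ...  # disable it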

I have also disabled gathers for generic.  This is because its use causes
a two-digit regression on Zen for spec2k17 while there are no measurable
benefits on Intel.  Note that this affects only
-march= -mtune=generic
as by default we do not use AVX2.

Bootstrapped/regtested x86_64-linux, plan to commit it later today if there
are no complaints.

Honza

PR target/81616
* i386.c (ix86_vectorize_builtin_gather): Check TARGET_USE_GATHER.
* i386.h (TARGET_USE_GATHER): Define.
* x86-tune.def (X86_TUNE_USE_GATHER): New.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 256369)
+++ config/i386/i386.c  (working copy)
@@ -38233,7 +38233,7 @@ ix86_vectorize_builtin_gather (const_tre
   bool si;
   enum ix86_builtins code;
 
-  if (! TARGET_AVX2)
+  if (! TARGET_AVX2 || !TARGET_USE_GATHER)
 return NULL_TREE;
 
   if ((TREE_CODE (index_type) != INTEGER_TYPE
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 256369)
+++ config/i386/i386.h  (working copy)
@@ -498,6 +498,8 @@ extern unsigned char ix86_tune_features[
ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
 #define TARGET_AVOID_4BYTE_PREFIXES \
ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
+#define TARGET_USE_GATHER \
+   ix86_tune_features[X86_TUNE_USE_GATHER]
 #define TARGET_FUSE_CMP_AND_BRANCH_32 \
ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
 #define TARGET_FUSE_CMP_AND_BRANCH_64 \
Index: config/i386/x86-tune.def
===
--- config/i386/x86-tune.def(revision 256369)
+++ config/i386/x86-tune.def(working copy)
@@ -399,6 +399,10 @@ DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_ps
 DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
   m_SILVERMONT | m_INTEL)
 
+/* X86_TUNE_USE_GATHER: Use gather instructions.  */
+DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
+  ~(m_ZNVER1 | m_GENERIC))
+
 /*/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too) */
 /*/


Re: Disable autogeneration of gather instructions on Ryzen and generic

2018-01-09 Thread Richard Biener
On Tue, Jan 9, 2018 at 11:26 AM, Jan Hubicka  wrote:
> Hi,
> gather instructions are rather hard to implement in hardware and except for
> skylake+ chips (i.e. haswell and Zen) they seems to be rather slow; to the
> degree I did not find real world loop where gather would help on Zen.
> This patch simply adds a knob to disable its autogeneration (builtin still
> works). I have considered two alternatives
>  1) tune this with x86-tune-costs because gather is still profitable than
> scalar code if we do very expensive operations (such as sequence of 
> divides)
> on the values gathered/scattered
>  2) implement expansion of gathers into primitive instructions.  This is 
> faster
> as semantics of gather is bit weird and not fully needed for our 
> vectorizer
>
> I did not have luck to get any good results out of 1 alone as cost model is 
> not
> very realistic.  1+2 probably makes sense but we can do this incrementally as
> that would make most sense to be implemented generically in vectorizer on the
> top of this change.

Note that the vectorizer gives up on loops with gathers when there is no
target support for gathers.  It could simply open-code the gather though
(and properly cost that open-coded variant); that's probably the way to
go here.
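For illustration, an open-coded gather is essentially a handful of scalar
loads followed by a pack.  A rough source-level sketch (my own, not part of
the patch, masking ignored) of a 4-lane single-precision gather:

#include <immintrin.h>

/* Open-coded equivalent of a 4-lane gather: four scalar loads packed
   into a vector, instead of a single vgatherdps instruction.  */
static __m128
open_coded_gather (const float *base, __m128i idx)
{
  int indices[4];
  float tmp[4];
  _mm_storeu_si128 ((__m128i *) indices, idx);
  for (int i = 0; i < 4; i++)
    tmp[i] = base[indices[i]];
  return _mm_loadu_ps (tmp);
}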

Richard.

> Given that gather is problematic even on skylake+ as shown in the PR (which 
> has
> most optimized implementation of it) it is good to have a knob to control its
> codegen at first place.
>
> I have also disabled gathers for generic.  This is because its use causes
> some two-digit regression on zen for spec2k17 while there are no measurable
> benefits on Intel.  Note that this affects only
> -march= -mtune=generic
> as by default we do not use AVX2.
>
> Bootstrapped/regtested x86_64-linux, plan to commit it later today if there
> are no complains.
>
> Honza
>
> PR target/81616
> * i386.c (ix86_vectorize_builtin_gather): Check TARGET_USE_GATHER.
> * i386.h (TARGET_USE_GATHER): Define.
> * x86-tune.def (X86_TUNE_USE_GATHER): New.
> Index: config/i386/i386.c
> ===
> --- config/i386/i386.c  (revision 256369)
> +++ config/i386/i386.c  (working copy)
> @@ -38233,7 +38233,7 @@ ix86_vectorize_builtin_gather (const_tre
>bool si;
>enum ix86_builtins code;
>
> -  if (! TARGET_AVX2)
> +  if (! TARGET_AVX2 || !TARGET_USE_GATHER)
>  return NULL_TREE;
>
>if ((TREE_CODE (index_type) != INTEGER_TYPE
> Index: config/i386/i386.h
> ===
> --- config/i386/i386.h  (revision 256369)
> +++ config/i386/i386.h  (working copy)
> @@ -498,6 +498,8 @@ extern unsigned char ix86_tune_features[
> ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
>  #define TARGET_AVOID_4BYTE_PREFIXES \
> ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
> +#define TARGET_USE_GATHER \
> +   ix86_tune_features[X86_TUNE_USE_GATHER]
>  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
> ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
>  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
> Index: config/i386/x86-tune.def
> ===
> --- config/i386/x86-tune.def(revision 256369)
> +++ config/i386/x86-tune.def(working copy)
> @@ -399,6 +399,10 @@ DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_ps
>  DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
>m_SILVERMONT | m_INTEL)
>
> +/* X86_TUNE_USE_GATHER: Use gather instructions.  */
> +DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
> +  ~(m_ZNVER1 | m_GENERIC))
> +
>  
> /*/
>  /* AVX instruction selection tuning (some of SSE flags affects AVX, too) 
> */
>  
> /*/


[PATCH][GCC][AArch64] Enable dotproduct by default for Armv8.4-a

2018-01-09 Thread Tamar Christina
Hi All,

This patch makes the Dot Product extension mandatory on Armv8.4-A.

Regtested on aarch64-none-elf and no regressions.

gcc/
2018-01-09  Tamar Christina  

* config/aarch64/aarch64.h
(AARCH64_FL_FOR_ARCH8_4): Add  AARCH64_FL_DOTPROD.

gcc/testsuite/
2018-01-09  Tamar Christina  

* gcc.target/aarch64/advsimd-intrinsics/vdot-compile-2.c: New.

Ok for trunk?

Thanks,
Tamar

-- 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 20cef53079c955733db105331b7f1bd3bdb03005..02546e278ed860f0e42fa1423ceb39f1d4cb5e9b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -171,7 +171,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8_3			\
   (AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3)
 #define AARCH64_FL_FOR_ARCH8_4			\
-  (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4)
+  (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_DOTPROD)
 
 /* Macros to test ISA flags.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-compile-2.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-compile-2.c
new file mode 100644
index ..7d8d641bcf0f4cc31b9df51b0c0c499bc8dedfce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-compile-2.c
@@ -0,0 +1,73 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-O3 -march=armv8.4-a" } */
+
+#include <arm_neon.h>
+
+/* Unsigned Dot Product instructions.  */
+
+uint32x2_t ufoo (uint32x2_t r, uint8x8_t x, uint8x8_t y)
+{
+  return vdot_u32 (r, x, y);
+}
+
+uint32x4_t ufooq (uint32x4_t r, uint8x16_t x, uint8x16_t y)
+{
+  return vdotq_u32 (r, x, y);
+}
+
+uint32x2_t ufoo_lane (uint32x2_t r, uint8x8_t x, uint8x8_t y)
+{
+  return vdot_lane_u32 (r, x, y, 0);
+}
+
+uint32x2_t ufoo_laneq (uint32x2_t r, uint8x8_t x, uint8x16_t y)
+{
+  return vdot_laneq_u32 (r, x, y, 0);
+}
+
+uint32x4_t ufooq_lane (uint32x4_t r, uint8x16_t x, uint8x8_t y)
+{
+  return vdotq_lane_u32 (r, x, y, 0);
+}
+
+uint32x4_t ufooq_laneq (uint32x4_t r, uint8x16_t x, uint8x16_t y)
+{
+  return vdotq_laneq_u32 (r, x, y, 0);
+}
+
+/* Signed Dot Product instructions.  */
+
+int32x2_t sfoo (int32x2_t r, int8x8_t x, int8x8_t y)
+{
+  return vdot_s32 (r, x, y);
+}
+
+int32x4_t sfooq (int32x4_t r, int8x16_t x, int8x16_t y)
+{
+  return vdotq_s32 (r, x, y);
+}
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, int8x8_t y)
+{
+  return vdot_lane_s32 (r, x, y, 0);
+}
+
+int32x2_t sfoo_laneq (int32x2_t r, int8x8_t x, int8x16_t y)
+{
+  return vdot_laneq_s32 (r, x, y, 0);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, int8x8_t y)
+{
+  return vdotq_lane_s32 (r, x, y, 0);
+}
+
+int32x4_t sfooq_laneq (int32x4_t r, int8x16_t x, int8x16_t y)
+{
+  return vdotq_laneq_s32 (r, x, y, 0);
+}
+
+/* { dg-final { scan-assembler-times {[us]dot\tv[0-9]+\.2s, v[0-9]+\.8b, v[0-9]+\.8b} 2 } } */
+/* { dg-final { scan-assembler-times {[us]dot\tv[0-9]+\.2s, v[0-9]+\.8b, v[0-9]+\.4b\[[0-9]+\]}  4 } } */
+/* { dg-final { scan-assembler-times {[us]dot\tv[0-9]+\.4s, v[0-9]+\.16b, v[0-9]+\.16b}  2 } } */
+/* { dg-final { scan-assembler-times {[us]dot\tv[0-9]+\.4s, v[0-9]+\.16b, v[0-9]+\.4b\[[0-9]+\]}  4 } } */



[PATCH] Add static assert about stack alignment (PR sanitizer/82517).

2018-01-09 Thread Martin Liška
Hi.

The following static assert is added as we may potentially adjust
ASAN_SHADOW_GRANULARITY (via ASAN_SHADOW_SHIFT).  The assert ensures stack
variables will have sufficient alignment.
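For concreteness (assuming the current default ASAN_SHADOW_SHIFT of 3):
ASAN_SHADOW_GRANULARITY = 1 << 3 = 8 bytes, so the assert requires
BITS_PER_UNIT * 8 = 64 bits <= MAX_SUPPORTED_STACK_ALIGNMENT; it would only
start to fire if the shadow granularity were raised beyond what the target
can align the stack to.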

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-01-09  Martin Liska  

PR sanitizer/82517
* asan.c (shadow_mem_size): Add static assert.
---
 gcc/asan.c | 5 +
 1 file changed, 5 insertions(+)


diff --git a/gcc/asan.c b/gcc/asan.c
index 53630088b76..0421d4282a1 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1228,6 +1228,11 @@ asan_function_start (void)
 static unsigned HOST_WIDE_INT
 shadow_mem_size (unsigned HOST_WIDE_INT size)
 {
+  /* It must be possible to align stack variables to granularity
+ of shadow memory.  */
+  STATIC_ASSERT (BITS_PER_UNIT
+		 * ASAN_SHADOW_GRANULARITY <= MAX_SUPPORTED_STACK_ALIGNMENT);
+
   return ROUND_UP (size, ASAN_SHADOW_GRANULARITY) / ASAN_SHADOW_GRANULARITY;
 }
 



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 05/01/18 13:08, Alexander Monakov wrote:
> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
>> This is quite tricky.  For ARM we have to have a speculated load.
> 
> Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
> wouldn't work (applying CSEL to the address rather than loaded value), and
> if it wouldn't, then ARM-specific lowering of the builtin can handle that
> anyhow, right? (by spilling the pointer)

The load has to feed /in/ to the csel/csdb sequence, not come after it.

> 
> (on x86 the current Intel's recommendation is to emit LFENCE prior to the 
> load)

That can be supported in the way you expand the builtin.  The builtin
expander is given a (MEM (ptr)), but it's up to the back-end where to
put that in the expanded sequence to materialize the load, so you could
write (sorry, don't know x86 asm very well, but I think this is how
you'd put it)

lfence
mov (ptr), dest

with branches around that as appropriate to support the remainder of the
builtin's behaviour.

> Is the main issue expressing the CSEL condition in the source code? Perhaps 
> it is
> possible to introduce
> 
>   int guard = __builtin_nontransparent(predicate);
> 
>   if (predicate)
> foo = __builtin_load_no_speculate(&arr[addr], guard);
> 
> ... or maybe even
> 
>   if (predicate)
> foo = arr[__builtin_loadspecbarrier(addr, guard)];
> 
> where internally __builtin_nontransparent is the same as
> 
>   guard = predicate;
>   asm volatile("" : "+g"(guard));
> 
> although admittedly this is not perfect since it forces evaluation of 'guard'
> before the branch.

As I explained to Bernd last night, I think this is likely to be unsafe.
If there's some control path before __builtin_nontransparent that allows
'predicate' to be simplified (eg by value range propagation), then your
guard doesn't protect against the speculation that you think it does.
Changing all the optimizers to guarantee that wouldn't happen (and
guaranteeing that all future optimizers won't introduce new problems of
that nature) is, I suspect, very non-trivial.

R.

> 
> Alexander
> 



Re: [PATCH] Add static assert about stack alignment (PR sanitizer/82517).

2018-01-09 Thread Jakub Jelinek
On Tue, Jan 09, 2018 at 11:41:17AM +0100, Martin Liška wrote:
> Folowing static assert is added as we may potentially adjust 
> ASAN_SHADOW_GRANULARITY
> (via ASAN_SHADOW_SHIFT). The assert ensures stack variables will have 
> sufficient
> alignment.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-01-09  Martin Liska  
> 
>   PR sanitizer/82517
>   * asan.c (shadow_mem_size): Add static assert.

STATIC_ASSERT assumes all 3 macros expand to constants; I'm not sure we can
always guarantee that in the future, e.g. MAX_SUPPORTED_STACK_ALIGNMENT
could become dependent on some command line option etc.
Use gcc_assert or gcc_checking_assert instead?
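I.e. something along these lines (sketch only), evaluated when the compiler
runs rather than at build time:

  /* Checked only in --enable-checking builds; gcc_assert would check
     unconditionally.  */
  gcc_checking_assert (BITS_PER_UNIT * ASAN_SHADOW_GRANULARITY
		       <= MAX_SUPPORTED_STACK_ALIGNMENT);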

Jakub


Re: Disable autogeneration of gather instructions on Ryzen and generic

2018-01-09 Thread Jan Hubicka
> On Tue, Jan 9, 2018 at 11:26 AM, Jan Hubicka  wrote:
> > Hi,
> > gather instructions are rather hard to implement in hardware and except for
> > skylake+ chips (i.e. haswell and Zen) they seems to be rather slow; to the
> > degree I did not find real world loop where gather would help on Zen.
> > This patch simply adds a knob to disable its autogeneration (builtin still
> > works). I have considered two alternatives
> >  1) tune this with x86-tune-costs because gather is still profitable than
> > scalar code if we do very expensive operations (such as sequence of 
> > divides)
> > on the values gathered/scattered
> >  2) implement expansion of gathers into primitive instructions.  This is 
> > faster
> > as semantics of gather is bit weird and not fully needed for our 
> > vectorizer
> >
> > I did not have luck to get any good results out of 1 alone as cost model is 
> > not
> > very realistic.  1+2 probably makes sense but we can do this incrementally 
> > as
> > that would make most sense to be implemented generically in vectorizer on 
> > the
> > top of this change.
> 
> Note that the vectorizer gives up on loops with gathers with no target
> support for
> gathers.  It could simply open-code the gather though (and properly cost that
> open-coded variant), that's probably the way to go here.

Agreed, that is what I meant by 2.  Generic code has all the info to implement
gathers/scatters by hand when they are not supported and to estimate the cost
of the implementation.  This is what I meant by the paragraph above.  I tried
to look into it about a month ago but got a bit lost in the vectorizer
internals :(

Honza
> 
> Richard.
> 
> > Given that gather is problematic even on skylake+ as shown in the PR (which 
> > has
> > most optimized implementation of it) it is good to have a knob to control 
> > its
> > codegen at first place.
> >
> > I have also disabled gathers for generic.  This is because its use causes
> > some two-digit regression on zen for spec2k17 while there are no measurable
> > benefits on Intel.  Note that this affects only
> > -march= -mtune=generic
> > as by default we do not use AVX2.
> >
> > Bootstrapped/regtested x86_64-linux, plan to commit it later today if there
> > are no complains.
> >
> > Honza
> >
> > PR target/81616
> > * i386.c (ix86_vectorize_builtin_gather): Check TARGET_USE_GATHER.
> > * i386.h (TARGET_USE_GATHER): Define.
> > * x86-tune.def (X86_TUNE_USE_GATHER): New.
> > Index: config/i386/i386.c
> > ===
> > --- config/i386/i386.c  (revision 256369)
> > +++ config/i386/i386.c  (working copy)
> > @@ -38233,7 +38233,7 @@ ix86_vectorize_builtin_gather (const_tre
> >bool si;
> >enum ix86_builtins code;
> >
> > -  if (! TARGET_AVX2)
> > +  if (! TARGET_AVX2 || !TARGET_USE_GATHER)
> >  return NULL_TREE;
> >
> >if ((TREE_CODE (index_type) != INTEGER_TYPE
> > Index: config/i386/i386.h
> > ===
> > --- config/i386/i386.h  (revision 256369)
> > +++ config/i386/i386.h  (working copy)
> > @@ -498,6 +498,8 @@ extern unsigned char ix86_tune_features[
> > ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
> >  #define TARGET_AVOID_4BYTE_PREFIXES \
> > ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
> > +#define TARGET_USE_GATHER \
> > +   ix86_tune_features[X86_TUNE_USE_GATHER]
> >  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
> > ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
> >  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
> > Index: config/i386/x86-tune.def
> > ===
> > --- config/i386/x86-tune.def(revision 256369)
> > +++ config/i386/x86-tune.def(working copy)
> > @@ -399,6 +399,10 @@ DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_ps
> >  DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
> >m_SILVERMONT | m_INTEL)
> >
> > +/* X86_TUNE_USE_GATHER: Use gather instructions.  */
> > +DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
> > +  ~(m_ZNVER1 | m_GENERIC))
> > +
> >  
> > /*/
> >  /* AVX instruction selection tuning (some of SSE flags affects AVX, too)   
> >   */
> >  
> > /*/


Re: [patch,avr] Implement PR83738

2018-01-09 Thread Georg-Johann Lay

On 08.01.2018 18:39, Denis Chertykov wrote:
> 2018-01-08 20:19 GMT+04:00 Georg-Johann Lay :
>> This PR skips saving of any registers in main.
>>
>> Attribute OS_main can do this as well, however we can just drop
>> any saves / restores in all optimized compilation -- not even
>> the test suite needs these saves.
>>
>> The feature can still be switched off by new -mno-OS_main
>>
>> Ok for trunk?
>
> I like it.
>
> Please commit.


Committed a different approach:

1) The effect is the same as OS_task, hence named the
   option -mmain-is-OS_task.  OS_main is too strict
   because that'd assume that IRQs are off when main is
   entered, hence due to that rare case (and when main
   needs a frame) OS_task is the right feature.

2) Instead of fiddling with prologue / epilogue directly,
   now just add OS_task to main (see the sketch below).

3) Some restrictions have been removed re. diagnostics
   when naked, OS_task, OS_main are specified at the
   same time.  Instead, the logic follows now just
   what the attributes are requesting (e.g. OS_main
   supersedes OS_task and naked supersedes OS_main).
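
Regarding 2): building with -mmain-is-OS_task behaves roughly as if main had
been written with the (AVR-specific) OS_task attribute, e.g. (illustrative
sketch only):

__attribute__((OS_task)) int
main (void)
{
  /* With OS_task, no call-saved registers are saved / restored in the
     prologue and epilogue of main.  */
  for (;;)
    {
      /* application code */
    }
}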

The original subject has a typo: The PR is PR83738.


PR target/83738
* doc/invoke.texi (AVR Options) [-mmain-is-OS_task]: Document it.
* config/avr/avr.opt (-mmain-is-OS_task): New target option.
* config/avr/avr.c (avr_set_current_function): Don't error if
naked, OS_task or OS_main are specified at the same time.
(avr_function_ok_for_sibcall): Don't disable sibcalls for OS_task,
OS_main.
(avr_insert_attributes) [-mmain-is-OS_task] : Add OS_task
attribute.
* common/config/avr/avr-common.c (avr_option_optimization_table):
Switch on -mmain-is-OS_task for optimizing compilations.
Index: common/config/avr/avr-common.c
===
--- common/config/avr/avr-common.c	(revision 256338)
+++ common/config/avr/avr-common.c	(working copy)
@@ -31,6 +31,7 @@ static const struct default_options avr_
 // a frame without need when it tries to be smart around calls.
 { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 },
+{ OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 256338)
+++ config/avr/avr.c	(working copy)
@@ -1030,9 +1030,6 @@ avr_no_gccisr_function_p (tree func)
 static void
 avr_set_current_function (tree decl)
 {
-  location_t loc;
-  const char *isr;
-
   if (decl == NULL_TREE
   || current_function_decl == NULL_TREE
   || current_function_decl == error_mark_node
@@ -1040,7 +1037,7 @@ avr_set_current_function (tree decl)
   || cfun->machine->attributes_checked_p)
 return;
 
-  loc = DECL_SOURCE_LOCATION (decl);
+  location_t loc = DECL_SOURCE_LOCATION (decl);
 
   cfun->machine->is_naked = avr_naked_function_p (decl);
   cfun->machine->is_signal = avr_signal_function_p (decl);
@@ -1049,21 +1046,19 @@ avr_set_current_function (tree decl)
   cfun->machine->is_OS_main = avr_OS_main_function_p (decl);
   cfun->machine->is_no_gccisr = avr_no_gccisr_function_p (decl);
 
-  isr = cfun->machine->is_interrupt ? "interrupt" : "signal";
+  const char *isr = cfun->machine->is_interrupt ? "interrupt" : "signal";
 
   /* Too much attributes make no sense as they request conflicting features. */
 
-  if (cfun->machine->is_OS_task + cfun->machine->is_OS_main
-  + (cfun->machine->is_signal || cfun->machine->is_interrupt) > 1)
-error_at (loc, "function attributes %qs, %qs and %qs are mutually"
-  " exclusive", "OS_task", "OS_main", isr);
-
-  /* 'naked' will hide effects of 'OS_task' and 'OS_main'.  */
-
-  if (cfun->machine->is_naked
-  && (cfun->machine->is_OS_task || cfun->machine->is_OS_main))
-warning_at (loc, OPT_Wattributes, "function attributes %qs and %qs have"
-" no effect on %qs function", "OS_task", "OS_main", "naked");
+  if (cfun->machine->is_OS_task
+  && (cfun->machine->is_signal || cfun->machine->is_interrupt))
+error_at (loc, "function attributes %qs and %qs are mutually exclusive",
+  "OS_task", isr);
+
+  if (cfun->machine->is_OS_main
+  && (cfun->machine->is_signal || cfun->machine->is_interrupt))
+error_at (loc, "function attributes %qs and %qs are mutually exclusive",
+  "OS_main", isr);
 
   if (cfun->machine->is_interrupt || cfun->machine->is_signal)
 {
@@ -3526,12 +3521,7 @@ avr_function_ok_for_sibcall (tree decl_c
   if (cfun->machine->is_interrupt
   || cfun->machine->is_signal
   || cfun->machine->is_naked
-  || avr_naked_function_p (decl_callee)
-  /* FIXME: For OS_task and OS_main, this might be over-conservative.  */
-  || (avr_OS_task_function_p (decl_callee)
-  != cfun->machine->is_OS_task)
-  || (avr_OS_main_function_p (decl_callee)
-

Tweak predictors based on SPEC2006 and SPEC2017

2018-01-09 Thread Martin Liška
Hello.

In the current stage1 Honza spent a significant amount of time improving the
profiling infrastructure.  New classes profile_count and profile_probability
have been added and we hope the profile is now maintained more carefully
across the various optimization passes.

My patch series adjusts the values of the predictors defined in predict.def
based on what I measured on SPEC2006 and SPEC2017.  For both, -O2
-march=native was used and the benchmarks were run on a Haswell CPU.  Note
that the numbers should be reproducible on any CPU, as the coverage mapping
to source files should match.

Compared to the previous release (the GCC 7.x branch), there are 2 main
differences:
1) For the GCC 7.x tuning I wrongly used the train run to collect the numbers;
this is fixed by running the train run with the ref size of the benchmarks.
2) I noticed we have branches that in some cases dominate a predictor.  That's
caused by the fact that we calculate the average predictor value based on the
number of successful branchings.  Imagine a very hot condition in a loop: such
a branch can dominate a predictor.  So I decided to also calculate the
statistics without branches that have coverage >= 10%.  Doing that shows that
some predictors can be close to 50%, or that the value chosen in predict.def
is very far from what we get when considering all branches.  In such cases I
explain why I've chosen to either remove a predictor or adjust it based on the
non-dominating branches (see the worked example below).
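
As a made-up illustration of 2): a predictor covering three branches with
(count, hits) = (1000000, 500000), (100, 95) and (100, 90) has an overall hit
rate of 500185 / 1000200 = ~50.0%, completely determined by the single hot
branch, while the two remaining branches alone hit 185 / 200 = 92.5%.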

The email contains statistics for the SPEC2006 suite, the SPEC2017 suite and
combined values based on these.
Apart from that, statistics for each individual benchmark can also be found.

There are numbers for 2 configurations of SPEC2006 and SPEC2017 which were run 
with the series on a Ryzen 5
machine:

+---++--+--+
| 1) = CPU2006: -O2 -march=native   ||  |  |
| Performance Regressions - Execution Time  | Δ (B)  | Baseline | Current  |
| SPEC/SPEC2006/FP/459.GemsFDTD | 3.78%  | 234.6328 | 243.5055 |
| SPEC/SPEC2006/INT/456.hmmer   | 3.08%  | 312.4193 | 322.0455 |
| SPEC/SPEC2006/FP/482.sphinx3  | 2.26%  | 370.0952 | 378.4484 |
| SPEC/SPEC2006/FP/433.milc | 1.95%  | 289.1947 | 294.8295 |
| SPEC/SPEC2006/FP/465.tonto| 1.70%  | 325.3729 | 330.9081 |
| SPEC/SPEC2006/FP/416.gamess   | 1.63%  | 550.4813 | 559.4563 |
| SPEC/SPEC2006/INT/429.mcf | 1.61%  | 245.2728 | 249.2234 |
|   ||  |  |
| Performance Improvements - Execution Time | Δ (B)  | Baseline | Current  |
| SPEC/SPEC2006/FP/436.cactusADM| -5.94% | 305.2817 | 287.1523 |
| SPEC/SPEC2006/INT/483.xalancbmk   | -2.84% | 200.1121 | 194.4234 |
| SPEC/SPEC2006/INT/401.bzip2   | -1.31% | 378.0334 | 373.0694 |
|   ||  |  |
| 2) = CPU2006: -Ofast -march=native||  |  |
| Performance Regressions - Execution Time  | Δ (B)  | Baseline | Current  |
| SPEC/SPEC2006/FP/459.GemsFDTD | 3.11%  | 210.9732 | 217.5323 |
| SPEC/SPEC2006/INT/429.mcf | 1.63%  | 242.1709 | 246.1208 |
| SPEC/SPEC2006/FP/482.sphinx3  | 1.57%  | 270.7089 | 274.9587 |
| SPEC/SPEC2006/INT/400.perlbench   | 1.40%  | 274.1239 | 277.9529 |
| SPEC/SPEC2006/FP/481.wrf  | 1.33%  | 180.1668 | 182.555  |
|   ||  |  |
| Performance Improvements - Execution Time | Δ (B)  | Baseline | Current  |
| SPEC/SPEC2006/FP/450.soplex   | -4.38% | 214.6197 | 205.2086 |
| SPEC/SPEC2006/FP/436.cactusADM| -3.24% | 153.1062 | 148.1397 |
| SPEC/SPEC2006/FP/435.gromacs  | -2.94% | 199.6355 | 193.7684 |
| SPEC/SPEC2006/INT/471.omnetpp | -1.93% | 285.8131 | 280.3005 |
| SPEC/SPEC2006/INT/445.gobmk   | -1.14% | 375.2672 | 370.9727 |
|   ||  |  |
| 3) = CPU2017: -O2 -march=native   ||  |  |
| Performance Regressions - Execution Time  | Δ (B)  | Baseline | Current  |
| SPEC/SPEC2017/INT/557.xz_r| 2.51%  | 397.0281 | 407.0107 |
| SPEC/SPEC2017/INT/520.omnetpp_r   | 1.69%  | 443.3886 | 450.8843 |
| SPEC/SPEC2017/FP/549.fotonik3d_r  | 1.44%  | 361.6036 | 366.7952 |
| SPEC/SPEC2017/FP/508.namd_r   | 1.07%  | 198.1652 | 200.2947 |
|   ||  |  |
| Performance Improvements - Execution Time | Δ (B)  | Baseline | Current  |
| SPEC/SPEC2017/INT/548.exchange2_r | -6.78% | 433.2191 | 403.8602 |
| SPEC/SPEC2017/FP/507.cactuBSSN_r  | -1.49% | 223.5739 | 220.2388 |
| SPEC/SPEC2017/FP/511.povray_r | -1.34% | 483.2885 | 476.8329 |
| SPEC/SPEC2017/INT/500.perlbench_r | -1.03% | 417.0421 | 412.7557 |
|   

[PATCH 1/5] Fix usage of analyze_brprob.py script.

2018-01-09 Thread Martin Liška
The first patch from the series simplifies how we dump and post-process the
statistics.  I switched to ';'-separated values instead of free-form text.
The changes to analyze_brprob.py add the possibility to filter out dominant
edges.  With that, one can see how the predictor values change if a dominant
edge is ignored.

Martin
From ef3c162961f3599ee65e87ecde5138d9f37f0221 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 21 Dec 2017 17:19:13 +0100
Subject: [PATCH 1/5] Fix usage of analyze_brprob.py script.

contrib/ChangeLog:

2017-12-21  Martin Liska  

	* analyze_brprob.py: Support new format that can be easily
	parsed. Add new column to report.

gcc/ChangeLog:

2017-12-21  Martin Liska  

	* predict.c (dump_prediction): Add new format for
	analyze_brprob.py script which is enabled with -details
	suboption.
	* profile-count.h (precise_p): New function.
---
 contrib/analyze_brprob.py | 103 --
 gcc/predict.c |  13 ++
 gcc/profile-count.h   |   5 +++
 3 files changed, 90 insertions(+), 31 deletions(-)

diff --git a/contrib/analyze_brprob.py b/contrib/analyze_brprob.py
index e03d1da1cde..de5f474d629 100755
--- a/contrib/analyze_brprob.py
+++ b/contrib/analyze_brprob.py
@@ -71,6 +71,7 @@ from math import *
 
 counter_aggregates = set(['combined', 'first match', 'DS theory',
 'no prediction'])
+hot_threshold = 10
 
 def percentage(a, b):
 return 100.0 * a / b
@@ -131,47 +132,87 @@ class PredictDefFile:
 with open(self.path, 'w+') as f:
 for l in modified_lines:
 f.write(l + '\n')
+class Heuristics:
+def __init__(self, count, hits, fits):
+self.count = count
+self.hits = hits
+self.fits = fits
 
 class Summary:
 def __init__(self, name):
 self.name = name
-self.branches = 0
-self.successfull_branches = 0
-self.count = 0
-self.hits = 0
-self.fits = 0
+self.edges= []
+
+def branches(self):
+return len(self.edges)
+
+def hits(self):
+return sum([x.hits for x in self.edges])
+
+def fits(self):
+return sum([x.fits for x in self.edges])
+
+def count(self):
+return sum([x.count for x in self.edges])
+
+def successfull_branches(self):
+return len([x for x in self.edges if 2 * x.hits >= x.count])
 
 def get_hitrate(self):
-return 100.0 * self.hits / self.count
+return 100.0 * self.hits() / self.count()
 
 def get_branch_hitrate(self):
-return 100.0 * self.successfull_branches / self.branches
+return 100.0 * self.successfull_branches() / self.branches()
 
 def count_formatted(self):
-v = self.count
+v = self.count()
 for unit in ['', 'k', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']:
 if v < 1000:
 return "%3.2f%s" % (v, unit)
 v /= 1000.0
 return "%.1f%s" % (v, 'Y')
 
+def count(self):
+return sum([x.count for x in self.edges])
+
 def print(self, branches_max, count_max, predict_def):
+# filter out most hot edges (if requested)
+self.edges = sorted(self.edges, reverse = True, key = lambda x: x.count)
+if args.coverage_threshold != None:
+threshold = args.coverage_threshold * self.count() / 100
+edges = [x for x in self.edges if x.count < threshold]
+if len(edges) != 0:
+self.edges = edges
+
 predicted_as = None
 if predict_def != None and self.name in predict_def.predictors:
 predicted_as = predict_def.predictors[self.name]
 
 print('%-40s %8i %5.1f%% %11.2f%% %7.2f%% / %6.2f%% %14i %8s %5.1f%%' %
-(self.name, self.branches,
-percentage(self.branches, branches_max),
+(self.name, self.branches(),
+percentage(self.branches(), branches_max),
 self.get_branch_hitrate(),
 self.get_hitrate(),
-percentage(self.fits, self.count),
-self.count, self.count_formatted(),
-percentage(self.count, count_max)), end = '')
+percentage(self.fits(), self.count()),
+self.count(), self.count_formatted(),
+percentage(self.count(), count_max)), end = '')
 
 if predicted_as != None:
 print('%12i%% %5.1f%%' % (predicted_as,
 self.get_hitrate() - predicted_as), end = '')
+else:
+print(' ' * 20, end = '')
+
+# print details about the most important edges
+if args.coverage_threshold == None:
+edges = [x for x in self.edges[:100] if x.count * hot_threshold > self.count()]
+if args.verbose:
+for c in edges:
+r = 100.0 * c.count / self.count()
+print(' %.0f%%:%d' % (r, c.count), end = '')
+elif len(edges) > 0:
+   

[PATCH 2/5] Introduce PROB_UNINITIALIZED constant and use it in, predict.def.

2018-01-09 Thread Martin Liška
The second patch cleans up predictors that do not use a value from predict.def,
but whose value is derived from e.g. the number of iterations of a loop.
These predictors have their probability set to PROB_UNINITIALIZED and
analyze_brprob.py does not compare their statistics to the values defined in
predict.def.

The patch is a no-op.

Martin
From 6f6f2d88d3141b1e1604698abf857fffbb42330e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 27 Dec 2017 14:49:20 +0100
Subject: [PATCH 2/5] Introduce PROB_UNINITIALIZED constant and use it in
 predict.def.

gcc/ChangeLog:

2017-12-28  Martin Liska  

	* predict.c (predict_insn_def): Add new assert.
	(struct branch_predictor): Change type to signed integer.
	(test_prediction_value_range): Amend test to cover
	PROB_UNINITIALIZED.
	* predict.def (PRED_LOOP_ITERATIONS): Use the new constant.
	(PRED_LOOP_ITERATIONS_GUESSED): Likewise.
	(PRED_LOOP_ITERATIONS_MAX): Likewise.
	(PRED_LOOP_IV_COMPARE): Likewise.
	* predict.h (PROB_UNINITIALIZED): Define new constant.
---
 gcc/predict.c   | 6 +-
 gcc/predict.def | 8 
 gcc/predict.h   | 1 +
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/predict.c b/gcc/predict.c
index 3ac18a2c5f2..51fd14205c2 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -545,6 +545,7 @@ predict_insn_def (rtx_insn *insn, enum br_predictor predictor,
 		  enum prediction taken)
 {
int probability = predictor_info[(int) predictor].hitrate;
+   gcc_assert (probability != PROB_UNINITIALIZED);
 
if (taken != TAKEN)
  probability = REG_BR_PROB_BASE - probability;
@@ -4185,7 +4186,7 @@ namespace selftest {
 struct branch_predictor
 {
   const char *name;
-  unsigned probability;
+  int probability;
 };
 
 #define DEF_PREDICTOR(ENUM, NAME, HITRATE, FLAGS) { NAME, HITRATE },
@@ -4200,6 +4201,9 @@ test_prediction_value_range ()
 
   for (unsigned i = 0; predictors[i].name != NULL; i++)
 {
+  if (predictors[i].probability == PROB_UNINITIALIZED)
+	continue;
+
   unsigned p = 100 * predictors[i].probability / REG_BR_PROB_BASE;
   ASSERT_TRUE (p > 50 && p <= 100);
 }
diff --git a/gcc/predict.def b/gcc/predict.def
index 0f37e399312..390b9a35fa7 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -54,7 +54,7 @@ DEF_PREDICTOR (PRED_UNCONDITIONAL, "unconditional jump", PROB_ALWAYS,
 /* Use number of loop iterations determined by # of iterations
analysis to set probability.  We don't want to use Dempster-Shaffer
theory here, as the predictions is exact.  */
-DEF_PREDICTOR (PRED_LOOP_ITERATIONS, "loop iterations", PROB_ALWAYS,
+DEF_PREDICTOR (PRED_LOOP_ITERATIONS, "loop iterations", PROB_UNINITIALIZED,
 	   PRED_FLAG_FIRST_MATCH)
 
 /* Assume that any given atomic operation has low contention,
@@ -71,11 +71,11 @@ DEF_PREDICTOR (PRED_BUILTIN_EXPECT, "__builtin_expect", PROB_VERY_LIKELY,
 
 /* Use number of loop iterations guessed by the contents of the loop.  */
 DEF_PREDICTOR (PRED_LOOP_ITERATIONS_GUESSED, "guessed loop iterations",
-	   PROB_ALWAYS, PRED_FLAG_FIRST_MATCH)
+	   PROB_UNINITIALIZED, PRED_FLAG_FIRST_MATCH)
 
 /* Use number of loop iterations guessed by the contents of the loop.  */
 DEF_PREDICTOR (PRED_LOOP_ITERATIONS_MAX, "guessed loop iterations",
-	   PROB_ALWAYS, PRED_FLAG_FIRST_MATCH)
+	   PROB_UNINITIALIZED, PRED_FLAG_FIRST_MATCH)
 
 /* Branch containing goto is probably not taken.  */
 DEF_PREDICTOR (PRED_CONTINUE, "continue", HITRATE (67), 0)
@@ -151,7 +151,7 @@ DEF_PREDICTOR (PRED_LOOP_IV_COMPARE_GUESS, "guess loop iv compare",
 
 /* Use number of loop iterations determined by # of iterations analysis
to set probability of branches that compares IV to loop bound variable.  */
-DEF_PREDICTOR (PRED_LOOP_IV_COMPARE, "loop iv compare", PROB_VERY_LIKELY,
+DEF_PREDICTOR (PRED_LOOP_IV_COMPARE, "loop iv compare", PROB_UNINITIALIZED,
 	   PRED_FLAG_FIRST_MATCH)
 
 /* In the following code
diff --git a/gcc/predict.h b/gcc/predict.h
index 57715159b95..e4d1da090ca 100644
--- a/gcc/predict.h
+++ b/gcc/predict.h
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #define PROB_ALWAYS		(REG_BR_PROB_BASE)
 #define PROB_UNLIKELY   (REG_BR_PROB_BASE / 5 - 1)
 #define PROB_LIKELY (REG_BR_PROB_BASE - PROB_UNLIKELY)
+#define PROB_UNINITIALIZED  (-1)
 
 #define DEF_PREDICTOR(ENUM, NAME, HITRATE, FLAGS) ENUM,
 enum br_predictor
-- 
2.14.3



Re: Disable autogeneration of gather instructions on Ryzen and generic

2018-01-09 Thread Richard Biener
On Tue, Jan 9, 2018 at 11:58 AM, Jan Hubicka  wrote:
>> On Tue, Jan 9, 2018 at 11:26 AM, Jan Hubicka  wrote:
>> > Hi,
>> > gather instructions are rather hard to implement in hardware and except for
>> > skylake+ chips (i.e. haswell and Zen) they seems to be rather slow; to the
>> > degree I did not find real world loop where gather would help on Zen.
>> > This patch simply adds a knob to disable its autogeneration (builtin still
>> > works). I have considered two alternatives
>> >  1) tune this with x86-tune-costs because gather is still profitable than
>> > scalar code if we do very expensive operations (such as sequence of 
>> > divides)
>> > on the values gathered/scattered
>> >  2) implement expansion of gathers into primitive instructions.  This is 
>> > faster
>> > as semantics of gather is bit weird and not fully needed for our 
>> > vectorizer
>> >
>> > I did not have luck to get any good results out of 1 alone as cost model 
>> > is not
>> > very realistic.  1+2 probably makes sense but we can do this incrementally 
>> > as
>> > that would make most sense to be implemented generically in vectorizer on 
>> > the
>> > top of this change.
>>
>> Note that the vectorizer gives up on loops with gathers with no target
>> support for
>> gathers.  It could simply open-code the gather though (and properly cost that
>> open-coded variant), that's probably the way to go here.
>
> Agreed, that is what I meant by 2.  Generic code has all the info to implement
> gathers/scatters by hand when they are not supported and to estimate the cost
> of the implementation.  This is what I meant by the paragraph above.  I tried
> to look into it about a month ago but got a bit lost in vectorizer internals :(

Yeah, I think the only bit that cannot be open-coded in an efficient way is
the masking.

Richard.

> Honza
>>
>> Richard.
>>
>> > Given that gather is problematic even on skylake+ as shown in the PR 
>> > (which has
>> > most optimized implementation of it) it is good to have a knob to control 
>> > its
>> > codegen at first place.
>> >
>> > I have also disabled gathers for generic.  This is because its use causes
>> > some two-digit regression on zen for spec2k17 while there are no measurable
>> > benefits on Intel.  Note that this affects only
>> > -march= -mtune=generic
>> > as by default we do not use AVX2.
>> >
>> > Bootstrapped/regtested x86_64-linux, plan to commit it later today if there
>> > are no complains.
>> >
>> > Honza
>> >
>> > PR target/81616
>> > * i386.c (ix86_vectorize_builtin_gather): Check TARGET_USE_GATHER.
>> > * i386.h (TARGET_USE_GATHER): Define.
>> > * x86-tune.def (X86_TUNE_USE_GATHER): New.
>> > Index: config/i386/i386.c
>> > ===
>> > --- config/i386/i386.c  (revision 256369)
>> > +++ config/i386/i386.c  (working copy)
>> > @@ -38233,7 +38233,7 @@ ix86_vectorize_builtin_gather (const_tre
>> >bool si;
>> >enum ix86_builtins code;
>> >
>> > -  if (! TARGET_AVX2)
>> > +  if (! TARGET_AVX2 || !TARGET_USE_GATHER)
>> >  return NULL_TREE;
>> >
>> >if ((TREE_CODE (index_type) != INTEGER_TYPE
>> > Index: config/i386/i386.h
>> > ===
>> > --- config/i386/i386.h  (revision 256369)
>> > +++ config/i386/i386.h  (working copy)
>> > @@ -498,6 +498,8 @@ extern unsigned char ix86_tune_features[
>> > ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
>> >  #define TARGET_AVOID_4BYTE_PREFIXES \
>> > ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
>> > +#define TARGET_USE_GATHER \
>> > +   ix86_tune_features[X86_TUNE_USE_GATHER]
>> >  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
>> > ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
>> >  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
>> > Index: config/i386/x86-tune.def
>> > ===
>> > --- config/i386/x86-tune.def(revision 256369)
>> > +++ config/i386/x86-tune.def(working copy)
>> > @@ -399,6 +399,10 @@ DEF_TUNE (X86_TUNE_SLOW_PSHUFB, "slow_ps
>> >  DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
>> >m_SILVERMONT | m_INTEL)
>> >
>> > +/* X86_TUNE_USE_GATHER: Use gather instructions.  */
>> > +DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
>> > +  ~(m_ZNVER1 | m_GENERIC))
>> > +
>> >  
>> > /*/
>> >  /* AVX instruction selection tuning (some of SSE flags affects AVX, too)  
>> >*/
>> >  
>> > /*/
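
To make the open-coding idea discussed in this thread concrete, here is a
minimal C sketch of what a vectorizer could emit when no hardware gather is
available.  It is purely illustrative: the function, types and the factor of
2 are made up, and this is not output from GCC.

#include <stdint.h>

/* Vectorize a[i] = 2 * b[idx[i]] without a gather instruction: the loads
   from b become one scalar load per lane, while the multiply and the store
   can still use vector operations.  The scalar tail loop is omitted.  */
void
scale_gathered (int32_t *restrict a, const int32_t *restrict b,
                const int32_t *restrict idx, int n)
{
  for (int i = 0; i + 4 <= n; i += 4)
    {
      /* Open-coded "gather": four scalar loads instead of one vgather.  */
      int32_t lane0 = b[idx[i + 0]];
      int32_t lane1 = b[idx[i + 1]];
      int32_t lane2 = b[idx[i + 2]];
      int32_t lane3 = b[idx[i + 3]];

      a[i + 0] = 2 * lane0;
      a[i + 1] = 2 * lane1;
      a[i + 2] = 2 * lane2;
      a[i + 3] = 2 * lane3;
    }
}

The cost model would then weigh these scalar loads (plus the element inserts
needed to build a vector from them) against a hardware gather or fully scalar
code, which is the costing work discussed above.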


[PATCH v2.4 of 02/14] Support for adding and stripping location_t wrapper nodes

2018-01-09 Thread David Malcolm
On Mon, 2018-01-08 at 19:08 +0100, Jakub Jelinek wrote:
> On Mon, Jan 08, 2018 at 01:02:37PM -0500, David Malcolm wrote:
> > Thanks Nathan and Jakub: a quick smoketest using TREE_LANG_FLAG_0
> > worked, and fixes this issue.
> > 
> > However, should I be using a TREE_LANG_FLAG for something that's in
> > tree.h/c, rather than just in the "cp" subdir?  (the wrapper nodes
> > are
> > only added to C++ in this patch kit, but given that e.g. STRIP_NOPS
> > needs to remove them, the lang-independent code needs to handle
> > them,
> > and ultimately we may want wrapper nodes in other FEs).
> 
> If location_wrapper_p etc. are in generic code rather than only in
> the FE,
> sure, you'd need to use some generic flag rather than FE specific.
> I bet e.g. private_flag and protected_flag aren't used yet for
> VIEW_CONVERT_EXPR/NON_LVALUE_EXPR, but they are used on some other
> expressions, e.g. CALL_EXPR, which suggests that it could be used for
> that.
> 
>   Jakub

Thanks.  "public_flag" seemed to be available for those codes, so I
used that, via a new EXPR_LOCATION_WRAPPER_P macro

Here's an updated version of patch 2 of the kit to do so [1].

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, as
part of the kit.
Also, manually tested with "make s-selftest-c++" (since we don't run
the selftests for cc1plus by default).

OK for trunk in conjunction with the rest of the kit?  (I'm
about to post an updated version of the [03/14] patch, which is
the only other part of the kit still needing review).

Dave

[1] (it also merges in the part of patch [08/14] for making wrappers
for STRING_CST use VIEW_CONVERT_EXPR, which Jason has already approved;
needs to be here to avoid git conflicts; all of this will be one
squashed patch if/when committed).

Blurb follows:

Changed in v2.4:
* use a flag rather than types for indicating location_wrapper_p (to
cope with decls changing type from under us)
* add the STRING_CST part of the [08/14] lvalue_kind patch

Changed in v2.3:
* move most of location_wrapper_p's comment inside the function
(approved by Jason here:
   https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02618.html )

Changed in v2.2:
* added tree_strip_any_location_wrapper
* removed checking for !CONSTANT_CLASS_P on VIEW_CONVERT_EXPR

Changed in v2.1:
* updated comments for location_wrapper_p and
maybe_wrap_with_location
* added selftests

This patch provides a mechanism in tree.c for adding a wrapper node
for expressing a location_t, for those nodes for which
!CAN_HAVE_LOCATION_P.  It's called in later patches in the kit via a
new method of cp_expr.

In this version of the patch, I use NON_LVALUE_EXPR for wrapping
constants, and VIEW_CONVERT_EXPR for other nodes.

I also turned off wrapper nodes for EXCEPTIONAL_CLASS_P, for the sake
of keeping the patch kit more minimal.

The patch also adds a STRIP_ANY_LOCATION_WRAPPER macro for stripping
such nodes, used later on in the patch kit.
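
As a rough illustration of how a front end might use these entry points, here
is a hypothetical caller (not part of the patch; the helper name is made up):

/* Attach LOC to CST by wrapping it, then strip the wrapper again where
   only the underlying value matters.  */
static tree
annotate_constant (tree cst, location_t loc)
{
  /* Adds a NON_LVALUE_EXPR (or VIEW_CONVERT_EXPR) wrapper when CST cannot
     carry a location itself; otherwise returns CST unchanged.  */
  tree wrapped = maybe_wrap_with_location (cst, loc);

  /* Callers that only care about the value strip any wrapper.  */
  tree inner = wrapped;
  STRIP_ANY_LOCATION_WRAPPER (inner);
  gcc_assert (inner == cst);

  return wrapped;
}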

gcc/cp/ChangeLog:
PR c++/43486
* cp-tree.h (cp_expr::maybe_add_location_wrapper): New method.

gcc/ChangeLog:
PR c++/43486
* tree-core.h: Document EXPR_LOCATION_WRAPPER_P's usage of
"public_flag".
* tree.c (maybe_wrap_with_location): New function.
(selftest::test_location_wrappers): New function.
(selftest::tree_c_tests): Call it.
* tree.h (STRIP_ANY_LOCATION_WRAPPER): New macro.
(maybe_wrap_with_location): New decl.
(EXPR_LOCATION_WRAPPER_P): New macro.
(location_wrapper_p): New inline function.
(tree_strip_any_location_wrapper): New inline function.
---
 gcc/cp/cp-tree.h |  6 
 gcc/tree-core.h  |  3 ++
 gcc/tree.c   | 83 
 gcc/tree.h   | 47 
 4 files changed, 139 insertions(+)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index d408370..d4b45a4 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -93,6 +93,12 @@ public:
 set_location (make_location (m_loc, start, finish));
   }
 
+  cp_expr& maybe_add_location_wrapper ()
+  {
+m_value = maybe_wrap_with_location (m_value, m_loc);
+return *this;
+  }
+
  private:
   tree m_value;
   location_t m_loc;
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index b08d215..b127bc7 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1116,6 +1116,9 @@ struct GTY(()) tree_base {
SSA_NAME_IS_VIRTUAL_OPERAND in
   SSA_NAME
 
+   EXPR_LOCATION_WRAPPER_P in
+  NON_LVALUE_EXPR, VIEW_CONVERT_EXPR
+
private_flag:
 
TREE_PRIVATE in
diff --git a/gcc/tree.c b/gcc/tree.c
index aa647de..8b1c4b2 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -14066,6 +14066,40 @@ set_source_range (tree expr, source_range src_range)
   return adhoc;
 }
 
+/* Return EXPR, potentially wrapped with a node expression LOC,
+   if !CAN_HAVE_LOCATION_P (expr).
+
+   NON_LVALUE_EXPR is used for wrapping constants, apart from STRING_CST.
+   VIEW_CONVERT_EXPR is used for wrapping

Fix permute handling when vectorising scatters

2018-01-09 Thread Richard Sandiford
As mentioned in https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01575.html ,
the scatter handling in vectorizable_store seems to be dead code at the
moment.  Enabling it with the vect_analyze_data_ref_access part of
that patch triggered an ICE in the avx512f-scatter-*.c tests (which
previously didn't use scatters).  The problem was that the NARROW
and WIDEN handling uses permute_vec_elements to marshal the inputs,
and permute_vec_elements expected the lhs of the stmt to be an SSA_NAME,
which of course it isn't for stores.

This patch makes permute_vec_elements create a fresh variable in this case.

Tested on x86_64-linux-gnu.  OK to install?

Richard


2018-01-09  Richard Sandiford  

gcc/
* tree-vect-stmts.c (permute_vec_elements): Create a fresh variable
if the destination isn't an SSA_NAME.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2018-01-03 21:47:27.956862491 +
+++ gcc/tree-vect-stmts.c   2018-01-09 11:30:00.578821631 +
@@ -6585,7 +6585,11 @@ permute_vec_elements (tree x, tree y, tr
   tree perm_dest, data_ref;
   gimple *perm_stmt;
 
-  perm_dest = vect_create_destination_var (gimple_get_lhs (stmt), vectype);
+  tree scalar_dest = gimple_get_lhs (stmt);
+  if (TREE_CODE (scalar_dest) == SSA_NAME)
+perm_dest = vect_create_destination_var (scalar_dest, vectype);
+  else
+perm_dest = vect_get_new_vect_var (vectype, vect_simple_var, NULL);
   data_ref = make_ssa_name (perm_dest);
 
   /* Generate the permute statement.  */


[PATCH 3/5] Adjust predictor values according to SPEC2006 and SPEC2017.

2018-01-09 Thread Martin Liška
The third part changes predictor values:

1) PRED_LOOP_EXIT: no dominant branch; value simply taken from statistics
2) PRED_LOOP_EXIT_WITH_RECURSION: there are 4 dominant edges; value taken w/o 
these edges
3) PRED_LOOP_EXTRA_EXIT: there's one really dominant edge; value taken w/o the 
edge; note that
coverage of the predictor is quite small
4) PRED_OPCODE_POSITIVE: no dominant branch; value simply taken from statistics
5) PRED_CONST_RETURN: there are 4 dominant edges, value taken w/o these edges
6) PRED_NULL_RETURN: one dominant edge, value taken w/o these edges
7) PRED_LOOP_IV_COMPARE_GUESS: very fragile predictor (judging by how it is
identified in predict.c); has a dominant edge; value taken w/o that edge; in
the future I plan to investigate it
8) PRED_LOOP_GUARD: adjusted based on the numbers without the one dominant edge

Martin
>From 960f16a6e3e916524d8881f53913c15a3c2ec2ae Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 28 Dec 2017 10:13:50 +0100
Subject: [PATCH 3/5] Adjust predictor values according to SPEC2006 and
 SPEC2017.

gcc/ChangeLog:

2017-12-28  Martin Liska  

	* predict.def (PRED_LOOP_EXIT): Change from 85 to 89.
	(PRED_LOOP_EXIT_WITH_RECURSION): Change from 72 to 78.
	(PRED_LOOP_EXTRA_EXIT): Change from 83 to 67.
	(PRED_OPCODE_POSITIVE): Change from 64 to 59.
	(PRED_TREE_OPCODE_POSITIVE): Change from 64 to 59.
	(PRED_CONST_RETURN): Change from 69 to 65.
	(PRED_NULL_RETURN): Change from 91 to 71.
	(PRED_LOOP_IV_COMPARE_GUESS): Change from 98 to 64.
	(PRED_LOOP_GUARD): Change from 66 to 73.
---
 gcc/predict.def | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/predict.def b/gcc/predict.def
index 390b9a35fa7..fe72080d5bd 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -89,16 +89,16 @@ DEF_PREDICTOR (PRED_COLD_FUNCTION, "cold function call", PROB_VERY_LIKELY,
 	   PRED_FLAG_FIRST_MATCH)
 
 /* Edge causing loop to terminate is probably not taken.  */
-DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (85),
+DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (89),
 	   PRED_FLAG_FIRST_MATCH)
 
 /* Same as LOOP_EXIT but for loops containing recursive call.  */
 DEF_PREDICTOR (PRED_LOOP_EXIT_WITH_RECURSION, "loop exit with recursion",
-	   HITRATE (72), PRED_FLAG_FIRST_MATCH)
+	   HITRATE (78), PRED_FLAG_FIRST_MATCH)
 
 /* Edge causing loop to terminate by computing value used by later
conditional.  */
-DEF_PREDICTOR (PRED_LOOP_EXTRA_EXIT, "extra loop exit", HITRATE (83),
+DEF_PREDICTOR (PRED_LOOP_EXTRA_EXIT, "extra loop exit", HITRATE (67),
 	   PRED_FLAG_FIRST_MATCH)
 
 /* Pointers are usually not NULL.  */
@@ -106,11 +106,11 @@ DEF_PREDICTOR (PRED_POINTER, "pointer", HITRATE (70), 0)
 DEF_PREDICTOR (PRED_TREE_POINTER, "pointer (on trees)", HITRATE (70), 0)
 
 /* NE is probable, EQ not etc...  */
-DEF_PREDICTOR (PRED_OPCODE_POSITIVE, "opcode values positive", HITRATE (64), 0)
+DEF_PREDICTOR (PRED_OPCODE_POSITIVE, "opcode values positive", HITRATE (59), 0)
 DEF_PREDICTOR (PRED_OPCODE_NONEQUAL, "opcode values nonequal", HITRATE (66), 0)
 DEF_PREDICTOR (PRED_FPOPCODE, "fp_opcode", HITRATE (90), 0)
 DEF_PREDICTOR (PRED_TREE_OPCODE_POSITIVE, "opcode values positive (on trees)",
-	   HITRATE (64), 0)
+	   HITRATE (59), 0)
 DEF_PREDICTOR (PRED_TREE_OPCODE_NONEQUAL, "opcode values nonequal (on trees)",
 	   HITRATE (66), 0)
 DEF_PREDICTOR (PRED_TREE_FPOPCODE, "fp_opcode (on trees)", HITRATE (90), 0)
@@ -136,18 +136,18 @@ DEF_PREDICTOR (PRED_TREE_EARLY_RETURN, "early return (on trees)", HITRATE (66),
 DEF_PREDICTOR (PRED_GOTO, "goto", HITRATE (66), 0)
 
 /* Branch ending with return constant is probably not taken.  */
-DEF_PREDICTOR (PRED_CONST_RETURN, "const return", HITRATE (69), 0)
+DEF_PREDICTOR (PRED_CONST_RETURN, "const return", HITRATE (65), 0)
 
 /* Branch ending with return negative constant is probably not taken.  */
 DEF_PREDICTOR (PRED_NEGATIVE_RETURN, "negative return", HITRATE (98), 0)
 
 /* Branch ending with return; is probably not taken */
-DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (91), 0)
+DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (71), 0)
 
 /* Branches to compare induction variable to a loop bound is
extremely likely.  */
 DEF_PREDICTOR (PRED_LOOP_IV_COMPARE_GUESS, "guess loop iv compare",
-	   HITRATE (98), 0)
+	   HITRATE (64), 0)
 
 /* Use number of loop iterations determined by # of iterations analysis
to set probability of branches that compares IV to loop bound variable.  */
@@ -160,7 +160,7 @@ DEF_PREDICTOR (PRED_LOOP_IV_COMPARE, "loop iv compare", PROB_UNINITIALIZED,
for (loop2)
 	 body;
guess that cond is unlikely.  */
-DEF_PREDICTOR (PRED_LOOP_GUARD, "loop guard", HITRATE (66), 0)
+DEF_PREDICTOR (PRED_LOOP_GUARD, "loop guard", HITRATE (73), 0)
 
 /* Same but for loops containing recursion.  */
 DEF_PREDICTOR (PRED_LOOP_GUARD_WITH_RECURSION, "loop guard with recursion",
-- 
2.14.3



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 05/01/18 17:24, Jeff Law wrote:
> On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
> So I think it's important for anyone reading this stuff to read
> Richard's paragraph carefully --  "an arbitrary read gadget".
> 
> I fully expect that over the course of time we're going to see other
> arbitrary read gadgets to be found.  Those gadgets may have lower
> bandwidth, have a higher degree of jitter or be harder to exploit, but
> they're out there just waiting to be discovered.
> 
> So I don't expect this to be the only mitigation we have to put into place.
> 
> Anyway...
> 
> 
>>
>> Some optimizations are permitted to make the builtin easier to use.
>> The final two arguments can both be omitted (c++ style): failval will
>> default to 0 in this case and if cmpptr is omitted ptr will be used
>> for expansions of the range check.  In addition either lower or upper
>> (but not both) may be a literal NULL and the expansion will then
>> ignore that boundary condition when expanding.
> So what are the cases where FAILVAL is useful rather than just using
> zero (or some other constant) all the time?

So some background first.  My initial attempts to define a builtin were
based entirely around trying to specify the builtin without any hard
functional behaviour as well.  The specification was a mess.  Things
just became so much easier to specify when I moved to defining a logical
behaviour on top of the intended side effects on speculation.  Having
done that it seemed sensible to permit the user to use the builtin in
more creative ways by allowing it to select the failure value.  The idea
was that code such as

  if (ptr >= base && ptr < limit) // bounds check
return *ptr;
  return FAILVAL;

could simply be rewritten as

  return __builtin_load_no_speculate (ptr, base, limit, FAILVAL);

and now you don't have to worry about writing the condition out twice or
any other such nonsense.  In this case the builtin does exactly what you
want it to do.  (It's also easier now to write test cases that check the
correctness of the builtin's expansion, since you can test for a
behaviour of the code's execution, even if you can't check the
speculation behaviour properly.)
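
As a concrete illustration of the short form: with both optional arguments
omitted, failval defaults to 0 and cmpptr defaults to ptr.  This is only a
sketch against the prototype described in the posted proposal; the function
and parameter names are made up.

int
read_element (const int *array, unsigned int idx, unsigned int size)
{
  if (idx < size)
    /* Loads array[idx] only if its address lies within
       [array, array + size); otherwise (including under mis-speculation)
       it yields 0, the default failval.  */
    return __builtin_load_no_speculate (&array[idx], array, array + size);
  return 0;
}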



> 
> Similarly under what conditions would one want to use CMPPTR rather than
> PTR?

This was at the request of some kernel folk.  They have some cases where
CMPPTR may be a pointer to an atomic type and want to do something like

  if (cmpptr >= lower && cmpptr < upper)
val = __atomic_read_and_inc (cmpptr);

The atomics are complicated enough already and rewriting them all to
additionally inhibit speculation based on the result would be a
nightmare.  By separating out cmpptr from ptr you can now write

  if (cmpptr >= lower && cmpptr < upper)
{
  TYPE tmp_val = __atomic_read_and_inc (cmpptr);
      val = __builtin_load_no_speculate (&tmp_val, lower, upper, 0,
   cmpptr);
}

It's meaningless in this case to check the bounds of tmp_val - it's just
a value pushed onto the stack in order to satisfy the requirement that
we need a load that is being speculatively executed; but we can still
test against the speculation conditions, which are still the range check
for cmpptr.

There may be other similar cases as well where you have an object that
you want to clean up that is somehow derived from cmpptr but is not
cmpptr itself.  You do have to be more careful with the extended form,
since it is possible to write sequences that don't inhibit speculation
in the way you might think they do, but with greater flexibility also
comes greater risk.  I don't think there's ever a problem if ptr is
somehow derived from dereferencing cmpptr.

R.

> 
> I wandered down through the LKML thread but didn't see anything which
> would shed light on those two questions.
> 
> jeff
>>



Re: Fix permute handling when vectorising scatters

2018-01-09 Thread Richard Biener
On Tue, Jan 9, 2018 at 12:34 PM, Richard Sandiford
 wrote:
> As mentioned in https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01575.html ,
> the scatter handling in vectorizable_store seems to be dead code at the
> moment.  Enabling it with the vect_analyze_data_ref_access part of
> that patch triggered an ICE in the avx512f-scatter-*.c tests (which
> previously didn't use scatters).  The problem was that the NARROW
> and WIDEN handling uses permute_vec_elements to marshal the inputs,
> and permute_vec_elements expected the lhs of the stmt to be an SSA_NAME,
> which of course it isn't for stores.
>
> This patch makes permute_vec_elements create a fresh variable in this case.
>
> Tested on x86_64-linux-gnu.  OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-01-09  Richard Sandiford  
>
> gcc/
> * tree-vect-stmts.c (permute_vec_elements): Create a fresh variable
> if the destination isn't an SSA_NAME.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2018-01-03 21:47:27.956862491 +
> +++ gcc/tree-vect-stmts.c   2018-01-09 11:30:00.578821631 +
> @@ -6585,7 +6585,11 @@ permute_vec_elements (tree x, tree y, tr
>tree perm_dest, data_ref;
>gimple *perm_stmt;
>
> -  perm_dest = vect_create_destination_var (gimple_get_lhs (stmt), vectype);
> +  tree scalar_dest = gimple_get_lhs (stmt);
> +  if (TREE_CODE (scalar_dest) == SSA_NAME)
> +perm_dest = vect_create_destination_var (scalar_dest, vectype);
> +  else
> +perm_dest = vect_get_new_vect_var (vectype, vect_simple_var, NULL);
>data_ref = make_ssa_name (perm_dest);
>
>/* Generate the permute statement.  */


[PATCH 4/5] Remove predictors that are unreliable.

2018-01-09 Thread Martin Liška
These predictors are in my opinion not reliable and thus I decided to remove 
them:

1) PRED_NEGATIVE_RETURN: probability is ~51%
2) PRED_RECURSIVE_CALL: there are 2 dominant edges that influence value to 63%;
w/o these edges it goes down to 52%
3) PRED_POLYMORPHIC_CALL: having very low coverage, probability is ~51%
4) PRED_INDIR_CALL: likewise

The question here is whether we want to remove them, or to keep predicting them
with an 'ignored' flag?  Doing that, we could still measure statistics for these
predictors in the future.

Martin
>From afbc86cb72eab37bcf6325954d0bf306b301f76e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 28 Dec 2017 10:23:48 +0100
Subject: [PATCH 4/5] Remove predictors that are unreliable.

gcc/ChangeLog:

2017-12-28  Martin Liska  

	* predict.c (return_prediction): Do not predict
	PRED_NEGATIVE_RETURN.
	(tree_bb_level_predictions): Do not predict PRED_RECURSIVE_CALL.
	(tree_estimate_probability_bb): Do not predict
	PRED_POLYMORPHIC_CALL and PRED_INDIR_CALL.
	* predict.def (PRED_INDIR_CALL): Remove unused predictors.
	(PRED_POLYMORPHIC_CALL): Likewise.
	(PRED_RECURSIVE_CALL): Likewise.
	(PRED_NEGATIVE_RETURN): Likewise.
---
 gcc/predict.c   | 17 ++---
 gcc/predict.def | 13 -
 2 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/gcc/predict.c b/gcc/predict.c
index 51fd14205c2..f53724792e9 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -2632,14 +2632,6 @@ return_prediction (tree val, enum prediction *prediction)
 }
   else if (INTEGRAL_TYPE_P (TREE_TYPE (val)))
 {
-  /* Negative return values are often used to indicate
- errors.  */
-  if (TREE_CODE (val) == INTEGER_CST
-	  && tree_int_cst_sgn (val) < 0)
-	{
-	  *prediction = NOT_TAKEN;
-	  return PRED_NEGATIVE_RETURN;
-	}
   /* Constant return values seems to be commonly taken.
  Zero/one often represent booleans so exclude them from the
 	 heuristics.  */
@@ -2820,9 +2812,6 @@ tree_bb_level_predictions (void)
    DECL_ATTRIBUTES (decl)))
 		predict_paths_leading_to (bb, PRED_COLD_FUNCTION,
 	  NOT_TAKEN);
-	  if (decl && recursive_call_p (current_function_decl, decl))
-		predict_paths_leading_to (bb, PRED_RECURSIVE_CALL,
-	  NOT_TAKEN);
 	}
 	  else if (gimple_code (stmt) == GIMPLE_PREDICT)
 	{
@@ -2880,12 +2869,10 @@ tree_estimate_probability_bb (basic_block bb, bool local_only)
 		 something exceptional.  */
 		  && gimple_has_side_effects (stmt))
 		{
+		  /* Consider just normal function calls, skip indirect and
+		  polymorphic calls as these tend to be unreliable.  */
 		  if (gimple_call_fndecl (stmt))
 		predict_edge_def (e, PRED_CALL, NOT_TAKEN);
-		  else if (virtual_method_call_p (gimple_call_fn (stmt)))
-		predict_edge_def (e, PRED_POLYMORPHIC_CALL, NOT_TAKEN);
-		  else
-		predict_edge_def (e, PRED_INDIR_CALL, TAKEN);
 		  break;
 		}
 	}
diff --git a/gcc/predict.def b/gcc/predict.def
index fe72080d5bd..7291650d237 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -118,16 +118,6 @@ DEF_PREDICTOR (PRED_TREE_FPOPCODE, "fp_opcode (on trees)", HITRATE (90), 0)
 /* Branch guarding call is probably taken.  */
 DEF_PREDICTOR (PRED_CALL, "call", HITRATE (67), 0)
 
-/* PRED_CALL is not very reliable predictor and it turns out to be even
-   less reliable for indirect calls and polymorphic calls.  For spec2k6
-   the predictio nis slightly in the direction of taking the call.  */
-DEF_PREDICTOR (PRED_INDIR_CALL, "indirect call", HITRATE (86), 0)
-DEF_PREDICTOR (PRED_POLYMORPHIC_CALL, "polymorphic call", HITRATE (59), 0)
-
-/* Recursive calls are usually not taken or the function will recurse
-   indefinitely.  */
-DEF_PREDICTOR (PRED_RECURSIVE_CALL, "recursive call", HITRATE (75), 0)
-
 /* Branch causing function to terminate is probably not taken.  */
 DEF_PREDICTOR (PRED_TREE_EARLY_RETURN, "early return (on trees)", HITRATE (66),
 	   0)
@@ -138,9 +128,6 @@ DEF_PREDICTOR (PRED_GOTO, "goto", HITRATE (66), 0)
 /* Branch ending with return constant is probably not taken.  */
 DEF_PREDICTOR (PRED_CONST_RETURN, "const return", HITRATE (65), 0)
 
-/* Branch ending with return negative constant is probably not taken.  */
-DEF_PREDICTOR (PRED_NEGATIVE_RETURN, "negative return", HITRATE (98), 0)
-
 /* Branch ending with return; is probably not taken */
 DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (71), 0)
 
-- 
2.14.3



[PATCH 5/5] Fix test-suite fallout.

2018-01-09 Thread Martin Liška
The last patch of the series is a clean-up of the test-suite.

Martin
>From e76a4544e236d7d586380f33e20431f854436ce4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 9 Jan 2018 10:52:36 +0100
Subject: [PATCH 5/5] Fix test-suite fallout.

gcc/testsuite/ChangeLog:

2018-01-09  Martin Liska  

	* gcc.dg/predict-1.c: Adjust expected probability.
	* gcc.dg/predict-10.c: Remove.
	* gcc.dg/predict-12.c: Do not scan for removed predictor.
	* gcc.dg/predict-3.c: Adjust expected probability.
	* gcc.dg/predict-5.c: Likewise.
	* gcc.dg/predict-6.c: Likewise.
	* gcc.dg/predict-9.c: Likewise.
---
 gcc/testsuite/gcc.dg/predict-1.c  |  2 +-
 gcc/testsuite/gcc.dg/predict-10.c | 11 ---
 gcc/testsuite/gcc.dg/predict-12.c |  1 -
 gcc/testsuite/gcc.dg/predict-3.c  |  2 +-
 gcc/testsuite/gcc.dg/predict-5.c  |  2 +-
 gcc/testsuite/gcc.dg/predict-6.c  |  2 +-
 gcc/testsuite/gcc.dg/predict-9.c  |  4 ++--
 7 files changed, 6 insertions(+), 18 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/predict-10.c

diff --git a/gcc/testsuite/gcc.dg/predict-1.c b/gcc/testsuite/gcc.dg/predict-1.c
index 65f6bad9d7c..4ba26e6e256 100644
--- a/gcc/testsuite/gcc.dg/predict-1.c
+++ b/gcc/testsuite/gcc.dg/predict-1.c
@@ -23,4 +23,4 @@ void foo (int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 2.0%" 4 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 36.0%" 4 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-10.c b/gcc/testsuite/gcc.dg/predict-10.c
deleted file mode 100644
index a99819a24c6..000
--- a/gcc/testsuite/gcc.dg/predict-10.c
+++ /dev/null
@@ -1,11 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
-int
-ee(int i)
-{
-  if (i>2)
-return (ee(i-1)+ee(i-2))/2;
-  else
-return i;
-}
-/* { dg-final { scan-tree-dump-times "recursive call" 1 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-12.c b/gcc/testsuite/gcc.dg/predict-12.c
index 1fd4d67c60e..eb5178e7a2e 100644
--- a/gcc/testsuite/gcc.dg/predict-12.c
+++ b/gcc/testsuite/gcc.dg/predict-12.c
@@ -14,4 +14,3 @@ t(void)
 }
 /* { dg-final { scan-tree-dump-times "loop guard with recursion" 1 "profile_estimate"} } */
 /* { dg-final { scan-tree-dump-times "loop exit with recursion" 2 "profile_estimate"} } */
-/* { dg-final { scan-tree-dump-times "recursive call" 1 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-3.c b/gcc/testsuite/gcc.dg/predict-3.c
index 7274963b943..81addde1667 100644
--- a/gcc/testsuite/gcc.dg/predict-3.c
+++ b/gcc/testsuite/gcc.dg/predict-3.c
@@ -25,4 +25,4 @@ void foo (int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 98.0%" 3 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 64.0%" 3 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-5.c b/gcc/testsuite/gcc.dg/predict-5.c
index 135081de2a4..c80b2928d57 100644
--- a/gcc/testsuite/gcc.dg/predict-5.c
+++ b/gcc/testsuite/gcc.dg/predict-5.c
@@ -21,4 +21,4 @@ void foo (int base, int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 98.0%" 4 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 64.0%" 4 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-6.c b/gcc/testsuite/gcc.dg/predict-6.c
index 104683f7f43..3acc7644629 100644
--- a/gcc/testsuite/gcc.dg/predict-6.c
+++ b/gcc/testsuite/gcc.dg/predict-6.c
@@ -21,4 +21,4 @@ void foo (int base, int bound)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 2.0%" 4 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "guess loop iv compare heuristics of edge\[^:\]*: 36.0%" 4 "profile_estimate"} } */
diff --git a/gcc/testsuite/gcc.dg/predict-9.c b/gcc/testsuite/gcc.dg/predict-9.c
index ec467519504..3823775e3f8 100644
--- a/gcc/testsuite/gcc.dg/predict-9.c
+++ b/gcc/testsuite/gcc.dg/predict-9.c
@@ -19,5 +19,5 @@ void foo (int base)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "first match heuristics: 3.0%" 3 "profile_estimate"} } */
-/* { dg-final { scan-tree-dump-times "first match heuristics: 7.5%" 1 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "first match heuristics: 2.2%" 3 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "first match heuristics: 5.5%" 1 "profile_estimate"} } */
-- 
2.14.3



Re: [PATCH] Add static assert about stack alignment (PR sanitizer/82517).

2018-01-09 Thread Martin Liška
On 01/09/2018 11:47 AM, Jakub Jelinek wrote:
> On Tue, Jan 09, 2018 at 11:41:17AM +0100, Martin Liška wrote:
>> The following static assert is added as we may potentially adjust
>> ASAN_SHADOW_GRANULARITY (via ASAN_SHADOW_SHIFT).  The assert ensures stack
>> variables will have sufficient alignment.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-01-09  Martin Liska  
>>
>>  PR sanitizer/82517
>>  * asan.c (shadow_mem_size): Add static assert.
> 
> STATIC_ASSERT assumes all the 3 macros expand to constants, not sure if we
> for the future can always guarantee it, e.g. MAX_SUPPORTED_STACK_ALIGNMENT
> could be dependent on some command line option etc.
> Use gcc_assert or gcc_checking_assert instead?

Agree, I've changed that to gcc_assert and installed the patch as r256378.

Martin

> 
>   Jakub
> 



[PATCH v3 of 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2018-01-09 Thread David Malcolm
On Mon, 2018-01-08 at 16:59 -0500, Jason Merrill wrote:
> On 01/08/2018 04:49 PM, Jason Merrill wrote:
> > On 01/08/2018 04:01 PM, David Malcolm wrote:
> > > On Fri, 2018-01-05 at 15:29 -0500, Jason Merrill wrote:
> > > > 
> > > > I'd rather handle location wrappers separately, and abort if
> > > > VIEW_CONVERT_EXPR or NON_LVALUE_EXPR appear other than as
> > > > wrappers.
> > > 
> > > Once I fixed the issue with location_wrapper_p with decls
> > > changing
> > > type, it turns out that trunk is already passing
> > > VIEW_CONVERT_EXPR to
> > > tsubst_copy_and_build for non-wrapper nodes (and from there to
> > > tsubst_copy), where the default case currently handles them. 
> > > Adding an
> > > assert turns this into an ICE.
> > > 
> > > g++.dg/delayedfold/builtin1.C is the only instance of it I found
> > > in our
> > > test suite, where it's used here:
> > > 
> > > class RegionLock {
> > >template  void m_fn1();
> > >int spinlock;
> > > } acquire_zero;
> > > int acquire_one;
> > > template  void RegionLock::m_fn1() {
> > >__atomic_compare_exchange(&spinlock, &acquire_zero,
> > > &acquire_one, 
> > > false, 2, 2);
> > > ^
> > > }
> > > 
> > > (gdb) call debug_tree (t)
> > >> 
> > ...
> > >  arg:0  > 
> > ...
> > >  arg:0  > > 0x71a18150>
> > >  arg:0 
> > > 
> > > (This one is just for VIEW_CONVERT_EXPR; I don't yet know of any
> > > existing places where NON_LVALUE_EXPR can be passed to tsubst_*).
> > 
> > Hmm, a NON_DEPENDENT_EXPR also shouldn't make it into the saved
> > trees 
> > for a template.  I'll take a look.
> 
> OK, we're only seeing this when args is NULL_TREE (i.e. under 
> instantiate_non_dependent_expr), so tsubst_copy immediately returns. 
> The assert should come after that.
> 
> Jason

Aha - thanks!

Here's an updated version of the patch.  It adds VIEW_CONVERT_EXPR and
NON_LVALUE_EXPR to tsubst_copy, but only supports them there for location
wrapper nodes, with an assertion.  It also adds them to
tsubst_copy_and_build, with a suitable assertion to handle the case
described above.

I also updated build_non_dependent_expr to avoid losing location wrappers
(by capturing the "orig_expr" and returning it rather than "expr").

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, as
part of the kit.
Also, manually tested with "make s-selftest-c++" (since we don't run
the selftests for cc1plus by default).

OK for trunk in conjunction with the rest of the kit?

(this version of the patch requires the updated version of the [02/14]
patch I posted here:
  https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00591.html
to handle the types of decls from changing under a wrapper,
which is the only other part of the kit still needing review).

Dave

gcc/cp/ChangeLog:
* cp-gimplify.c (cp_fold): Remove the early bailout when
processing_template_decl.
* cp-lang.c (selftest::run_cp_tests): Call
selftest::cp_pt_c_tests.
* cp-tree.h (selftest::cp_pt_c_tests): New decl.
* mangle.c (write_expression): Skip location wrapper nodes.
* parser.c (cp_parser_postfix_expression): Call
maybe_add_location_wrapper on the result for RID_TYPEID. Pass true
for new "wrap_locations_p" param of
cp_parser_parenthesized_expression_list.
(cp_parser_parenthesized_expression_list): Add "wrap_locations_p"
param, defaulting to false.  Convert "expr" to a cp_expr, and call
maybe_add_location_wrapper on it when wrap_locations_p is true.
(cp_parser_unary_expression): Call maybe_add_location_wrapper on
the result for RID_ALIGNOF and RID_SIZEOF.
(cp_parser_builtin_offsetof): Likewise.
* pt.c: Include "selftest.h".
(tsubst_copy): Handle location wrappers.
(tsubst_copy_and_build): Likewise.
(build_non_dependent_expr): Likewise.
(selftest::test_build_non_dependent_expr): New function.
(selftest::cp_pt_c_tests): New function.
---
 gcc/cp/cp-gimplify.c |  5 ++--
 gcc/cp/cp-lang.c |  1 +
 gcc/cp/cp-tree.h | 10 +++
 gcc/cp/mangle.c  |  1 +
 gcc/cp/parser.c  | 25 
 gcc/cp/pt.c  | 85 +++-
 6 files changed, 111 insertions(+), 16 deletions(-)

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 934f674..9bdaafc 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2058,7 +2058,7 @@ clear_fold_cache (void)
 
 /*  This function tries to fold an expression X.
 To avoid combinatorial explosion, folding results are kept in fold_cache.
-If we are processing a template or X is invalid, we don't fold at all.
+If X is invalid, we don't fold at all.
 For performance reasons we don't cache expressions representing a
 declaration or constant.
 Function returns X or its folded variant.  */
@@ -2075,8 +2075,7 @@ cp_fold (tree x)
   if (!x || x == error_mark_node)
 

[PATCH] Fix PR83668

2018-01-09 Thread Richard Biener

This fixes another case of bogus initial schedule generated by GRAPHITE.
Rather than adding my own RPO computation routine I am swapping the
edges in the edge vector containing loop exits...  not exactly nice
but sufficient.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-01-09  Richard Biener  

PR tree-optimization/83668
* graphite.c (canonicalize_loop_closed_ssa): Add edge argument,
move prologue...
(canonicalize_loop_form): ... here, renamed from ...
(canonicalize_loop_closed_ssa_form): ... this and amended to
swap successor edges for loop exit blocks to make us use
the RPO order we need for initial schedule generation.

* gcc.dg/graphite/pr83668.c: New testcase.

Index: gcc/graphite.c
===
--- gcc/graphite.c  (revision 256372)
+++ gcc/graphite.c  (working copy)
@@ -227,15 +227,11 @@ free_scops (vec scops)
 /* Transforms LOOP to the canonical loop closed SSA form.  */
 
 static void
-canonicalize_loop_closed_ssa (loop_p loop)
+canonicalize_loop_closed_ssa (loop_p loop, edge e)
 {
-  edge e = single_exit (loop);
   basic_block bb;
   gphi_iterator psi;
 
-  if (!e || (e->flags & EDGE_COMPLEX))
-return;
-
   bb = e->dest;
 
   /* Make the loop-close PHI node BB contain only PHIs and have a
@@ -314,14 +310,34 @@ canonicalize_loop_closed_ssa (loop_p loo
other statements.
 
- there exist only one phi node per definition in the loop.
+
+   In addition to that we also make sure that loop exit edges are
+   first in the successor edge vector.  This is to make RPO order
+   as computed by pre_and_rev_post_order_compute be consistent with
+   what initial schedule generation expects.
 */
 
 static void
-canonicalize_loop_closed_ssa_form (void)
+canonicalize_loop_form (void)
 {
   loop_p loop;
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
-canonicalize_loop_closed_ssa (loop);
+{
+  edge e = single_exit (loop);
+  if (!e || (e->flags & EDGE_COMPLEX))
+   continue;
+
+  canonicalize_loop_closed_ssa (loop, e);
+
+  /* If the exit is not first in the edge vector make it so.  */
+  if (e != EDGE_SUCC (e->src, 0))
+   {
+ unsigned ei;
+ for (ei = 0; EDGE_SUCC (e->src, ei) != e; ++ei)
+   ;
+ std::swap (EDGE_SUCC (e->src, ei), EDGE_SUCC (e->src, 0));
+   }
+}
 
   /* We can end up releasing duplicate exit PHIs and also introduce
  additional copies so the cached information isn't correct anymore.  */
@@ -360,7 +376,7 @@ graphite_transform_loops (void)
   the_isl_ctx = ctx;
 
   sort_sibling_loops (cfun);
-  canonicalize_loop_closed_ssa_form ();
+  canonicalize_loop_form ();
 
   /* Print the loop structure.  */
   if (dump_file && (dump_flags & TDF_DETAILS))
Index: gcc/testsuite/gcc.dg/graphite/pr83668.c
===
--- gcc/testsuite/gcc.dg/graphite/pr83668.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr83668.c (working copy)
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-dominator-opts -fgraphite-identity" } */
+
+int a, b, c;
+int d[14];
+
+int
+main (void)
+{
+  short e;
+  char f;
+
+  for (; b >= 0; b--)
+{
+  e = 0;
+  for (; e < 2; e++)
+   {
+ a = 0;
+ for (; a < 7; a++)
+   d[a] = 1;
+   }
+  if (c)
+   {
+ f = 0;
+ for (; f >= 0; f--)
+   ;
+   }
+}
+
+  if (a != 7)
+__builtin_abort ();
+
+  return 0;
+}


[PATCH] PR 78534, 83704 Large character lengths

2018-01-09 Thread Janne Blomqvist
This patch fixes various parts of the code to use a larger type than
int for the character length. Depending on the situation,
HOST_WIDE_INT, size_t, or gfc_charlen_t is appropriate.

Regtested on x86_64-pc-linux-gnu and i686-pc-linux-gnu, Ok for trunk?

gcc/fortran/ChangeLog:

2018-01-09  Janne Blomqvist  

PR 78534
PR 83704
* arith.c (gfc_arith_concat): Use size_t for string length.
(gfc_compare_string): Likewise.
(gfc_compare_with_Cstring): Likewise.
* array.c (gfc_resolve_character_array_constructor): Use
HOST_WIDE_INT, gfc_mpz_get_hwi.
* check.c (gfc_check_fe_runtime_error): Use size_t.
* data.c (create_character_initializer): Use HOST_WIDE_INT,
gfc_extract_hwi.
* decl.c (gfc_set_constant_character_len): Use gfc_charlen_t.
(add_init_expr_to_sym): Use HOST_WIDE_INT.
* expr.c (gfc_build_init_expr): Use HOST_WIDE_INT,
gfc_extract_hwi.
(gfc_apply_init): Likewise.
* match.h (gfc_set_constant_character_len): Update prototype.
* primary.c (match_string_constant): Use size_t.
* resolve.c (resolve_ordinary_assign): Use HOST_WIDE_INT,
gfc_mpz_get_hwi.
* simplify.c (init_result_expr): Likewise.
(gfc_simplify_len_trim): Use size_t.
* target-memory.c (gfc_encode_character): Use size_t.
(gfc_target_encode_expr): Use HOST_WIDE_INT, gfc_mpz_get_hwi.
(interpret_array): Use size_t.
(gfc_interpret_character): Likewise.
* target-memory.h (gfc_encode_character): Update prototype.
(gfc_interpret_character): Likewise.
(gfc_target_interpret_expr): Likewise.
* trans-const.c (gfc_build_string_const): Use size_t for length
argument.
(gfc_build_wide_string_const): Likewise.
* trans-const.h (gfc_build_string_const): Likewise.
(gfc_build_wide_string_const): Likewise.
---
 gcc/fortran/arith.c |  6 +++---
 gcc/fortran/array.c | 31 ---
 gcc/fortran/check.c |  2 +-
 gcc/fortran/data.c  | 14 +++---
 gcc/fortran/decl.c  | 19 +++
 gcc/fortran/expr.c  | 13 -
 gcc/fortran/match.h |  3 ++-
 gcc/fortran/primary.c   |  5 +++--
 gcc/fortran/resolve.c   | 10 +-
 gcc/fortran/simplify.c  |  8 
 gcc/fortran/target-memory.c | 31 +++
 gcc/fortran/target-memory.h |  6 +++---
 gcc/fortran/trans-const.c   |  4 ++--
 gcc/fortran/trans-const.h   |  4 ++--
 14 files changed, 74 insertions(+), 82 deletions(-)

diff --git a/gcc/fortran/arith.c b/gcc/fortran/arith.c
index 92067ac..8f328fe 100644
--- a/gcc/fortran/arith.c
+++ b/gcc/fortran/arith.c
@@ -980,7 +980,7 @@ static arith
 gfc_arith_concat (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
 {
   gfc_expr *result;
-  int len;
+  size_t len;
 
   gcc_assert (op1->ts.kind == op2->ts.kind);
   result = gfc_get_constant_expr (BT_CHARACTER, op1->ts.kind,
@@ -1089,7 +1089,7 @@ compare_complex (gfc_expr *op1, gfc_expr *op2)
 int
 gfc_compare_string (gfc_expr *a, gfc_expr *b)
 {
-  int len, alen, blen, i;
+  size_t len, alen, blen, i;
   gfc_char_t ac, bc;
 
   alen = a->value.character.length;
@@ -1116,7 +1116,7 @@ gfc_compare_string (gfc_expr *a, gfc_expr *b)
 int
 gfc_compare_with_Cstring (gfc_expr *a, const char *b, bool case_sensitive)
 {
-  int len, alen, blen, i;
+  size_t len, alen, blen, i;
   gfc_char_t ac, bc;
 
   alen = a->value.character.length;
diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c
index 882fe57..ed5fdb2 100644
--- a/gcc/fortran/array.c
+++ b/gcc/fortran/array.c
@@ -1962,7 +1962,7 @@ bool
 gfc_resolve_character_array_constructor (gfc_expr *expr)
 {
   gfc_constructor *p;
-  int found_length;
+  HOST_WIDE_INT found_length;
 
   gcc_assert (expr->expr_type == EXPR_ARRAY);
   gcc_assert (expr->ts.type == BT_CHARACTER);
@@ -1994,7 +1994,7 @@ got_charlen:
   for (p = gfc_constructor_first (expr->value.constructor);
   p; p = gfc_constructor_next (p))
{
- int current_length = -1;
+ HOST_WIDE_INT current_length = -1;
  gfc_ref *ref;
  for (ref = p->expr->ref; ref; ref = ref->next)
if (ref->type == REF_SUBSTRING
@@ -2005,19 +2005,11 @@ got_charlen:
  if (p->expr->expr_type == EXPR_CONSTANT)
current_length = p->expr->value.character.length;
  else if (ref)
-   {
- long j;
- j = mpz_get_ui (ref->u.ss.end->value.integer)
-   - mpz_get_ui (ref->u.ss.start->value.integer) + 1;
- current_length = (int) j;
-   }
+   current_length = gfc_mpz_get_hwi (ref->u.ss.end->value.integer)
+ - gfc_mpz_get_hwi (ref->u.ss.start->value.integer) + 1;
  else if (p->expr->ts.u.cl && p->expr->ts.u.cl->length
   && p->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT)
-   {
- 

RE: [PATCH] RX pragma address

2018-01-09 Thread Sebastian Perta
Hello,

The AP4 tool team have agreed with this change and are willing to update their 
tool.
So I would like to drop this patch.

Best Regards,
Sebastian


-Original Message-
From: Sebastian Perta [mailto:sebastian.pe...@renesas.com] 
Sent: 05 January 2018 13:46
To: 'Oleg Endo' ; gcc-patches@gcc.gnu.org
Subject: RE: [PATCH] RX pragma address

Hi Oleg,

Thank you very much for those suggestions, they are definitely the better way 
to go.
From my point of view I would like to drop both patches (RX and RL78); however,
I contacted the AP4 tool team to see if they agree with this change and are
willing to update their tool.
If not I will follow your suggestion and move it out from the M32C target and 
make it available for every target.

Best Regards,
Sebastian

-Original Message-
From: Oleg Endo [mailto:oleg.e...@t-online.de] 
Sent: 05 January 2018 12:50
To: Sebastian Perta ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] RX pragma address

On Fri, 2018-01-05 at 12:12 +, Sebastian Perta wrote:
> 
> > > 
> > > Is this for some kind of legacy backwards compatibility of
> > > something?

> Sort of, this is required by the following tool
> https://www.renesas.com/en-eu/products/software-tools/tools/code-generator/ap4-application-leading-tool-applilet.html

> There are not many plans to improve this tool and other solutions
> (which might be more complex) might not be possible to implement in
> this tool.

I see.

> 
> The only way I can think of is to put the variable in a separate
> section (using section attribute) and then in the linker script put
> that section at the desired address.
> The problem is AP4 does not touch the linker script it only generates
> source code.
> 
> Do you have any other ideas/suggestions? I'm very happy to listen.

If you can compile the code only as plain C, for example

#define const_addr_var(addr, type) (*(volatile type*)addr)

#define DTCCR const_addr_var (0x00082400, uint8_t)
#define DTCSTS const_addr_var (0x0008240E, uint16_t)


If you can compile the code as C++11 there are certainly more options,
albeit probably not useful for generated code.  For example I do those
things this way:

// read-write hardware register with constant address
static constexpr hw_reg_rw> DTCCR = { };

// ready-only hardware register with constant address
static constexpr hw_reg_r> DTCSTS = { };


In both cases (C and C++) the following will compile to the same code:

void test_wr (void)
{
  DTCCR = 123;
}

int test_rd (void)
{
  return DTCSTS;
}

volatile void* get_reg_addr (void)
{
  return &DTCCR;
}

For a possible implementation of the hw_reg thingy see
https://github.com/shared-ptr/bits/blob/master/hw_reg.hpp

But really, if that is just for some code generator, why not simply
adjust the code generator to spit out normal C code instead of
cluttering the compiler with non-portable pragmas?  You have also
posted a similar thing for RL78 a while ago, so in the end the same
pragma thing would be re-implemented in the compiler three times (M32C,
RL78, RX)?  In that case, maybe better move it out from the M32C target
and make it available for every target?

Cheers,
Oleg



RE: [PATCH] RL78 pragma address

2018-01-09 Thread Sebastian Perta
Hello,

So I would like to drop this patch.
Oleg Endo has proposed a better solution:
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00305.html

Best Regards,
Sebastian

-Original Message-
From: Sebastian Perta [mailto:sebastian.pe...@renesas.com] 
Sent: 15 December 2017 09:25
To: 'gcc-patches@gcc.gnu.org' 
Subject: [PATCH] RL78 pragma address

Hello 

The following patch adds a new pragma, "pragma address" for RL78.
The patch updates extend.texi and adds a test case to the regression suite as well.
For the test case I checked that the test is getting picked up in gcc.log;
unfortunately, for the .texi part I don't know where to look/what to do to get
the documentation generated.

This is similar to the pragma address implemented for M32C.
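
For context, a minimal usage sketch of the pragma (the variable name and the
address are made up; the real test lives in test_pragma_address.c):

/* Place "port_mirror" at a fixed address; the pragma also makes it
   volatile, so every access below really touches that address.  */
#pragma ADDRESS port_mirror 0xffe20
unsigned char port_mirror;

void
set_bit_0 (void)
{
  port_mirror |= 0x01;
}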

Regression test is OK, tested with the following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rl78-sim

Please let me know if this is OK, Thank you!
Sebastian

Index: ChangeLog
===
--- ChangeLog   (revision 255643)
+++ ChangeLog   (working copy)
@@ -1,3 +1,19 @@
+2017-12-14  Sebastian Perta  
+
+   * config/rl78/rl78.c (rl78_get_pragma_address): New function.
+   * config/rl78/rl78.c (rl78_note_pragma_address): New function.
+   * config/rl78/rl78.c (rl78_output_aligned_common): Use .set instead
+   of .comm for pragma address variables.
+   * config/rl78/rl78.c (rl78_insert_attributes): Make pragma address
+   variables volatile.
+   * config/rl78/rl78-c.c (rl78_pragma_address): New function.
+   * config/rl78/rl78-c.c (rl78_register_pragmas): Register the new
+   pragma address.
+   * config/rl78/rl78-protos.h: New declaration of
+   rl78_note_pragma_address.
+   * doc/extend.texi: Add documentation for RL78 pragmas.
+   * testsuite/gcc.target/rl78/test_pragma_address.c: New file.
+   
+   
 2017-12-14  Andreas Schwab  
 
PR bootstrap/83396
Index: config/rl78/rl78-c.c
===
--- config/rl78/rl78-c.c(revision 255643)
+++ config/rl78/rl78-c.c(working copy)
@@ -23,7 +23,42 @@
 #include "coretypes.h"
 #include "tm.h"
 #include "c-family/c-common.h"
+#include "c-family/c-pragma.h"
+#include "rl78-protos.h"
 
+/* Implements the "pragma ADDRESS" pragma.  This pragma takes a
+   variable name and an address, and arranges for that variable to be
+   "at" that address.  The variable is also made volatile.  */
+static void
+rl78_pragma_address (cpp_reader * reader ATTRIBUTE_UNUSED)
+{
+  /* on off */
+  tree var, addr;
+  enum cpp_ttype type;
+
+  type = pragma_lex (&var);
+  if (type == CPP_NAME)
+{
+  type = pragma_lex (&addr);
+  if (type == CPP_NUMBER)
+   {
+ if (var != error_mark_node)
+   {
+ unsigned uaddr = tree_to_uhwi (addr);
+ rl78_note_pragma_address (IDENTIFIER_POINTER (var), uaddr);
+   }
+
+ type = pragma_lex (&var);
+ if (type != CPP_EOF)
+   {
+ error ("junk at end of #pragma ADDRESS");
+   }
+ return;
+   }
+}
+  error ("malformed #pragma ADDRESS variable address");
+}
+
 /* Implements REGISTER_TARGET_PRAGMAS.  */
 void
 rl78_register_pragmas (void)
@@ -30,4 +65,7 @@
 {
   c_register_addr_space ("__near", ADDR_SPACE_NEAR);
   c_register_addr_space ("__far", ADDR_SPACE_FAR);
+  
+  c_register_pragma (NULL, "ADDRESS", rl78_pragma_address);
+  c_register_pragma (NULL, "address", rl78_pragma_address);
 }
Index: config/rl78/rl78-protos.h
===
--- config/rl78/rl78-protos.h   (revision 255643)
+++ config/rl78/rl78-protos.h   (working copy)
@@ -52,6 +52,7 @@
 intrl78_sfr_p (rtx x);
 void   rl78_output_aligned_common (FILE *, tree, const char *,
int, int, int);
+void   rl78_note_pragma_address (const char *varname, unsigned
address);
 
 intrl78_one_far_p (rtx *operands, int num_operands);
 
Index: config/rl78/rl78.c
===
--- config/rl78/rl78.c  (revision 255643)
+++ config/rl78/rl78.c  (working copy)
@@ -4565,6 +4565,30 @@
   fputs (str2, file);
 }
 
+struct GTY(()) pragma_entry {
+  const char *varname;
+  unsigned address;
+};
+typedef struct pragma_entry pragma_entry;
+
+/* Hash table of pragma info.  */
+static GTY(()) hash_map *pragma_htab;
+
+static bool
+rl78_get_pragma_address (const char *varname, unsigned *address)
+{
+  if (!pragma_htab)
+return false;
+
+  unsigned int *slot = pragma_htab->get (varname);
+  if (slot)
+{
+  *address = *slot;
+  return true;
+}
+  return false;
+}
+
 void
 rl78_output_aligned_common (FILE *stream,
tree decl ATTRIBUTE_UNUSED,
@@ -4571,6 +4595,7 @@
const char *name,
int size, int align, int global)
 {

[PATCH] Clean up partitioning in try_optimize_cfg (PR bootstrap/82831).

2018-01-09 Thread Martin Liška
Hello.

This is a patch for the i586 bootstrap failure triggered by the bug described
in detail in the PR.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-01-09  Martin Liska  

PR bootstrap/82831
* basic-block.h (CLEANUP_NO_PARTITIONING): New define.
* bb-reorder.c (pass_reorder_blocks::execute): Do not clean up
partitioning.
* cfgcleanup.c (try_optimize_cfg): Fix up partitioning if
CLEANUP_NO_PARTITIONING is not set.
---
 gcc/basic-block.h | 1 +
 gcc/bb-reorder.c  | 2 +-
 gcc/cfgcleanup.c  | 3 ++-
 3 files changed, 4 insertions(+), 2 deletions(-)


diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index b9394cdb1f2..823627a14a8 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -506,6 +506,7 @@ ei_cond (edge_iterator ei, edge *p)
 	   insns.  */
 #define CLEANUP_CFGLAYOUT	32	/* Do cleanup in cfglayout mode.  */
 #define CLEANUP_CFG_CHANGED	64  /* The caller changed the CFG.  */
+#define CLEANUP_NO_PARTITIONING	128 /* Do not try to fix partitions.  */
 
 /* Return true if BB is in a transaction.  */
 
diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index f977082de5a..9d18fcc495f 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2575,7 +2575,7 @@ pass_reorder_blocks::execute (function *fun)
   cfg_layout_initialize (CLEANUP_EXPENSIVE);
 
   reorder_basic_blocks ();
-  cleanup_cfg (CLEANUP_EXPENSIVE);
+  cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_NO_PARTITIONING);
 
   FOR_EACH_BB_FN (bb, fun)
 if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 43d18766d8a..eebbe8f7959 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3012,7 +3012,8 @@ try_optimize_cfg (int mode)
  to detect and fix during edge forwarding, and in some cases
  is only visible after newly unreachable blocks are deleted,
  which will be done in fixup_partitions.  */
-	  fixup_partitions ();
+	  if ((mode & CLEANUP_NO_PARTITIONING) == 0)
+		fixup_partitions ();
 	  checking_verify_flow_info ();
 }
 



Re: [13/13] [AArch64] Use vec_perm_indices helper routines

2018-01-09 Thread James Greenhalgh
On Thu, Jan 04, 2018 at 11:27:56AM +, Richard Sandiford wrote:
> Ping**2

This is OK.

It took me a while to get the hang of the interface - a worked example
in the comment in vec-perm-indices.c would probably have been helpful.
It took until your code for REV for this to really make sense to me; so
perhaps that would make for a good example.
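
For instance, a worked TRN example (little-endian V4SI, so nelt == 4 and the
second input supplies elements 4-7; sketched from the checks in the patch
quoted below):

/* TRN1 selects elements { 0, 4, 2, 6 }.  With odd = d->perm[0] = 0:
     d->perm.series_p (0, 2, odd, 2)         output slots 0,2 hold 0,2
     d->perm.series_p (1, 2, nelt + odd, 2)  output slots 1,3 hold 4,6
   Both checks succeed, so the whole selector is recognised as a TRN
   without inspecting each element individually.  */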

James

> 
> Richard Sandiford  writes:
> > Ping
> >
> > Richard Sandiford  writes:
> >> This patch makes the AArch64 vec_perm_const code use the new
> >> vec_perm_indices routines, instead of checking each element individually.
> >> This means that they extend naturally to variable-length vectors.
> >>
> >> Also, aarch64_evpc_dup was the only function that generated rtl when
> >> testing_p is true, and that looked accidental.  The patch adds the
> >> missing check and then replaces the gen_rtx_REG/start_sequence/
> >> end_sequence stuff with an assert that no rtl is generated.
> >>
> >> Tested on aarch64-linux-gnu.  Also tested by making sure that there
> >> were no assembly output differences for aarch64_be-linux-gnu or
> >> aarch64_be-linux-gnu.  OK to install?
> >>
> >> Richard
> >>
> >>
> >> 2017-12-09  Richard Sandiford  
> >>
> >> gcc/
> >>* config/aarch64/aarch64.c (aarch64_evpc_trn): Use d.perm.series_p
> >>instead of checking each element individually.
> >>(aarch64_evpc_uzp): Likewise.
> >>(aarch64_evpc_zip): Likewise.
> >>(aarch64_evpc_ext): Likewise.
> >>(aarch64_evpc_rev): Likewise.
> >>(aarch64_evpc_dup): Test the encoding for a single duplicated element,
> >>instead of checking each element individually.  Return true without
> >>generating rtl if
> >>(aarch64_vectorize_vec_perm_const): Use all_from_input_p to test
> >>whether all selected elements come from the same input, instead of
> >>checking each element individually.  Remove calls to gen_rtx_REG,
> >>start_sequence and end_sequence and instead assert that no rtl is
> >>generated.
> >>
> >> Index: gcc/config/aarch64/aarch64.c
> >> ===
> >> --- gcc/config/aarch64/aarch64.c   2017-12-09 22:48:47.535824832 +
> >> +++ gcc/config/aarch64/aarch64.c   2017-12-09 22:49:00.139270410 +
> >> @@ -13295,7 +13295,7 @@ aarch64_expand_vec_perm (rtx target, rtx
> >>  static bool
> >>  aarch64_evpc_trn (struct expand_vec_perm_d *d)
> >>  {
> >> -  unsigned int i, odd, mask, nelt = d->perm.length ();
> >> +  unsigned int odd, nelt = d->perm.length ();
> >>rtx out, in0, in1, x;
> >>machine_mode vmode = d->vmode;
> >>  
> >> @@ -13304,21 +13304,11 @@ aarch64_evpc_trn (struct expand_vec_perm
> >>  
> >>/* Note that these are little-endian tests.
> >>   We correct for big-endian later.  */
> >> -  if (d->perm[0] == 0)
> >> -odd = 0;
> >> -  else if (d->perm[0] == 1)
> >> -odd = 1;
> >> -  else
> >> +  odd = d->perm[0];
> >> +  if ((odd != 0 && odd != 1)
> >> +  || !d->perm.series_p (0, 2, odd, 2)
> >> +  || !d->perm.series_p (1, 2, nelt + odd, 2))
> >>  return false;
> >> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
> >> -
> >> -  for (i = 0; i < nelt; i += 2)
> >> -{
> >> -  if (d->perm[i] != i + odd)
> >> -  return false;
> >> -  if (d->perm[i + 1] != ((i + nelt + odd) & mask))
> >> -  return false;
> >> -}
> >>  
> >>/* Success!  */
> >>if (d->testing_p)
> >> @@ -13342,7 +13332,7 @@ aarch64_evpc_trn (struct expand_vec_perm
> >>  static bool
> >>  aarch64_evpc_uzp (struct expand_vec_perm_d *d)
> >>  {
> >> -  unsigned int i, odd, mask, nelt = d->perm.length ();
> >> +  unsigned int odd;
> >>rtx out, in0, in1, x;
> >>machine_mode vmode = d->vmode;
> >>  
> >> @@ -13351,20 +13341,10 @@ aarch64_evpc_uzp (struct expand_vec_perm
> >>  
> >>/* Note that these are little-endian tests.
> >>   We correct for big-endian later.  */
> >> -  if (d->perm[0] == 0)
> >> -odd = 0;
> >> -  else if (d->perm[0] == 1)
> >> -odd = 1;
> >> -  else
> >> +  odd = d->perm[0];
> >> +  if ((odd != 0 && odd != 1)
> >> +  || !d->perm.series_p (0, 1, odd, 2))
> >>  return false;
> >> -  mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
> >> -
> >> -  for (i = 0; i < nelt; i++)
> >> -{
> >> -  unsigned elt = (i * 2 + odd) & mask;
> >> -  if (d->perm[i] != elt)
> >> -  return false;
> >> -}
> >>  
> >>/* Success!  */
> >>if (d->testing_p)
> >> @@ -13388,7 +13368,7 @@ aarch64_evpc_uzp (struct expand_vec_perm
> >>  static bool
> >>  aarch64_evpc_zip (struct expand_vec_perm_d *d)
> >>  {
> >> -  unsigned int i, high, mask, nelt = d->perm.length ();
> >> +  unsigned int high, nelt = d->perm.length ();
> >>rtx out, in0, in1, x;
> >>machine_mode vmode = d->vmode;
> >>  
> >> @@ -13397,25 +13377,11 @@ aarch64_evpc_zip (struct expand_vec_perm
> >>  
> >>/* Note that these are little-endian tests.
> >>   We correct for big-endian later.  */
> >> -  high = nelt / 2;
> >> -  if (d->perm[0] == high

Re: [AArch64] Reject (high (const (plus anchor offset)))

2018-01-09 Thread James Greenhalgh
On Thu, Jan 04, 2018 at 06:15:58PM +, Richard Sandiford wrote:
> The aarch64_legitimate_constant_p tests for HIGH and CONST seem
> to be the wrong way round: (high (const ...)) is valid rtl that
> could be passed in, but (const (high ...)) isn't.  As it stands,
> we disallow anchor+offset but allow (high anchor+offset).
> 
> TBH I can't remember whether this caused a test failure or if it
> was just something I noticed by inspection.  Either way, the SVE port
> adds more invalid (const ...)s and it doesn't make sense to test for
> them before peeling the HIGH.
> 
> Tested on aarch64-linux-gnu.  OK to install?

OK.

Thanks,
James

> 2018-01-04  Richard Sandiford  
> 
> gcc/
>   * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Fix
>   order of HIGH and CONST checks.
> 
> Index: gcc/config/aarch64/aarch64.c
> ===
> --- gcc/config/aarch64/aarch64.c  2018-01-03 21:43:30.290452983 +
> +++ gcc/config/aarch64/aarch64.c  2018-01-04 18:11:30.299730142 +
> @@ -10449,6 +10449,9 @@ aarch64_legitimate_constant_p (machine_m
>if (CONST_WIDE_INT_P (x))
>  return false;
>  
> +  if (GET_CODE (x) == HIGH)
> +x = XEXP (x, 0);
> +
>/* Do not allow const (plus (anchor_symbol, const_int)).  */
>if (GET_CODE (x) == CONST)
>  {
> @@ -10460,9 +10463,6 @@ aarch64_legitimate_constant_p (machine_m
>   return false;
>  }
>  
> -  if (GET_CODE (x) == HIGH)
> -x = XEXP (x, 0);
> -
>/* Treat symbols as constants.  Avoid TLS symbols as they are complex,
>   so spilling them is better than rematerialization.  */
>if (SYMBOL_REF_P (x) && !SYMBOL_REF_TLS_MODEL (x))


Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-09 Thread Wilco Dijkstra
Segher Boessenkool wrote:
> On Mon, Jan 08, 2018 at 0:25:47PM +, Wilco Dijkstra wrote:
>> > Always pairing two registers together *also* degrades code quality.
>> 
>> No, while it's not optimal, it means smaller code and fewer memory accesses.
>
> It means you execute *more* memory accesses.  Always.  This may be
> sometimes hidden, sure.  I'm not saying you do not want more ldp's;
> I'm saying this particular strategy is very far from ideal.

No, it means fewer, since the number of memory accesses is reduced (memory
bandwidth may increase, but that's not an issue). You can argue that different
pairings might work better as you may be able to push some LDP/STPs into
less frequently executed blocks, but finding optimal pairings is a hard problem
since the offsets need to be consecutive.

>> That may well be the problem. So if there are N predecessors, of which N-1
>> need to restore the same set of callee saves, but one was shrinkwrapped,
>> N-1 copies of the same restores might be emitted. N could be the number
>> of blocks in a function - I really hope it doesn't work out like that...
>
> In the worst case it would.  OTOH, joining every combo into blocks costs
> O(2**C) (where C is the # components) bb's worst case.
>
> It isn't a simple problem.  The current tuning works pretty well for us,
> but no doubt it can be improved!

Well if there are C components, we could limit the total number of 
saves/restores
inserted to say 4C. Similarly common cases could easily share the restores
without increasing the number of branches.

Wilco


Re: PR82665 - missing value range optimization for memchr

2018-01-09 Thread Prathamesh Kulkarni
On 7 January 2018 at 12:28, Prathamesh Kulkarni
 wrote:
> On 5 January 2018 at 00:20, Jeff Law  wrote:
>> On 01/03/2018 12:08 AM, Prathamesh Kulkarni wrote:
>>> On 3 January 2018 at 12:33, Prathamesh Kulkarni
>>>  wrote:
 On 2 January 2018 at 17:49, Jakub Jelinek  wrote:
> On Tue, Jan 02, 2018 at 05:39:17PM +0530, Prathamesh Kulkarni wrote:
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82665.c
>> @@ -0,0 +1,22 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +void f1 (char *p, __SIZE_TYPE__ sz)
>> +{
>> +  char *q = __builtin_memchr (p, 0, sz);
>> +  long n = q - p;
> Please use __PTRDIFF_TYPE__ instead of long, here and in other spots.
>
>> --- a/gcc/vr-values.c
>> +++ b/gcc/vr-values.c
>> @@ -793,6 +793,42 @@ vr_values::extract_range_from_binary_expr 
>> (value_range *vr,
>>
>>extract_range_from_binary_expr_1 (vr, code, expr_type, &vr0, &vr1);
>>
>> +  /* Set value_range for n in following sequence:
>> + def = __builtin_memchr (arg, 0, sz)
>> + n = def - arg
>> + Here the range for n can be set to [0, PTRDIFF_MAX - 1]. */
>> +
>> +  if (vr->type == VR_VARYING
>> +  && (code == POINTER_DIFF_EXPR)
>> +  && (TREE_CODE (op0) == SSA_NAME)
>> +  && (TREE_CODE (op1) == SSA_NAME))
> Please drop the useless ()s around the comparisons.  They are needed
> only if the condition is multi-line and needed for emacs indentation,
> or around assignments tested as conditions.
>
>> +  gcall *call_stmt = NULL;
>> +  if (def && arg
>> +   && (TREE_CODE (def) == SSA_NAME)
> Likewise (many times).
>
>> +   && ((TREE_CODE (TREE_TYPE (def)) == POINTER_TYPE)
>> +   && (TREE_TYPE (TREE_TYPE (def)) == char_type_node))
>> +   && (TREE_CODE (arg) == SSA_NAME)
>> +   && ((TREE_CODE (TREE_TYPE (arg)) == POINTER_TYPE)
>> +   && (TREE_TYPE (TREE_TYPE (arg)) == char_type_node))
> What is so special about char_type_node?  Why e.g. unsigned char or signed
> char pointer shouldn't be handled the same?
>
> >> +   && (call_stmt = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (def)))
>> +   && (gimple_call_combined_fn (call_stmt) == CFN_BUILT_IN_MEMCHR)
> We don't have an ifn for memchr, so why this?  On the other side, it 
> should
> be making sure one doesn't use unprototyped memchr with weirdo argument
> types, so you need gimple_call_builtin_p.
 Hi Jakub,
 Thanks for the review. I have tried to address your suggestions in the
 attached patch.
 Does it look OK ?
 Validation in progress.
>>> Oops, I mistakenly made argument sz in the tests of type
>>> __PTRDIFF_TYPE__ in the previous patch.
>>> The attached patch restores its type to __SIZE_TYPE__.
>>>
>>> Thanks,
>>> Prathamesh
 Thanks,
 Prathamesh
>> +   && operand_equal_p (def, gimple_call_lhs (call_stmt), 0)
>> +   && operand_equal_p (arg, gimple_call_arg (call_stmt, 0), 0)
>> +   && integer_zerop (gimple_call_arg (call_stmt, 1)))
>> + {
>> +   tree max = vrp_val_max (ptrdiff_type_node);
>> +   wide_int wmax = wi::to_wide (max, TYPE_PRECISION (TREE_TYPE 
>> (max)));
>> +   tree range_min = build_zero_cst (expr_type);
>> +   tree range_max = wide_int_to_tree (expr_type, wmax - 1);
>> +   set_value_range (vr, VR_RANGE, range_min, range_max, NULL);
>> +   return;
>> + }
>> + }
>> +
>>/* Try harder for PLUS and MINUS if the range of one operand is 
>> symbolic
>>   and based on the other operand, for example if it was deduced from 
>> a
>>   symbolic comparison.  When a bound of the range of the first 
>> operand
>
> Jakub
>>>
>>> pr82665-8.diff
>>>
>>>
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82665.c 
>>> b/gcc/testsuite/gcc.dg/tree-ssa/pr82665.c
>>> new file mode 100644
>>> index 000..b37c3d1d7ce
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82665.c
>>> @@ -0,0 +1,32 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>> +
>>> +void f1 (char *p, __SIZE_TYPE__ sz)
>>> +{
>>> +  char *q = __builtin_memchr (p, 0, sz);
>>> +  __PTRDIFF_TYPE__ n = q - p;
>>> +
>>> +  if (n >= __PTRDIFF_MAX__)
>>> +__builtin_abort ();
>>> +}
>>> +
>>> +void f2 (unsigned char *p, __SIZE_TYPE__ sz)
>>> +{
>>> +  unsigned char *q = __builtin_memchr (p, 0, sz);
>>> +  __PTRDIFF_TYPE__ n = q - p;
>>> +
>>> +  if (n >= __PTRDIFF_MAX__)
>>> +__builtin_abort ();
>>> +}
>>> +
>>> +void f3 (signed char *p, __SIZE_TYPE__ sz)
>>> +{
>>> +  signed char *q = __builtin_memchr (p, 0, sz);
>>> +  __PTRDIFF_TYPE__ n = q - p;
>>> +
>>> +  if (n >= __PTRDIFF_MAX__)
>>> +__builtin_abort ();
>>> +}
>>> +
>>> +
>>> +/* { dg-final {

Re: PR83648

2018-01-09 Thread Prathamesh Kulkarni
On 3 January 2018 at 18:58, Jan Hubicka  wrote:
>
>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
>> index 09ca3590039..0406d5588d2 100644
>> --- a/gcc/ipa-pure-const.c
>> +++ b/gcc/ipa-pure-const.c
>> @@ -910,7 +910,8 @@ malloc_candidate_p (function *fun, bool ipa)
>>  #define DUMP_AND_RETURN(reason)  \
>>  {  \
>>if (dump_file && (dump_flags & TDF_DETAILS))  \
>> -fprintf (dump_file, "%s", (reason));  \
>> +fprintf (dump_file, "\n%s is not a malloc candidate, reason: %s\n", \
>> +  (node->name()), (reason));  \
>>return false;  \
>>  }
>>
>> @@ -961,11 +962,14 @@ malloc_candidate_p (function *fun, bool ipa)
>>   for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
>> {
>>   tree arg = gimple_phi_arg_def (phi, i);
>> + if (arg == null_pointer_node)
>> +   continue;
>
> I think you want operand_equal_p here and also check for 
> flag_delete_null_pointer_checks
> because otherwise 0 can be a legal memory block address.
>> +
>>   if (TREE_CODE (arg) != SSA_NAME)
>> -   DUMP_AND_RETURN("phi arg is not SSA_NAME.")
>> - if (!(arg == null_pointer_node || check_retval_uses (arg, phi)))
>> -   DUMP_AND_RETURN("phi arg has uses outside phi"
>> -   " and comparisons against 0.")
>> +   DUMP_AND_RETURN ("phi arg is not SSA_NAME.");
>> + if (!check_retval_uses (arg, phi))
>> +   DUMP_AND_RETURN ("phi arg has uses outside phi"
>> +" and comparisons against 0.")
>
> So the case of phi(0,0) is not really a correctness issue, just a missed
> optimization?
> I would still add code to handle it (it is easy).
Yes, I suppose it won't affect correctness, since it would be marked
as TOP and after propagation
it would be lowered to BOTTOM, but I added the check anyway.
>
> Do you also handle a case where parameter of phi is another phi?
Indeed, it doesn't handle multiple-phi's as in the following
test-case, good catch!

void *g (int cond1, int cond2, int cond3)
{
  void *ret;
  void *a;
  void *b;

  if (cond1)
a = __builtin_malloc (10);
  else
a = __builtin_malloc (20);

  if (cond2)
b = __builtin_malloc (30);
  else
b = __builtin_malloc (40);

  if (cond3)
ret = a;
  else
ret = b;

  return ret;
}

The local-pure-const dump shows the function not being marked as malloc
while it should be.
I will address this issue in a follow-up patch.

The attached patch checks for flag_delete_null_pointer_checks and whether
all phi args are 0.
Validation in progress.
Does it look OK ?

As Jakub pointed out for the case:
void *f()
{
  return __builtin_malloc (0);
}

The malloc propagation would set f() to malloc.
However AFAIU, malloc(0) returns NULL (?) and the function shouldn't
be marked as malloc ?

Thanks,
Prathamesh
>
> Honza
>>
>>   gimple *arg_def = SSA_NAME_DEF_STMT (arg);
>>   gcall *call_stmt = dyn_cast <gcall *> (arg_def);
>> diff --git a/gcc/testsuite/gcc.dg/ipa/pr83648.c 
>> b/gcc/testsuite/gcc.dg/ipa/pr83648.c
>> new file mode 100644
>> index 000..03b45de671b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/ipa/pr83648.c
>> @@ -0,0 +1,9 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-local-pure-const-details" } */
>> +
>> +void *g(unsigned n)
>> +{
>> +  return n ? __builtin_malloc (n) : 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Function found to be malloc: g" 
>> "local-pure-const1" } } */
>
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 09ca3590039..dd15a499f6b 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -910,11 +910,13 @@ malloc_candidate_p (function *fun, bool ipa)
 #define DUMP_AND_RETURN(reason)  \
 {  \
   if (dump_file && (dump_flags & TDF_DETAILS))  \
-fprintf (dump_file, "%s", (reason));  \
+fprintf (dump_file, "\n%s is not a malloc candidate, reason: %s\n", \
+(node->name()), (reason));  \
   return false;  \
 }
 
-  if (EDGE_COUNT (exit_block->preds) == 0)
+  if (EDGE_COUNT (exit_block->preds) == 0
+  || !flag_delete_null_pointer_checks)
 return false;
 
   FOR_EACH_EDGE (e, ei, exit_block->preds)
@@ -958,34 +960,45 @@ malloc_candidate_p (function *fun, bool ipa)
}
 
   else if (gphi *phi = dyn_cast <gphi *> (def))
-   for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
- {
-   tree arg = gimple_phi_arg_def (phi, i);
-   if (TREE_CODE (arg) != SSA_NAME)
- DUMP_AND_RETURN("phi arg is not SSA_NAME.")
-   if (!(arg == null_pointer_node || check_retval_uses (arg, phi)))
- DUMP_AND_RETURN("phi arg has uses outside phi"
- " and comparisons against 0.")
-
-   gimple *arg_def = SSA_NAME_DEF_STMT (arg);
-   gcall *call_stmt = dyn_cast <gcall *> (arg_def);
-   if (!call_stmt)
- return false;
-   tree callee_decl = gimple_call_fndecl (call_stmt);
-   if (!callee_decl)
- 

Re: [PATCH] PR libstdc++/83709 don't rehash if no insertion

2018-01-09 Thread Jonathan Wakely

On 08/01/18 22:31 +0100, François Dumont wrote:

Hi

    Bug confirmed, limited to range insertion on unordered_set and 
unordered_map.


    I had to specialize _M_insert_range for those containers. Now this 
method maintains the theoretical number of elements to insert which is 
used only if an insertion takes place.


    I also took this opportunity to introduce a small change in
__distance_fw so that it reports 0 only if __first == __last, and does
nothing in that case.


    * include/bits/hashtable_policy.h
    (__distance_fw(_Iterator, _Iterator, input_iterator_tag)): Return 1 if
    __first != __last.
    (_Insert_base::_M_insert_range(_Ite, _Ite, _NodeGetter, 
true_type)): New.

    (_Insert_base::_M_insert_range(_Ite, _Ite, _NodeGetter, false_type)):
    Add false_type parameter.
    (_Insert_base::insert): Adapt.
    * include/bits/hashtable.h (_Hashtable::operator=(initializer_list<>)):
    Adapt.
    (_Hashtable::_M_insert_unique_node): Add __n_elt parameter, defaulted
    to 1.
    (_Hashtable::_M_insert(_Arg&&, const _NodeGen&, true_type, size_t)):
    Likewise.
    (_Hashtable::_M_merge_unique): Pass target number of elements to add to
    produce only 1 rehash if necessary.
    * testsuite/23_containers/unordered_map/insert/83709.cc: New.
    * testsuite/23_containers/unordered_set/insert/83709.cc: New.

Tested under Linux x86_64 normal and debug modes.

Ok to commit ?


OK, thanks for the quick fix.
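
For reference, a minimal reproducer in the spirit of the new 83709.cc tests
might look like the sketch below (this is only an illustration, not the
committed testcase; before the fix the range insert could still trigger a
rehash even though nothing new is inserted):

#include <unordered_set>
#include <iterator>
#include <cstdio>

int main()
{
  std::unordered_set<int> s{1, 2, 3};
  auto old_buckets = s.bucket_count();
  int dups[] = {1, 2, 3, 1, 2, 3, 1, 2, 3};
  s.insert(std::begin(dups), std::end(dups));   // inserts nothing new
  std::printf("%zu -> %zu\n", old_buckets, s.bucket_count());
}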




Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Alexander Monakov
On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
> > wouldn't work (applying CSEL to the address rather than loaded value), and
> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
> > anyhow, right? (by spilling the pointer)
> 
> The load has to feed /in/ to the csel/csdb sequence, not come after it.

Again, I'm sorry, but I have to insist that what you're saying here contradicts
the documentation linked from https://developer.arm.com/support/security-update
The PDF currently says, in "Details of the CSDB barrier":

Until the barrier completes:
1) For any load, store, data or instruction preload, RW2, appearing in
program order *after the barrier* [...]

2) For any indirect branch (B2), appearing in program order
*after the barrier* [...]

[...] the speculative execution of RW2/B2 does not influence the
allocations of entries in a cache [...]

It doesn't say anything about the behavior of CSDB being dependent on the loads
encountered prior to it.  (and imho it doesn't make sense for a hardware
implementation to work that way)

> As I explained to Bernd last night, I think this is likely to be unsafe.
> If there's some control path before __builtin_nontransparent that allows
> 'predicate' to be simplified (eg by value range propagation), then your
> guard doesn't protect against the speculation that you think it does.
> Changing all the optimizers to guarantee that wouldn't happen (and
> guaranteeing that all future optimizers won't introduce new problems of
> that nature) is, I suspect, very non-trivial.

But note that in that case the compiler could have performed the same
simplification in the original code as well, emitting straight-line machine code
lacking speculatively executable parts in the first place.

Alexander


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 09/01/18 13:27, Alexander Monakov wrote:
> On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
>> > Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
>> > wouldn't work (applying CSEL to the address rather than loaded value), and
>> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
>> > anyhow, right? (by spilling the pointer)
>> 
>> The load has to feed /in/ to the csel/csdb sequence, not come after it.
> 
> Again, I'm sorry, but I have to insist that what you're saying here
> contradicts
> the documentation linked from
> https://developer.arm.com/support/security-update
> The PDF currently says, in "Details of the CSDB barrier":
> 
>     Until the barrier completes:
>     1) For any load, store, data or instruction preload, RW2, appearing in
>     program order *after the barrier* [...]
> 
>     2) For any indirect branch (B2), appearing in program order
>     *after the barrier* [...]
> 
>     [...] the speculative execution of RW2/B2 does not influence the
>     allocations of entries in a cache [...]
> 
> It doesn't say anything about the behavior of CSDB being dependent on
> the loads
> encountered prior to it.  (and imho it doesn't make sense for a hardware
> implementation to work that way)

Read condition 1 i) again.


i) the conditional select instruction has a register data dependency on
a load R1, that has been executed speculatively, for one of its input
registers, and

To be executed speculatively it must be *after* a predicted operation
that has not yet resolved.  Once it has resolved it is no longer
speculative, it's committed (one way or another).

R.

> 
>> As I explained to Bernd last night, I think this is likely to be unsafe.
>> If there's some control path before __builtin_nontransparent that allows
>> 'predicate' to be simplified (eg by value range propagation), then your
>> guard doesn't protect against the speculation that you think it does.
>> Changing all the optimizers to guarantee that wouldn't happen (and
>> guaranteeing that all future optimizers won't introduce new problems of
>> that nature) is, I suspect, very non-trivial.
> 
> But note that in that case the compiler could have performed the same
> simplification in the original code as well, emitting straight-line
> machine code
> lacking speculatively executable parts in the first place.
> 
> Alexander



Re: [PATCH 2/5] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

2018-01-09 Thread Jonathan Wakely

On 07/01/18 20:55 +, Mike Crowe wrote:

The futex system call supports waiting for an absolute time if
FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT. Doing so provides two
benefits:

1. The call to gettimeofday is not required in order to calculate a
  relative timeout.

2. If someone changes the system clock during the wait then the futex
  timeout will correctly expire earlier or later. Currently that only
  happens if the clock is changed prior to the call to gettimeofday.

According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25. There is
no attempt to detect the kernel version and fall back to the previous
method.


I don't think we can require a specific kernel version just for this.

What happens if you use those bits on an older kernel, will there be
an ENOSYS error? Because that would allow us to try the new form, and
fallback to the old.

With that I think this change looks good.
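
To make that concrete, the fallback could look roughly like the sketch
below. This is only an illustration of the idea, not the libstdc++ code,
and it assumes (to be verified) that older kernels report ENOSYS for the
unknown futex operation:

#include <cerrno>
#include <ctime>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

// Same constants as in the patch above.
const unsigned futex_wait_op = 0;
const unsigned futex_wait_bitset_op = 9;
const unsigned futex_clock_realtime_flag = 256;
const unsigned futex_bitset_match_any = ~0u;

// Wait while *addr == val until the absolute CLOCK_REALTIME time
// (abs_s, abs_ns).  Returns false only on timeout.
static bool
futex_wait_until (unsigned *addr, unsigned val, time_t abs_s, long abs_ns)
{
  struct timespec rt;
  rt.tv_sec = abs_s;
  rt.tv_nsec = abs_ns;
  if (syscall (SYS_futex, addr,
               futex_wait_bitset_op | futex_clock_realtime_flag,
               val, &rt, nullptr, futex_bitset_match_any) == 0)
    return true;
  if (errno != ENOSYS)
    return errno != ETIMEDOUT;

  // Old kernel: fall back to a relative timeout and plain FUTEX_WAIT.
  struct timeval tv;
  gettimeofday (&tv, nullptr);
  rt.tv_sec = abs_s - tv.tv_sec;
  rt.tv_nsec = abs_ns - tv.tv_usec * 1000;
  if (rt.tv_nsec < 0)
    {
      rt.tv_nsec += 1000000000;
      --rt.tv_sec;
    }
  if (rt.tv_sec < 0)
    return false;
  return syscall (SYS_futex, addr, futex_wait_op, val, &rt) == 0
         || errno != ETIMEDOUT;
}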



---
libstdc++-v3/src/c++11/futex.cc | 21 ++---
1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index f5000aa77b3..40ec7f9b0f7 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -35,6 +35,9 @@

// Constants for the wait/wake futex syscall operations
const unsigned futex_wait_op = 0;
+const unsigned futex_wait_bitset_op = 9;
+const unsigned futex_clock_realtime_flag = 256;
+const unsigned futex_bitset_match_any = ~0;
const unsigned futex_wake_op = 1;

namespace std _GLIBCXX_VISIBILITY(default)
@@ -58,22 +61,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }
else
  {
-   struct timeval tv;
-   gettimeofday (&tv, NULL);
-   // Convert the absolute timeout value to a relative timeout
struct timespec rt;
-   rt.tv_sec = __s.count() - tv.tv_sec;
-   rt.tv_nsec = __ns.count() - tv.tv_usec * 1000;
-   if (rt.tv_nsec < 0)
- {
-   rt.tv_nsec += 1000000000;
-   --rt.tv_sec;
- }
-   // Did we already time out?
-   if (rt.tv_sec < 0)
- return false;
-
-   if (syscall (SYS_futex, __addr, futex_wait_op, __val, &rt) == -1)
+   rt.tv_sec = __s.count();
+   rt.tv_nsec = __ns.count();
+   if (syscall (SYS_futex, __addr, futex_wait_bitset_op | 
futex_clock_realtime_flag, __val, &rt, nullptr, futex_bitset_match_any) == -1)
  {
_GLIBCXX_DEBUG_ASSERT(errno == EINTR || errno == EAGAIN
  || errno == ETIMEDOUT);
--
2.11.0




[PING 5] Ability to remap file names in __FILE__, etc (PR other/70268)

2018-01-09 Thread Boris Kolpackov
Hi,

Looks like this is the last chance for this patch to make GCC 8
so I would like to ping it one last time:

https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01451.html

It has been reviewed (with thanks) by David Malcolm[1] and
Martin Sebor[2]. Their concerns are addressed in the latest
revision of the patch:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00544.html

If there are any other issues with the patch, please let me
know and I will try hard to address them before the Sun
deadline.

[1] https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00322.html
[2] https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00398.html

Thanks,
Boris


Re: [PATCH 1/5] Improve libstdc++-v3 async test

2018-01-09 Thread Jonathan Wakely

On 07/01/18 20:55 +, Mike Crowe wrote:

Add tests for waiting for the future using both std::chrono::steady_clock
and std::chrono::system_clock in preparation for dealing with those clocks
properly in futex.cc.
---
libstdc++-v3/testsuite/30_threads/async/async.cc | 36 
1 file changed, 36 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 8071cb133fc..7326c5f7cd6 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -52,17 +52,53 @@ void test02()
  VERIFY( status == std::future_status::timeout );
  status = f1.wait_until(std::chrono::system_clock::now());
  VERIFY( status == std::future_status::timeout );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::timeout );
  l.unlock();  // allow async thread to proceed
  f1.wait();   // wait for it to finish
  status = f1.wait_for(std::chrono::milliseconds(0));
  VERIFY( status == std::future_status::ready );
  status = f1.wait_until(std::chrono::system_clock::now());
  VERIFY( status == std::future_status::ready );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::ready );
+}
+
+void work03()
+{
+std::this_thread::sleep_for(std::chrono::seconds(15));


I don't think we want 15s - 20s pauses in the testsuite.

Could the sleep_for be 5s, with waits of 500ms, 1s, and then 10s, so
the expected wait is only 5s?

I don't think it's the end of the world if the test occasionally
fails on very loaded machines.



+}
+
+// This test is slow in order to try and avoid failing on a loaded
+// machine. Nevertheless, it could still fail if the kernel decides
+// not to schedule us for several seconds. It also assumes that no-one
+// will change CLOCK_REALTIME whilst the test is running.
+template <typename CLOCK>
+void test03()
+{
+  auto const start = CLOCK::now();
+  future<void> f1 = async(launch::async, &work03);
+  std::future_status status;
+
+  status = f1.wait_for(std::chrono::seconds(5));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(10));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(25));
+  VERIFY( status == std::future_status::ready );
+
+  auto const elapsed = CLOCK::now() - start;
+  VERIFY( elapsed >= std::chrono::seconds(15) );
+  VERIFY( elapsed < std::chrono::seconds(20) );
}

int main()
{
  test01();
  test02();
+  test03();
+  test03<std::chrono::system_clock>();
+  test03<std::chrono::steady_clock>();
}
--
2.11.0
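
To make the suggested timings concrete, the test could shrink to something
like the following sketch (the numbers simply follow the 500ms/1s/10s idea
above and are untested; the includes and VERIFY come from the existing test
file):

void work03()
{
  std::this_thread::sleep_for(std::chrono::seconds(5));
}

template<typename CLOCK>
void test03()
{
  auto const start = CLOCK::now();
  future<void> f1 = async(launch::async, &work03);
  std::future_status status;

  status = f1.wait_for(std::chrono::milliseconds(500));
  VERIFY( status == std::future_status::timeout );

  status = f1.wait_until(start + std::chrono::seconds(1));
  VERIFY( status == std::future_status::timeout );

  status = f1.wait_until(start + std::chrono::seconds(10));
  VERIFY( status == std::future_status::ready );

  auto const elapsed = CLOCK::now() - start;
  VERIFY( elapsed >= std::chrono::seconds(5) );
  VERIFY( elapsed < std::chrono::seconds(10) );
}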




Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Alexander Monakov
On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> Read condition 1 i) again.
> 
> 
> i) the conditional select instruction has a register data dependency on
> a load R1, that has been executed speculatively, for one of its input
> registers, and

Sorry - I don't know how I missed that. I understand the motivation for the
behavior of the new built-in now; the CSDB specification is surprising, but
that is off-topic I guess. Seems quite unfortunate that CSDB is constraining
the generic built-in like this, though.

Thank you for clearing this up.

I suggest that the user documentation in extend.texi should explicitly call out
that the load itself still may be speculatively executed - only its consumers
can be expected to be fenced off. For example, after

+but in addition target-specific code will be inserted to ensure that
+speculation using @code{*ptr} cannot occur when @var{cmpptr} lies outside of
+the specified bounds.

it can go on to say e.g.,

  However note that the load of @code{*ptr} itself and the conditional branch
  leading to it may still be subject to speculative execution (and the load may
  have observable effects on the cache as a result).

Thanks.
Alexander


Re: [PATCH v3 of 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2018-01-09 Thread Jason Merrill

On 01/09/2018 06:53 AM, David Malcolm wrote:

+case NON_LVALUE_EXPR:
+case VIEW_CONVERT_EXPR:
+   {
+ /* Handle location wrappers by substituting the wrapped node
+first,*then*  reusing the resulting type.  Doing the type
+first ensures that we handle template parameters and
+parameter pack expansions.  */
+ gcc_assert (location_wrapper_p (t));
+ tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl);
+ return build1 (code, TREE_TYPE (op0), op0);
+   }


Doesn't this lose the location information?


+  /* We should only see these for location wrapper nodes, or for
+VIEW_CONVERT_EXPRs within instantiate_non_dependent_expr (when
+args is NULL_TREE).  */
+  gcc_assert (location_wrapper_p (t)
+ || (TREE_CODE (t) == VIEW_CONVERT_EXPR
+ && args == NULL_TREE));


Let's just say "|| args == NULL_TREE", it might be possible to see a 
NON_LVALUE_EXPR in that context as well.


Jason


Re: [PATCH v3 of 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2018-01-09 Thread Jakub Jelinek
On Tue, Jan 09, 2018 at 09:36:58AM -0500, Jason Merrill wrote:
> On 01/09/2018 06:53 AM, David Malcolm wrote:
> > +case NON_LVALUE_EXPR:
> > +case VIEW_CONVERT_EXPR:
> > +   {
> > + /* Handle location wrappers by substituting the wrapped node
> > +first,*then*  reusing the resulting type.  Doing the type
> > +first ensures that we handle template parameters and
> > +parameter pack expansions.  */
> > + gcc_assert (location_wrapper_p (t));
> > + tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl);
> > + return build1 (code, TREE_TYPE (op0), op0);
> > +   }
> 
> Doesn't this lose the location information?

And the public_flag...

Jakub


Re: Improve canonicalisation of TARGET_MEM_REFs

2018-01-09 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 7, 2017 at 7:04 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Fri, Nov 3, 2017 at 5:32 PM, Richard Sandiford
>>>  wrote:
 A general TARGET_MEM_REF is:

 BASE + STEP * INDEX + INDEX2 + OFFSET

 After classifying the address in this way, the code that builds
 TARGET_MEM_REFs tries to simplify the address until it's valid
 for the current target and for the mode of memory being addressed.
 It does this in a fixed order:

 (1) add SYMBOL to BASE
 (2) add INDEX * STEP to the base, if STEP != 1
 (3) add OFFSET to INDEX or BASE (reverted if unsuccessful)
 (4) add INDEX to BASE
 (5) add OFFSET to BASE

 So suppose we had an address:

 &symbol + offset + index * 8   (e.g. "a[i + 1]" for a global "a")

 on a target that only allows an index or an offset, not both.  Following
 the steps above, we'd first create:

 tmp = symbol
 tmp2 = tmp + index * 8

 Then if the given offset value was valid for the mode being addressed,
 we'd create:

 MEM[base:tmp2, offset:offset]

 while if it was invalid we'd create:

 tmp3 = tmp2 + offset
 MEM[base:tmp3, offset:0]

 The problem is that this could happen if ivopts had decided to use
 a scaled index for an address that happens to have a constant base.
 The old procedure failed to give an indexed TARGET_MEM_REF in that case,
 and adding the offset last prevented later passes from being able to
 fold the index back in.

 The patch avoids this by skipping (2) if BASE + INDEX * STEP
 is a legitimate address and if OFFSET is stopping the address
 being valid.

 Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
 OK to install?

 Richard


 2017-10-31  Richard Sandiford  
 Alan Hayward  
 David Sherwood  

 gcc/
 * tree-ssa-address.c (keep_index_p): New function.
 (create_mem_ref): Use it.  Only split out the INDEX * STEP
 component if that is invalid even with the symbol and offset
 removed.

 Index: gcc/tree-ssa-address.c
 ===
 --- gcc/tree-ssa-address.c  2017-11-03 12:15:44.097060121 +
 +++ gcc/tree-ssa-address.c  2017-11-03 12:21:18.060359821 +
 @@ -746,6 +746,20 @@ gimplify_mem_ref_parts (gimple_stmt_iter
  true, GSI_SAME_STMT);
  }

 +/* Return true if the STEP in PARTS gives a valid BASE + INDEX * STEP
 +   address for type TYPE and if the offset is making it appear invalid.  
 */
 +
 +static bool
 +keep_index_p (tree type, mem_address parts)
>>>
>>> mem_ref_valid_without_offset_p (...)
>>>
>>> ?
>>
>> OK.
>>
 +{
 +  if (!parts.base)
 +return false;
 +
 +  gcc_assert (!parts.symbol);
 +  parts.offset = NULL_TREE;
 +  return valid_mem_ref_p (TYPE_MODE (type), TYPE_ADDR_SPACE (type), 
 &parts);
 +}
 +
  /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
 computations are emitted in front of GSI.  TYPE is the mode
 of created memory reference. IV_CAND is the selected iv candidate in 
 ADDR,
 @@ -809,7 +823,8 @@ create_mem_ref (gimple_stmt_iterator *gs
>>>
>>> Which means all of the following would be more naturally written as
>>>
   into:
 index' = index << step;
 [... + index' + ,,,].  */
 -  if (parts.step && !integer_onep (parts.step))
 +  bool scaled_p = (parts.step && !integer_onep (parts.step));
 +  if (scaled_p && !keep_index_p (type, parts))
  {
>>>
>>>   if (mem_ref_valid_without_offset_p (...))
>>>{
>>>  ...
>>>  return create_mem_ref_raw (...);
>>>}
>>
>> Is this inside the test for a scale:
>>
>>   if (parts.step && !integer_onep (parts.step))
>> {
>>   if (mem_ref_valid_without_offset_p (...))
>> {
>>   tree tmp = parts.offset;
>>   if (parts.base)
>> {
>>   tmp = fold_build_pointer_plus (parts.base, tmp);
>>   tmp = force_gimple_operand_gsi_1 (gsi, tmp,
>> is_gimple_mem_ref_addr,
>> NULL_TREE, true,
>> GSI_SAME_STMT);
>> }
>>   parts.base = tmp;
>>   parts.offset = NULL_TREE;
>>   mem_ref = create_mem_ref_raw (type, alias_ptr_type, &parts, true);
>>   gcc_assert (mem_ref);
>>   return mem_ref;
>> }
>>   ...current code...
>> }
>>
>> ?  I can do that if you don't mind the cut-&-paste.  Or I could
>> split the code that adds the offset out into a subroutine.
>
> 

Re: [PATCH 0/5] Make std::future::wait_* use std::chrono::steady_clock when required

2018-01-09 Thread Jonathan Wakely

On 07/01/18 20:55 +, Mike Crowe wrote:

This patch series was originally submitted back in September at
https://gcc.gnu.org/ml/libstdc++/2017-09/msg00083.html which ended up
as https://patchwork.ozlabs.org/cover/817379/ . The patches received
no comments at all, which may mean that they are perfect or that they
are so bad that no-one knew where to start with criticising them. It
would be good to know which. :)


Sorry for the lack of review. Although not perfect I think the patches
look good, but I didn't feel entirely confident reviewing them at the
time, and let them slip through the cracks (we could do with more
reviewers for libstdc++ patches).

I'm taking another look now, and have asked the original author of the
__atomic_futex_unsigned code to look too, so thanks for the ping.



Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

2018-01-09 Thread Martin Liška
On 09/20/2017 05:00 PM, Jeff Law wrote:
> On 09/20/2017 01:24 AM, Martin Liška wrote:
> 
>>
>> Hello.
>>
>> Thank you Jeff for very verbose explanation what's happening. I'm planning 
>> to do
>> follow-up of this patch that will include clustering for bit-tests and jump 
>> tables.
>> Maybe that will make aforementioned issues even more difficult, but we'll 
>> see.
> FWIW, the DOM changes to simplify the conditionals seem to help both
> cases, trigger reasonably consistently in a bootstrap and for some
> subset of the triggers actually result in transformations that allow
> other passes to do a better job in the common (-O2) case.  So my
> inclination is to polish them a bit further get them on the trunk.
> 
> My recommendation is to ignore the two regressions for now and focus on
> the cleanups you're trying to do.
> 
> jeff
> 

Hello.

Some time ago I decided to postpone the switch clustering submission to the
next stage 1. However, this patch can be applied as-is in this stage 3. Would
that be possible, or is it too late?

Thanks,
Martin


[PATCH] RX movsicc degrade fix

2018-01-09 Thread Sebastian Perta
Hello,

In recent versions of GCC the define_expand "movsicc" has stopped being used
(from approximately 4.7.x/4.8.x onwards).
The reason for this is that the first operand of if_then_else has SI mode,
and it shouldn't have. If we take a look at movsicc for all other targets we
can see this is true.
The fix in rx.md is basically a copy-paste from v850.md.

The patch also adds a testcase in gcc.target/rx to make sure this degradation
does not occur again.


Regression test is OK with one observation (see below), tested with the
following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim


I have the following fail (which was not present before):
FAIL: gcc.dg/loop-8.c scan-rtl-dump-times loop2_invariant "Decided" 1 (found
0 times)

This is because the patch is effective in this test case and the dump is not
the same; I checked the asm code manually and it is OK.
Is it possible to disable parts of a test case, not the whole test case (/* {
dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */ from
loop-8.c in this example) for a particular target?
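
(For what it's worth, the dg-final scan directives accept an optional target
selector, so something along these lines might be enough to skip just that
one scan on RX; this is untested and the exact selector syntax should be
double-checked:

/* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" { target { ! { rx-*-* } } } } } */
)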

The total number of failures remains the same because the following FAIL is
no longer present with this patch:
FAIL: gcc.dg/ifcvt-4.c scan-rtl-dump ce1 "2 true changes made"


Please let me know if this is OK. Thank you!

Best Regards,
Sebastian


Index: ChangeLog
===
--- ChangeLog   (revision 256382)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2018-01-09  Sebastian Perta  
+
+   * config/rx/rx.md: Updated "movsicc" expand to be matched by GCC.
+   * testsuite/gcc.target/rx/movsicc.c: New test case.
+
 2018-01-09  Richard Biener  
 
PR tree-optimization/83668
Index: config/rx/rx.md
===
--- config/rx/rx.md (revision 256382)
+++ config/rx/rx.md (working copy)
@@ -733,12 +733,17 @@
 (define_expand "movsicc"
   [(parallel
 [(set (match_operand:SI  0 "register_operand")
- (if_then_else:SI (match_operand:SI 1 "comparison_operator")
+ (if_then_else:SI (match_operand 1 "comparison_operator")
   (match_operand:SI 2 "nonmemory_operand")
   (match_operand:SI 3 "nonmemory_operand")))
  (clobber (reg:CC CC_REG))])]
   ""
 {
+  /* Make sure that we have an integer comparison...  */
+  if (GET_MODE (XEXP (operands[1], 0)) != CCmode
+  && GET_MODE (XEXP (operands[1], 0)) != SImode)
+FAIL;
+
   /* One operand must be a constant or a register, the other must be a
register.  */
   if (   ! CONSTANT_P (operands[2])
   && ! CONSTANT_P (operands[3])
Index: testsuite/gcc.target/rx/movsicc.c
===
--- testsuite/gcc.target/rx/movsicc.c   (nonexistent)
+++ testsuite/gcc.target/rx/movsicc.c   (working copy)
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+typedef unsigned char u8;
+typedef unsigned short u16;
+signed int Xa, Xb;
+
+signed int stzreg_beq(int i, int a, int b)
+{
+  signed int x;
+  x = a;
+  if (i)
+x = b;
+  return x;
+}
+
+/* { dg-final { scan-assembler "bne 1f" } } */
+
+signed int stzreg_bge(int i, int a, int b, int c)
+{
+  signed int x;
+  x = a;
+  if (i0)
+x = b;
+  return x;
+}
+
+/* { dg-final { scan-assembler "bgt 1f" } } */
+
+signed int stzreg_blt(int i, int a, int b)
+{
+  signed int x;
+  x = a;
+  if (i<0)
+x = b;
+  return x;
+}
+
+/* { dg-final { scan-assembler "blt 1f" } } */
+
+signed int stzreg_bne(int i, int a, int b)
+{
+  signed int x;
+  x = a;
+  if (!i)
+x = b;
+  return x;
+}
+
+/* { dg-final { scan-assembler "beq 1f" } } */
+
+signed int stzimm_le( int i, int a )
+{
+  signed int x;
+  x = a;
+  if (i>0)
+x = 5;
+  return x;
+}
+
+/* { dg-final { scan-assembler "ble 1f" } } */
+
+signed int stzimm_le_r( int i, int a )
+{
+  signed int x;
+  x = a;
+  if (i<0)
+x = 5;
+  return x;
+}
+
+/* { dg-final { scan-assembler "bge 1f" } } */




Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-09 Thread Segher Boessenkool
On Tue, Jan 09, 2018 at 12:23:42PM +, Wilco Dijkstra wrote:
> Segher Boessenkool wrote:
> > On Mon, Jan 08, 2018 at 0:25:47PM +, Wilco Dijkstra wrote:
> >> > Always pairing two registers together *also* degrades code quality.
> >> 
> >> No, while it's not optimal, it means smaller code and fewer memory 
> >> accesses.
> >
> > It means you execute *more* memory accesses.  Always.  This may be
> > sometimes hidden, sure.  I'm not saying you do not want more ldp's;
> > I'm saying this particular strategy is very far from ideal.
> 
> No, it means fewer, since the number of memory accesses is reduced (memory
> bandwidth may increase, but that's not an issue).

The problem is *more* memory accesses are executed at runtime.  Which is
why separate shrink-wrapping does what it does: to have *fewer* executed.
(It's not just the direct execution cost that matters: more important
are latencies to dependent ops, microarchitectural traps, etc.).

If you make A always stored whenever B is, and the other way around, the
optimal place to do it will always store at least as often as either A
or B, _but can also store more often than either_.

> >> That may well be the problem. So if there are N predecessors, of which N-1
> >> need to restore the same set of callee saves, but one was shrinkwrapped,
> >> N-1 copies of the same restores might be emitted. N could be the number
> >> of blocks in a function - I really hope it doesn't work out like that...
> >
> > In the worst case it would.  OTOH, joining every combo into blocks costs
> > O(2**C) (where C is the # components) bb's worst case.
> >
> > It isn't a simple problem.  The current tuning works pretty well for us,
> > but no doubt it can be improved!
> 
> Well if there are C components, we could limit the total number of 
> saves/restores
> inserted to say 4C. Similarly common cases could easily share the restores
> without increasing the number of branches.

It is common to see many saves/restores generated for the exceptional cases.


Segher


Re: [PATCH v2.4 of 02/14] Support for adding and stripping location_t wrapper nodes

2018-01-09 Thread Jason Merrill

On 01/09/2018 06:37 AM, David Malcolm wrote:

+  /* We should only be adding wrappers for constants and for decls,
+ or for some exceptional tree nodes (e.g. BASELINK in the C++ FE).  */
+  gcc_assert (CONSTANT_CLASS_P (expr)
+ || DECL_P (expr)
+ || EXCEPTIONAL_CLASS_P (expr));
+
+  if (EXCEPTIONAL_CLASS_P (expr))
+return expr;


This seems like it could use a comment about why we don't actually wrap 
BASELINK as suggested by the previous comment.


OK with that change.

Jason



Re: Smart pointer pretty printers

2018-01-09 Thread Jonathan Wakely

On 04/01/18 11:22 +0100, Juraj Oršulić wrote:

Hi Jonathan (and the libstdc++ list). Can we revive this? I sent the
patches for improving the smart pointer pretty printers in March. They
haven't been reviewed.


Thanks for the reminder. I'm testing the attached patch, which has
been rebased on trunk, but I'm getting test failures for the
shared_ptr printers.

I can't see any difference between the expected output and what GDB
prints, so I think it's some DejaGnu or Tcl/Tk oddity. The actual
printer seems to work OK.



Regards, and happy 2018,
Juraj

On Mon, Mar 27, 2017 at 6:05 PM, Jonathan Wakely  wrote:

On 27/03/17 17:34 +0200, Juraj Oršulić wrote:

On Mon, Mar 27, 2017 at 5:15 PM, Jonathan Wakely  wrote:

I think we're probably too close to the gcc7 release to make this
change now, sorry.


No problem, I think it's most important that it entered the pipeline,
since it's been completely dead for 4 years :-)

Just a reminder, I added this only for those smart pointers that I
needed at the moment - the unique and shared pointer, and for the
vector iterator. When you (or someone else) will be doing a review of
the patch, let me know if you think this should be expanded for other
iterators as well.


Will do, thanks.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 6da8508d944..7b999ee6db4 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -120,6 +120,10 @@ class SharedPointerPrinter:
 def __init__ (self, typename, val):
 self.typename = strip_versioned_namespace(typename)
 self.val = val
+self.pointer = val['_M_ptr']
+
+def children (self):
+return [('get()', self.pointer)]
 
 def to_string (self):
 state = 'empty'
@@ -130,25 +134,29 @@ class SharedPointerPrinter:
 if usecount == 0:
 state = 'expired, weak %d' % weakcount
 else:
-state = 'count %d, weak %d' % (usecount, weakcount - 1)
-return '%s (%s) %s' % (self.typename, state, self.val['_M_ptr'])
+state = 'use count %d, weak count %d' % (usecount, weakcount - 1)
+return '%s<%s> (%s)' % (self.typename, str(self.pointer.type.target().strip_typedefs()), state)
 
 class UniquePointerPrinter:
 "Print a unique_ptr"
 
 def __init__ (self, typename, val):
 self.val = val
-
-def to_string (self):
-impl_type = self.val.type.fields()[0].type.tag
+impl_type = val.type.fields()[0].type.tag
 if is_specialization_of(impl_type, '__uniq_ptr_impl'): # New implementation
-v = self.val['_M_t']['_M_t']['_M_head_impl']
+self.pointer = val['_M_t']['_M_t']['_M_head_impl']
 elif is_specialization_of(impl_type, 'tuple'):
-v = self.val['_M_t']['_M_head_impl']
+self.pointer = val['_M_t']['_M_head_impl']
 else:
+self.pointer = None
+
+def children (self):
+return [('get()', self.pointer)]
+
+def to_string (self):
+if not self.pointer:
 raise ValueError("Unsupported implementation for unique_ptr: %s" % self.val.type.fields()[0].type.tag)
-return 'std::unique_ptr<%s> containing %s' % (str(v.type.target()),
-  str(v))
+return ('std::unique_ptr<%s>' % (str(self.pointer.type.target())))
 
 def get_value_from_aligned_membuf(buf, valtype):
 """Returns the value held in a __gnu_cxx::__aligned_membuf."""


Re: Smart pointer pretty printers

2018-01-09 Thread Jonathan Wakely

On 09/01/18 14:59 +, Jonathan Wakely wrote:

On 04/01/18 11:22 +0100, Juraj Oršulić wrote:

Hi Jonathan (and the libstdc++ list). Can we revive this? I sent the
patches for improving the smart pointer pretty printers in March. They
haven't been reviewed.


Thanks for the reminder. I'm testing the attached patch, which has
been rebased on trunk, but I'm getting test failures for the
shared_ptr printers.

I can't see any difference between the expected output and what GDB
prints, so I think it's some DejaGnu or Tcl/Tk oddity. The actual
printer seems to work OK.


No, the problem is that I'm just misreading the results.

I'll finish testing and commit this soon.
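
In case anyone wants to eyeball the new output locally, a toy program such
as the one below (not part of the patch) can be debugged with the rebased
printers.py loaded, breaking at the return statement and printing sp, wp
and up:

#include <memory>

int main()
{
  auto sp = std::make_shared<int>(42);
  std::weak_ptr<int> wp = sp;
  std::unique_ptr<int> up(new int(7));
  return *sp + *up;
}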



Re: [PATCH 3/5] Adjust predictor values according to SPEC2006 and, SPEC2017.

2018-01-09 Thread Jan Hubicka
> Third part changes predictors values:
> 
> 1) PRED_LOOP_EXIT: no dominant branch; value simply taken from statistics
> 2) PRED_LOOP_EXIT_WITH_RECURSION: there are 4 dominant edges; value taken w/o 
> these edges
> 3) PRED_LOOP_EXTRA_EXIT: there's one really dominant edge; value taken w/o 
> the edge; note that
> coverage of the predictor is quite small
> 4) PRED_OPCODE_POSITIVE: no dominant branch; value simply taken from 
> statistics
> 5) PRED_CONST_RETURN: there are 4 dominant edges, value taken w/o these edges
> 6) PRED_NULL_RETURN: one dominant edge, value taken w/o these edges
> 7) PRED_LOOP_IV_COMPARE_GUESS: very fragile predictor (according to how it is
> identified in predict.c);
> has a dominant edge; value taken w/o these edges; in future I plan to 
> investigate it
> 8) PRED_LOOP_GUARD: adjusted based on numbers without a one dominant edge
> 
> Martin

> >From 960f16a6e3e916524d8881f53913c15a3c2ec2ae Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Thu, 28 Dec 2017 10:13:50 +0100
> Subject: [PATCH 3/5] Adjust predictor values according to SPEC2006 and
>  SPEC2017.
> 
> gcc/ChangeLog:
> 
> 2017-12-28  Martin Liska  
> 
>   * predict.def (PRED_LOOP_EXIT): Change from 85 to 89.
>   (PRED_LOOP_EXIT_WITH_RECURSION): Change from 72 to 78.
>   (PRED_LOOP_EXTRA_EXIT): Change from 83 to 67.
>   (PRED_OPCODE_POSITIVE): Change from 64 to 59.
>   (PRED_TREE_OPCODE_POSITIVE): Change from 64 to 59.
>   (PRED_CONST_RETURN): Change from 69 to 65.
>   (PRED_NULL_RETURN): Change from 91 to 71.
>   (PRED_LOOP_IV_COMPARE_GUESS): Change from 98 to 64.
>   (PRED_LOOP_GUARD): Change from 66 to 73.

OK,
but please wait until tomorrow to commit, as I have committed the ipa-inline
fix today and I would like it to hit the nightly testers independently.

Thanks!
Honza
> ---
>  gcc/predict.def | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/predict.def b/gcc/predict.def
> index 390b9a35fa7..fe72080d5bd 100644
> --- a/gcc/predict.def
> +++ b/gcc/predict.def
> @@ -89,16 +89,16 @@ DEF_PREDICTOR (PRED_COLD_FUNCTION, "cold function call", 
> PROB_VERY_LIKELY,
>  PRED_FLAG_FIRST_MATCH)
>  
>  /* Edge causing loop to terminate is probably not taken.  */
> -DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (85),
> +DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (89),
>  PRED_FLAG_FIRST_MATCH)
>  
>  /* Same as LOOP_EXIT but for loops containing recursive call.  */
>  DEF_PREDICTOR (PRED_LOOP_EXIT_WITH_RECURSION, "loop exit with recursion",
> -HITRATE (72), PRED_FLAG_FIRST_MATCH)
> +HITRATE (78), PRED_FLAG_FIRST_MATCH)
>  
>  /* Edge causing loop to terminate by computing value used by later
> conditional.  */
> -DEF_PREDICTOR (PRED_LOOP_EXTRA_EXIT, "extra loop exit", HITRATE (83),
> +DEF_PREDICTOR (PRED_LOOP_EXTRA_EXIT, "extra loop exit", HITRATE (67),
>  PRED_FLAG_FIRST_MATCH)
>  
>  /* Pointers are usually not NULL.  */
> @@ -106,11 +106,11 @@ DEF_PREDICTOR (PRED_POINTER, "pointer", HITRATE (70), 0)
>  DEF_PREDICTOR (PRED_TREE_POINTER, "pointer (on trees)", HITRATE (70), 0)
>  
>  /* NE is probable, EQ not etc...  */
> -DEF_PREDICTOR (PRED_OPCODE_POSITIVE, "opcode values positive", HITRATE (64), 
> 0)
> +DEF_PREDICTOR (PRED_OPCODE_POSITIVE, "opcode values positive", HITRATE (59), 
> 0)
>  DEF_PREDICTOR (PRED_OPCODE_NONEQUAL, "opcode values nonequal", HITRATE (66), 
> 0)
>  DEF_PREDICTOR (PRED_FPOPCODE, "fp_opcode", HITRATE (90), 0)
>  DEF_PREDICTOR (PRED_TREE_OPCODE_POSITIVE, "opcode values positive (on 
> trees)",
> -HITRATE (64), 0)
> +HITRATE (59), 0)
>  DEF_PREDICTOR (PRED_TREE_OPCODE_NONEQUAL, "opcode values nonequal (on 
> trees)",
>  HITRATE (66), 0)
>  DEF_PREDICTOR (PRED_TREE_FPOPCODE, "fp_opcode (on trees)", HITRATE (90), 0)
> @@ -136,18 +136,18 @@ DEF_PREDICTOR (PRED_TREE_EARLY_RETURN, "early return 
> (on trees)", HITRATE (66),
>  DEF_PREDICTOR (PRED_GOTO, "goto", HITRATE (66), 0)
>  
>  /* Branch ending with return constant is probably not taken.  */
> -DEF_PREDICTOR (PRED_CONST_RETURN, "const return", HITRATE (69), 0)
> +DEF_PREDICTOR (PRED_CONST_RETURN, "const return", HITRATE (65), 0)
>  
>  /* Branch ending with return negative constant is probably not taken.  */
>  DEF_PREDICTOR (PRED_NEGATIVE_RETURN, "negative return", HITRATE (98), 0)
>  
>  /* Branch ending with return; is probably not taken */
> -DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (91), 0)
> +DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (71), 0)
>  
>  /* Branches to compare induction variable to a loop bound is
> extremely likely.  */
>  DEF_PREDICTOR (PRED_LOOP_IV_COMPARE_GUESS, "guess loop iv compare",
> -HITRATE (98), 0)
> +HITRATE (64), 0)
>  
>  /* Use number of loop iterations determined by # of iterations analysis
> to set probability of branches that compares IV to loop bound variable.  
> */
> @@ -160,7 +160,7 @

Re: [PATCH] Clean up partitioning in try_optimize_cfg (PR bootstrap/82831).

2018-01-09 Thread Jan Hubicka
> Hello.
> 
> This is a patch for the i586 bootstrap failure that is triggered by the bug
> described in detail in the PR.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-01-09  Martin Liska  
> 
>   PR bootstrap/82831
>   * basic-block.h (CLEANUP_NO_PARTITIONING): New define.
>   * bb-reorder.c (pass_reorder_blocks::execute): Do not clean up
>   partitioning.
>   * cfgcleanup.c (try_optimize_cfg): Fix up partitioning if
>   CLEANUP_NO_PARTITIONING is not set.

OK,
thanks!
Honza


Re: Make ivopts handle calls to internal functions

2018-01-09 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Nov 20, 2017 at 12:31 PM, Bin.Cheng  wrote:
>> On Fri, Nov 17, 2017 at 3:03 PM, Richard Sandiford
>>  wrote:
>>> ivopts previously treated pointer arguments to internal functions
>>> like IFN_MASK_LOAD and IFN_MASK_STORE as normal gimple values.
>>> This patch makes it treat them as addresses instead.  This makes
>>> a significant difference to the code quality for SVE loops,
>>> since we can then use loads and stores with scaled indices.
>> Thanks for working on this.  This can be extended to other internal
>> functions which eventually
>> are expanded into memory references.  I believe (at least) both x86
>> and AArch64 have such a
>> requirement.
>
> In addition to Bin's comments I only have a single one (the rest of the
> middle-end
> changes look OK).  The alias type of MEM_REFs and TARGET_MEM_REFs
> in ADDR_EXPR context is meaningless so you don't need to jump through hoops
> to get at it or preserve it in any way, likewise for CLIQUE/BASE if it
> were present.

Ah, OK.

> Maybe you can simplify code with this.

In the end it didn't really simplify the code, since internal-fn.c
uses the address to build a (TARGET_)MEM_REF, and the alias information
of that ref needs to be correct, since it gets carried across to the
MEM rtx.  But it does mean that the alias_ptr_type check in the previous:

  if (TREE_CODE (mem) == TARGET_MEM_REF
  && types_compatible_p (TREE_TYPE (mem), type)
  && alias_ptr_type == TREE_TYPE (TMR_OFFSET (mem))
  && integer_zerop (TMR_OFFSET (mem)))
return mem;

made no sense: we should simply replace the TMR_OFFSET if it has
the wrong type.

> As you're introducing &TARGET_MEM_REF as a valid construct (it weren't
> before) you'll run into missing / misguided foldings eventually.  So
> be prepared to fix up fallout.

OK :-) I haven't hit any new places yet, but like you say, I'll be on
the lookout.
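
For readers without the start of the thread, the kind of loop in question is
roughly the following (an illustration of mine, not from the patch): the
vectorizer emits IFN_MASK_LOAD and IFN_MASK_STORE for the conditional
accesses, and with this patch ivopts can give those calls scaled-index
addresses on SVE:

void
f (double *__restrict dst, double *__restrict src, int *__restrict pred, long n)
{
  for (long i = 0; i < n; i++)
    if (pred[i])
      dst[i] = src[i] + 1.0;
}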

Is the version below OK?  Tested on aarch64-linux-gnu, x86_64-linux-gnu
and powerpc64le-linux-gnu.

Richard


2018-01-09  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* expr.c (expand_expr_addr_expr_1): Handle ADDR_EXPRs of
TARGET_MEM_REFs.
* gimple-expr.h (is_gimple_addressable): Likewise.
* gimple-expr.c (is_gimple_address): Likewise.
* internal-fn.c (expand_call_mem_ref): New function.
(expand_mask_load_optab_fn): Use it.
(expand_mask_store_optab_fn): Likewise.

Index: gcc/expr.c
===
--- gcc/expr.c  2018-01-09 15:13:32.603106251 +
+++ gcc/expr.c  2018-01-09 15:13:32.784098242 +
@@ -7885,6 +7885,9 @@ expand_expr_addr_expr_1 (tree exp, rtx t
return expand_expr (tem, target, tmode, modifier);
   }
 
+case TARGET_MEM_REF:
+  return addr_for_mem_ref (exp, as, true);
+
 case CONST_DECL:
   /* Expand the initializer like constants above.  */
   result = XEXP (expand_expr_constant (DECL_INITIAL (exp),
Index: gcc/gimple-expr.h
===
--- gcc/gimple-expr.h   2018-01-09 15:13:32.603106251 +
+++ gcc/gimple-expr.h   2018-01-09 15:13:32.785098198 +
@@ -119,6 +119,7 @@ virtual_operand_p (tree op)
 is_gimple_addressable (tree t)
 {
   return (is_gimple_id (t) || handled_component_p (t)
+ || TREE_CODE (t) == TARGET_MEM_REF
  || TREE_CODE (t) == MEM_REF);
 }
 
Index: gcc/gimple-expr.c
===
--- gcc/gimple-expr.c   2018-01-09 15:13:32.603106251 +
+++ gcc/gimple-expr.c   2018-01-09 15:13:32.784098242 +
@@ -631,7 +631,9 @@ is_gimple_address (const_tree t)
   op = TREE_OPERAND (op, 0);
 }
 
-  if (CONSTANT_CLASS_P (op) || TREE_CODE (op) == MEM_REF)
+  if (CONSTANT_CLASS_P (op)
+  || TREE_CODE (op) == TARGET_MEM_REF
+  || TREE_CODE (op) == MEM_REF)
 return true;
 
   switch (TREE_CODE (op))
Index: gcc/internal-fn.c
===
--- gcc/internal-fn.c   2018-01-09 15:13:32.603106251 +
+++ gcc/internal-fn.c   2018-01-09 15:13:32.785098198 +
@@ -2412,15 +2412,53 @@ expand_LOOP_DIST_ALIAS (internal_fn, gca
   gcc_unreachable ();
 }
 
+/* Return a memory reference of type TYPE for argument INDEX of STMT.
+   Use argument INDEX + 1 to derive the second (TBAA) operand.  */
+
+static tree
+expand_call_mem_ref (tree type, gcall *stmt, int index)
+{
+  tree addr = gimple_call_arg (stmt, index);
+  tree alias_ptr_type = TREE_TYPE (gimple_call_arg (stmt, index + 1));
+  unsigned int align = tree_to_shwi (gimple_call_arg (stmt, index + 1));
+  if (TYPE_ALIGN (type) != align)
+type = build_aligned_type (type, align);
+
+  tree tmp = addr;
+  if (TREE_CODE (tmp) == SSA_NAME)
+{
+  gimple *def = SSA_NAME_DEF_STMT (tmp);
+  if (gimple_assign_single_p (def))
+   tmp = gimple_assign_rhs1 (def);
+}
+
+  if (TREE_COD

Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 05/01/18 17:26, Jeff Law wrote:
> On 01/04/2018 11:20 AM, Joseph Myers wrote:
>> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>>
>>> 1 - generic modifications to GCC providing the builtin function for all
>>> architectures and expanding to an implementation that gives the
>>> logical behaviour of the builtin only.  A warning is generated if
>>> this expansion path is used that code will execute correctly but
>>> without providing protection against speculative use.
>>
>> Presumably it would make sense to have a standard way for architectures 
>> with no speculative execution to just use the generic version, but without 
>> the warning?  E.g., split up default_inhibit_load_speculation so that it 
>> generates the warning then calls another function with the same prototype, 
>> so such architectures can just define the hook to use that other function?
>>
> Seems wise.  There's still architectures that don't speculate or don't
> speculate enough to trigger these problems.
> 
> Jeff
> 

I'm in two minds about that.  There are certainly cases where you might
want to use the generic expansion as part of handling a case where you
have a more standard speculation barrier; in those cases you might want
to emit your barrier and then let the generic code expand and provide
the logical behaviour of the builtin.

However, if you don't have a barrier you need to ask yourself, will my
architecture ever have an implementation that does do speculation?
Unless you can be certain that it won't, you probably need to be warned
that some code the programmer thinks might be vulnerable to spectre has
not been compiled with that protection; otherwise, if you run that code
on a later implementation that does speculate, it might be vulnerable.

Let me give an example, we use the generic code expansion if we
encounter the builtin when generating code for Thumb on pre-ARMv7
devices.  We don't have instructions in 'thumb1' to guard against
speculation and we really want to warn users that they've done this (it
might be a mistake in how they're invoking the compiler).

I could add an extra parameter to the helper (bool warn_unimplemented),
which would be true if called directly from the expanders; that would
allow back-ends to implement a trivial variant that just suppresses the
warning.  They could also then use the generic expansion if all that was
needed was a subsequent fence instruction.

R.


Re: [C++ PATCH] Fix constexpr ICE with statement frontiers (PR c++/83734)

2018-01-09 Thread Jason Merrill
On Mon, Jan 8, 2018 at 6:55 PM, Jakub Jelinek  wrote:
> The assert there assumes we never evaluate a statement list to
> DEBUG_BEGIN_STMT, but it breaks apart when a BIND_EXPR with a typedef in it
> has some DEBUG_BEGIN_STMTs in it and nothing else (without -g it is just
> empty STATEMENT_LIST inside of the BIND_EXPR).  We want to return void_node
> in that case.
>
> There is nothing interesting about DEBUG_BEGIN_STMT for constexpr
> evaluation, it doesn't change any values, so let's just ignore them
> and the assert is then unnecessary.

What if we return void_node for a DEBUG_BEGIN_STMT from
cxx_eval_constant_expression?

Jason


Re: [PATCH] Simplify floating point comparisons

2018-01-09 Thread Wilco Dijkstra
Richard Biener wrote:
>On Thu, Jan 4, 2018 at 10:27 PM, Marc Glisse  wrote:

>> I don't understand how the check you added helps.

It simply blocks the transformation for infinity:

+  (if (!REAL_VALUE_ISINF (TREE_REAL_CST (@0)))
+   (switch
+   (if (real_less (&dconst0, TREE_REAL_CST_PTR (@0)))
+(op @1 @2))
+   /* For C < 0, use the inverted operator.  */
+   (if (real_less (TREE_REAL_CST_PTR (@0), &dconst0))
+(neg_op @1 @2)))

I can't see what is potentially problematic here: it excludes infinity. I can 
replace
the flag_unsafe_math_optimizations with HONOR variants and then the infinity 
check is no longer required.


>>> We could change C / x > 0 to x >= 0 so the underflow case is included.
>>> However that still means x == 0.0 would behave differently - so the
>>> question is
>>> what exactly does -funsafe-math-optimization allow?
>>
>>
>> That is indeed unclear. HONOR_SIGNED_ZEROS, HONOR_NANS, HONOR_INFINITIES are
>> better defined and could also see some use.
>>
>> 1/X>=0: if X can be zero (and the inverse can be infinite), the result
>> depends on the sign of that zero. If !HONOR_SIGNED_ZEROS ||
>> !HONOR_INFINITIES, it looks like we can simplify to either X>0 or X>=0,
>> doesn't matter which.

Agreed.

>> 1/X>=0 is true iff X is not NaN and the sign bit is not set, so with
>> !HONOR_NANS it could also be turned into a bit test.

That's unlikely to be more efficient than a compare with zero.

>> It works the same for C/X>=0 with 0> difference is that X=infinity now gives false (if HONOR_NANS).

Yes that's why C=infinity is excluded in the latest version. However if we 
already
check HONOR_INFINITIES it wouldn't be needed.

>> C/X>0 is more tricky because small/big may round to zero. For C large enough
>> (assuming denormals, 1 is large enough for instance), the only new issue is
>> X=infinity. For smaller C, we start getting wrong results for finite values
>> of X, and presumably that's where flag_unsafe_math_optimizations comes into
>> play.

Well, we can change C/X > 0 to X >= 0. That deals with the underflow case when
X is a normal, and results in the same wrong cases as C/X >= 0
(X = -0 and X = +Inf, assuming C > 0), so !HONOR_SIGNED_ZEROS &&
!HONOR_INFINITIES should be sufficient for all cases.
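
For concreteness, a small standalone example of the two cases those flags
guard against (illustrative only, not part of the patch; assumes C = 1.0):

#include <cstdio>
#include <cmath>

int main ()
{
  double c = 1.0;

  /* Signed zero: c/x1 is -Inf, so c/x1 > 0 is false, but x1 >= 0 is
     true because -0.0 compares equal to 0.0.  */
  double x1 = -0.0;
  std::printf ("%d %d\n", c / x1 > 0.0, x1 >= 0.0);   /* prints 0 1 */

  /* Infinity: c/x2 is +0.0, so c/x2 > 0 is false, but x2 >= 0 is true.  */
  double x2 = INFINITY;
  std::printf ("%d %d\n", c / x2 > 0.0, x2 >= 0.0);   /* prints 0 1 */

  return 0;
}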

>> If we consider flag_unsafe_math_optimizations as a kitchen sink that implies
>> !HONOR_SIGNED_ZEROS && !HONOR_INFINITIES, then you don't even need your
>> REAL_VALUE_ISINF test.

Indeed. Changing it to use more descriptive tests is a good idea, as it makes
it more explicit which cases are not supported.


> Without looking at the latest patch -funsafe-math-optimizations is a
> kitchen-sink
> we should avoid at all cost.  In the past we somewhat decided to break it up
> properly (-fassociative-math and friends have been added for example),
> and we really
> don't want to add more stuff under this umbrella (there's too much
> under it already).
> I'm not sure if C / x op 0.0 to x op 0.0 is association though (I'm
> quite sure it isn't),
> so citing association examples doesn't work for me here ;)

Marc's proposal to use HONOR_* instead of flag_unsafe_math_optimizations
should work, I'll give it a go. Maybe in the future we should aim to replace all
existing uses.

> It might be we need some new flag like -fassume-infinite-precision-math to
> cover the underflow issue?  We do have -ffinite-math-only but that
> shouldn't allow
> GCC to possibly introduce +-Inf for overflow - GCC just can assume there are
> no overflows in the original user expression.

We can avoid the underflow for normal numbers, so I don't think there is need
for a new flag.

Wilco


Re: Add an "early rematerialisation" pass

2018-01-09 Thread Richard Sandiford
Possible ping: wasn't sure whether this one needed more work or whether
it was OK to go in.  I've attached the patch with the improved comment
above early_remat::emit_copy_before.

Thanks,
Richard

Richard Sandiford  writes:
> Jeff Law  writes:
>> On 11/17/2017 08:58 AM, Richard Sandiford wrote:
>>> This patch looks for pseudo registers that are live across a call
>>> and for which no call-preserved hard registers exist.  It then
>>> recomputes the pseudos as necessary to ensure that they are no
>>> longer live across a call.  The comment at the head of the file
>>> describes the approach.
>>> 
>>> A new target hook selects which modes should be treated in this way.
>>> By default none are, in which case the pass is skipped very early.
>>> 
>>> It might also be worth looking for cases like:
>>> 
>>>C1: R1 := f (...)
>>>...
>>>C2: R2 := f (...)
>>>C3: R1 := C2
>>> 
>>> and giving the same value number to C1 and C3, effectively treating
>>> it like:
>>> 
>>>C1: R1 := f (...)
>>>...
>>>C2: R2 := f (...)
>>>C3: R1 := f (...)
>>> 
>>> Another (much more expensive) enhancement would be to apply value
>>> numbering to all pseudo registers (not just rematerialisation
>>> candidates), so that we can handle things like:
>>> 
>>>   C1: R1 := f (...R2...)
>>>   ...
>>>   C2: R1 := f (...R3...)
>>> 
>>> where R2 and R3 hold the same value.  But the current pass seems
>>> to catch the vast majority of cases.
>>> 
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.  OK to install?
>>> 
>>> Richard
>>> 
>>> 
>>> 2017-11-17  Richard Sandiford  
>>> 
>>> gcc/
>>> * Makefile.in (OBJS): Add early-remat.o.
>>> * target.def (select_early_remat_modes): New hook.
>>> * doc/tm.texi.in (TARGET_SELECT_EARLY_REMAT_MODES): New hook.
>>> * doc/tm.texi: Regenerate.
>>> * targhooks.h (default_select_early_remat_modes): Declare.
>>> * targhooks.c (default_select_early_remat_modes): New function.
>>> * timevar.def (TV_EARLY_REMAT): New timevar.
>>> * passes.def (pass_early_remat): New pass.
>>> * tree-pass.h (make_pass_early_remat): Declare.
>>> * early-remat.c: New file.
>>> * config/aarch64/aarch64.c (aarch64_select_early_remat_modes): New
>>> function.
>>> (TARGET_SELECT_EARLY_REMAT_MODES): Define.
>>> 
>>> gcc/testsuite/
>>> * gcc.target/aarch64/sve_spill_1.c: Also test that no predicates
>>> are spilled.
>>> * gcc.target/aarch64/sve_spill_2.c: New test.
>>> * gcc.target/aarch64/sve_spill_3.c: Likewise.
>>> * gcc.target/aarch64/sve_spill_4.c: Likewise.
>>> * gcc.target/aarch64/sve_spill_5.c: Likewise.
>>> * gcc.target/aarch64/sve_spill_6.c: Likewise.
>>> * gcc.target/aarch64/sve_spill_7.c: Likewise.
>> So it's not the immediate goal here, but it'd be nice if we could extend
>> this to cover all the other targets that have no registers of certain
>> classes that can be allocated across a call.
>>
>> Mostly this is exactly what I'd expect -- not to diminish the work,
>> there's a ton of stuff here.  But the overall flow of work and
>> organization is basically what I'd expect.
>
> That's good.  I wasn't setting out to do anything particularly
> innovative here, so the fact that it's not surprising is a positive
> sign. :-)
>
>> I know Richi was a bit scared off by the RTL nature, but the vast
>> majority of the code here isn't related to RTL -- it's operating on
>> properties that got extracted from the RTL.  I could almost take this
>> change bits at the start of the pass and the end and run it on gimple
>> :-)  Which brings up an interesting question, is that a worthwhile thing
>> to ponder?  Is it gimple optimizers that are causing objects to be live
>> across calls or is it usually teh RTL bits?
>
> It tends to be a mixture of both.  E.g. if we have two loops with
> the same bounds and a call inbetween, the gimple optimisers will reuse
> the WHILE_ULT in the first loop header for the second loop.  But there
> are also cases introduced in rtl.  E.g. we don't (and IMO shouldn't)
> expose to gimple that some SVE operations always require a predicate.
> The expanders instead generate all-true predicates where necessary.
> These are then an obvious candidate for PRE.
>
>> As I was reading one of the thoughts I had was that this reminded me a
>> bit of LCM.  WHich makes sense -- it's code motion after all.  I was
>> pleasantly surprised to see the comment explaining why the LCM
>> infrastructure was not used.
>
> Yeah, the availablity problem is basically the same.  The main
> difference is choosing the final placement based on block frequencies.
>
>> Do you have to do anything about rounding modes?  Or are those not an
>> issue for aarch64?  It's something I'm pretty sure we'd need to handle
>> for x86 as we have to ensure the rounding mode is stable if we move/copy
>> FP computations from one context to another.
>
> Rounding modes don't affect loads, stores or moves, which

Re: Smart pointer pretty printers

2018-01-09 Thread Jonathan Wakely

On 02/03/17 19:10 +0100, Juraj Oršulić wrote:

On Fri, Feb 24, 2017 at 5:36 PM, Jonathan Wakely  wrote:

For a patch this size we do, so I'm not going to look at your patch.
It will probably be simpler to just do it myself, and not worry about
paperwork.

If you want to contribue in future then please do complete the
necessary paperwork anyway:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html#contrib.list


Hi Jonathan,

I have completed the assignment.

I am resubmitting the patch from the last time, this time as an
attachment (since the previous was broken by newlines). As before,
there are two versions of the patch: the one that I obtained by editing
printers.py on Ubuntu 16.04, which ships with gcc 5.4.0; and the
other, for the current printers.py from the gcc repo (I assume it's gcc
6?), which needs a slightly different patch because the unique_ptr
printer has significantly changed.

I have been using the gcc 5 patch without problems.  I have not tested
the gcc 6 patch, but since it is very similar, I don't expect any
problems.

Let me know if you want me to expand this for other iterators.


I don't like the change for std::vector iterators, because it makes it
appear as though the iterator stores a pointer. Although that's true
internally, conceptually it's not how we should think about iterators.

An iterator refers to a value, not a pointer. However, the current
code has a serious flaw in that it always dereferences the pointer, even
for the end iterator (which is https://gcc.gnu.org/PR59170 and is
worse than displaying the value as a pointer!)
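
A minimal way to see that flaw (illustrative only, not part of any patch;
it relies on the fact that end() of an empty libstdc++ vector holds a null
_M_current) is to break on the return statement below and "print it":

#include <vector>

int main ()
{
  std::vector<int> v;                 // empty, so _M_finish is null
  auto it = v.end ();                 // not dereferenceable
  return it == v.begin () ? 0 : 1;    // breakpoint here, then "print it"
}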

There's certainly scope for improvement, but I'm not sure this is the
right solution.



@@ -322,9 +328,17 @@ class StdVectorIteratorPrinter:

     def __init__(self, typename, val):
         self.val = val
+        self.pointer = self.val['_M_current']
+
+    def children(self):
+        if not self.pointer:
+            return []
+        return [('operator->()', self.pointer)]
 
     def to_string(self):
-        return self.val['_M_current'].dereference()
+        if not self.pointer:
+            return 'non-dereferenceable iterator for std::vector'
+        return ('std::vector<%s>::iterator' % str(self.pointer.type.target()))

class StdTuplePrinter:
"Print a std::tuple"




Re: Add support for in-order addition reduction using SVE FADDA

2018-01-09 Thread Richard Sandiford
Ping

Richard Sandiford  writes:
> Richard Biener  writes:
>> On Mon, Nov 20, 2017 at 1:54 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Fri, Nov 17, 2017 at 5:53 PM, Richard Sandiford
  wrote:
> This patch adds support for in-order floating-point addition reductions,
> which are suitable even in strict IEEE mode.
>
> Previously vect_is_simple_reduction would reject any cases that forbid
> reassociation.  The idea is instead to tentatively accept them as
> "FOLD_LEFT_REDUCTIONs" and only fail later if there is no target
> support for them.  Although this patch only handles the particular
> case of plus and minus on floating-point types, there's no reason in
> principle why targets couldn't handle other cases.
>
> The vect_force_simple_reduction change makes it simpler for parloops
> to read the type of reduction.
>
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.  OK to install?

 I don't like that you add a new tree code for this.  A new IFN looks more
 suitable to me.
>>>
>>> OK.
>>
>> Thanks.  I'd like to eventually get rid of other vectorizer tree codes as 
>> well,
>> like the REDUC_*_EXPR, DOT_PROD_EXPR and SAD_EXPR.  IFNs
>> are now really the way to go for "target instructions on GIMPLE".
>>
 Also I think if there's a way to handle this correctly with target support
 you can also implement a fallback if there is no such support increasing
 test coverage.  It would basically boil down to extracting all scalars from
 the non-reduction operand vector and performing a series of reduction
 ops, keeping the reduction PHI scalar.  This would also support any
 reduction operator.
>>>
>>> Yeah, but without target support, that's probably going to be expensive.
>>> It's a bit like how we can implement element-by-element loads and stores
>>> for cases that don't have target support, but had to explicitly disable
>>> that in many cases, since the cost model was too optimistic.
>>
>> I expect that for V2DF or even V4DF it might be profitable in quite a number
>> of cases.  V2DF definitely.
>>
>>> I can give it a go anyway if you think it's worth it.
>>
>> I think it is.
>
> OK, done in the patch below.  Tested as before.
>
> Thanks,
> Richard

2017-11-21  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* optabs.def (fold_left_plus_optab): New optab.
* doc/md.texi (fold_left_plus_@var{m}): Document.
* internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
* internal-fn.c (fold_left_direct): Define.
(expand_fold_left_optab_fn): Likewise.
(direct_fold_left_optab_supported_p): Likewise.
* fold-const-call.c (fold_const_fold_left): New function.
(fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
* tree-parloops.c (valid_reduction_p): New function.
(gather_scalar_reductions): Use it.
* tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type.
(vect_finish_replace_stmt): Declare.
* tree-vect-loop.c (fold_left_reduction_code): New function.
(needs_fold_left_reduction_p): New function, split out from...
(vect_is_simple_reduction): ...here.  Accept reductions that
forbid reassociation, but give them type FOLD_LEFT_REDUCTION.
(vect_force_simple_reduction): Also store the reduction type in
the assignment's STMT_VINFO_REDUC_TYPE.
(vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION.
(merge_with_identity): New function.
(vectorize_fold_left_reduction): Likewise.
(vectorizable_reduction): Handle FOLD_LEFT_REDUCTION.  Leave the
scalar phi in place for it.  Check for target support and reject
cases that would reassociate the operation.  Defer the transform
phase to vectorize_fold_left_reduction.
* config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
* config/aarch64/aarch64-sve.md (fold_left_plus_): New expander.
(*fold_left_plus_, *pred_fold_left_plus_): New insns.

gcc/testsuite/
* gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and
check for a message about using in-order reductions.
* gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be
vectorized and check for a message about using in-order reductions.
* gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/vect-reduc-in-order-1.c: New test.
* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
* gcc.target/aarch64/sve_reduc

[PATCH PR81228][AARCH64][gcc-7] Backport r255625 : Fix ICE by adding LTGT

2018-01-09 Thread Sudakshina Das

Hi

This patch only adds the missing LTGT handling to plug the ICE. It is a 
backport of r255625 from trunk.


Testing done: checked for regressions on a bootstrapped 
aarch64-none-linux-gnu and added a new compile-time test case that 
generates LTGT to make sure it doesn't ICE.


Is this ok for trunk?

Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2018-01-09  Sudakshina Das  
Bin Cheng  

Backport from mainline:
2017-12-14  Sudakshina Das  
Bin Cheng  

PR target/81228
	* config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT to 
CCFPEmode.

* config/aarch64/aarch64-simd.md (vec_cmp): Add
LTGT.

*** gcc/testsuite/ChangeLog ***

2018-01-09  Sudakshina Das  

Backport from mainline:
2017-12-14  Sudakshina Das  

PR target/81228
* gcc.dg/pr81228.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c462164..1e0a346 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2490,6 +2490,7 @@
 case UNEQ:
 case ORDERED:
 case UNORDERED:
+case LTGT:
   break;
 default:
   gcc_unreachable ();
@@ -2544,6 +2545,15 @@
   emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
   break;
 
+case LTGT:
+  /* LTGT is not guaranteed to not generate a FP exception.  So let's
+	 go the faster way : ((a > b) || (b > a)).  */
+  emit_insn (gen_aarch64_cmgt (operands[0],
+	 operands[2], operands[3]));
+  emit_insn (gen_aarch64_cmgt (tmp, operands[3], operands[2]));
+  emit_insn (gen_ior3 (operands[0], operands[0], tmp));
+  break;
+
 case UNORDERED:
   /* Operands are ORDERED iff (a > b || b >= a), so we can compute
 	 UNORDERED as !ORDERED.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 436091a..db517ca 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4664,13 +4664,13 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
 	case UNGT:
 	case UNGE:
 	case UNEQ:
-	case LTGT:
 	  return CCFPmode;
 
 	case LT:
 	case LE:
 	case GT:
 	case GE:
+	case LTGT:
 	  return CCFPEmode;
 
 	default:
diff --git a/gcc/testsuite/gcc.dg/pr81228.c b/gcc/testsuite/gcc.dg/pr81228.c
new file mode 100644
index 000..f7eecc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81228.c
@@ -0,0 +1,21 @@
+/* PR target/81228.  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-ssa" } */
+
+void *a;
+
+void b ()
+{
+  char c;
+  long d;
+  char *e = a;
+  for (; d; d++)
+  {
+double f, g;
+c = g < f || g > f;
+e[d] = c;
+  }
+}
+
+/* Let's make sure we do have a LTGT.  */
+/* { dg-final { scan-tree-dump "<>" "ssa" } } */


Re: [C++ PATCH] Fix constexpr ICE with statement frontiers (PR c++/83734)

2018-01-09 Thread Jakub Jelinek
On Tue, Jan 09, 2018 at 10:30:31AM -0500, Jason Merrill wrote:
> On Mon, Jan 8, 2018 at 6:55 PM, Jakub Jelinek  wrote:
> > The assert there assumes we never evaluate a statement list to
> > DEBUG_BEGIN_STMT, but it breaks apart when a BIND_EXPR with a typedef in it
> > has some DEBUG_BEGIN_STMTs in it and nothing else (without -g it is just
> > empty STATEMENT_LIST inside of the BIND_EXPR).  We want to return void_node
> > in that case.
> >
> > There is nothing interesting about DEBUG_BEGIN_STMT for constexpr
> > evaluation, it doesn't change any values, so let's just ignore them
> > and the assert is then unnecessary.
> 
> What if we return void_node for a DEBUG_BEGIN_STMT from
> cxx_eval_constant_expression?

That would handle differently the case when some non-DEBUG_BEGIN_STMT
is followed by one or more DEBUG_BEGIN_STMTs.  In that case, with -g0
it would return the last stmt in the list, while with -g it would return
void_node.  I don't have a testcase for that right now, but given e.g.
earlier reports from Alex that DEBUG_BEGIN_STMTs appear virtually
everywhere, I can't prove it can't happen.

Jakub


RFA: Expand vec_perm_indices::series_p comment

2018-01-09 Thread Richard Sandiford
James Greenhalgh  writes:
> On Thu, Jan 04, 2018 at 11:27:56AM +, Richard Sandiford wrote:
>> Ping**2
>
> This is OK.

Thanks.

> It took me a while to get the hang of the interface - a worked example
> in the comment in vec-perm-indices.c would probably have been helpful.
> It took until your code for REV for this to really make sense to me; so
> perhaps that make for a good example.

Yeah, good idea.

Is the following OK?  Tested on aarch64-linux-gnu.

Thanks,
Richard


2018-01-09  Richard Sandiford  

gcc/
* vec-perm-indices.c (vec_perm_indices::series_p): Give examples
of usage.

Index: gcc/vec-perm-indices.c
===
--- gcc/vec-perm-indices.c  2018-01-03 11:12:55.709763763 +
+++ gcc/vec-perm-indices.c  2018-01-09 15:46:40.004232873 +
@@ -114,7 +114,18 @@ vec_perm_indices::rotate_inputs (int del
 }
 
 /* Return true if index OUT_BASE + I * OUT_STEP selects input
-   element IN_BASE + I * IN_STEP.  */
+   element IN_BASE + I * IN_STEP.  For example, the call to test
+   whether a permute reverses a vector of N elements would be:
+
+ series_p (0, 1, N - 1, -1)
+
+   which would return true for { N - 1, N - 2, N - 3, ... }.
+   The calls to test for an interleaving of elements starting
+   at N1 and N2 would be:
+
+ series_p (0, 2, N1, 1) && series_p (1, 2, N2, 1).
+
+   which would return true for { N1, N2, N1 + 1, N2 + 1, ... }.  */
 
 bool
 vec_perm_indices::series_p (unsigned int out_base, unsigned int out_step,


Re: [PATCH, combine]: Use correct mode for ASHIFT in force_int_to_mode

2018-01-09 Thread Segher Boessenkool
Hi!

On Mon, Jan 08, 2018 at 08:41:55PM +0100, Uros Bizjak wrote:
> Attached patch corrects wrong mode argument in the call to
> force_to_mode call for ASHIFT operator. The patch uses "mode" mode,
> the same as for all binop and unop operators in the force_int_to_mode
> function.
> 
> Also, the unpatched function would force operand to op_mode and later
> truncate to op_mode again, so it all looks like a typo to me.

Indeed.

> 2018-01-08  Uros Bizjak  
> 
> PR target/83628
> * combine.c (force_int_to_mode) : Use mode instead of
> op_mode in the force_to_mode call.
> 
> Together with a follow-up target patch, the patch fixes
> gcc.target/alpha/pr83628-2.c scan-asm failures on alpha.
> 
> 2018-01-08  Uros Bizjak  
> 
> PR target/83628
> * combine.c (force_int_to_mode) : Use mode instead of
> op_mode in the force_to_mode call.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> OK for mainline and branches?

OK for trunk; OK for the branches after a suitable burn-in period.
Thanks,


Segher


> diff --git a/gcc/combine.c b/gcc/combine.c
> index 3a42de53455c..6adc0a7d6f85 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -8908,7 +8908,7 @@ force_int_to_mode (rtx x, scalar_int_mode mode,
> scalar_int_mode xmode,
>  mask = fuller_mask;
> 
>op0 = gen_lowpart_or_truncate (op_mode,
> - force_to_mode (XEXP (x, 0), op_mode,
> + force_to_mode (XEXP (x, 0), mode,
>  mask, next_select));
> 
>if (op_mode != xmode || op0 != XEXP (x, 0))


Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-09 Thread Bill Schmidt
On Jan 9, 2018, at 4:21 AM, Richard Earnshaw (lists)  
wrote:
> 
> On 08/01/18 16:01, Bill Schmidt wrote:
>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) 
>>  wrote:
>>> 
>>> On 08/01/18 02:20, Bill Schmidt wrote:
 Hi Richard,
 
 Unfortunately, I don't see any way that this will be useful for the ppc 
 targets.  We don't
 have a way to force resolution of a condition prior to continuing 
 speculation, so this
 will just introduce another comparison that we would speculate past.  For 
 our mitigation
 we will have to introduce an instruction that halts all speculation at 
 that point, and place
 it in front of all dangerous loads.  I wish it were otherwise.
>>> 
>>> So can't you make the builtin expand to (in pseudo code):
>>> 
>>> if (bounds_check)
>>>   {
>>> __asm ("barrier");
>>> result = *ptr;
>>> }
>>>   else
>>>   result = failval;
>> 
>> Could, but this just generates unnecessary code for Power.  We would instead 
>> generate
>> 
>>  __asm ("barrier");
>>  result = *ptr;
>> 
>> without any checks.  We would ignore everything but the first argument.
> 
> You can't do that with the builtin as it is currently specified as it
> also has a defined behaviour for the result it returns.  You can,
> however, expand the code as normal RTL and let the optimizers remove any
> redundant code if they can make that deduction and you don't need the
> additional behaviour.

But that's my original point.  "As currently specified" is overspecified for
our architecture, and expanding the extra code *hoping* it will go away
is not something I feel we should do.  If our hopes are dashed, we end up
with yet worse performance.  If we're going to use something generic
for everyone, then I argue that the semantics may not be the same for
all targets, and that this needs to be specified in the documentation.

I'm just looking for a solution that works for everyone.

Thanks,
Bill

> 
> R.
> 
>> 
>> Thanks,
>> Bill
>> 
>>> 
>>> R.
>>> 
 
 Thanks,
 Bill
 
> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw  
> wrote:
> 
> 
> This patch adds generic support for the new builtin
> __builtin_load_no_speculate.  It provides the overloading of the
> different access sizes and a default fall-back expansion for targets
> that do not support a mechanism for inhibiting speculation.
> 
>   * builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>   New builtin type signature.
>   (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>   (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>   (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>   (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>   * builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
>   (BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
>   (BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
>   (BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
>   (BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
>   (BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
>   * target.def (inhibit_load_speculation): New hook.
>   * doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
>   documentation.
>   * doc/tm.texi: Regenerated.
>   * doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
>   * doc/extend.texi: Document __builtin_load_no_speculate.
>   * c-family/c-common.c (load_no_speculate_resolve_size): New function.
>   (load_no_speculate_resolve_params): New function.
>   (load_no_speculate_resolve_return): New function.
>   (resolve_overloaded_builtin): Handle overloading
>   __builtin_load_no_speculate.
>   * builtins.c (expand_load_no_speculate): New function.
>   (expand_builtin): Handle new no-speculation builtins.
>   * targhooks.h (default_inhibit_load_speculation): Declare.
>   * targhooks.c (default_inhibit_load_speculation): New function.
> ---
> gcc/builtin-types.def   |  16 +
> gcc/builtins.c  |  99 ++
> gcc/builtins.def|  22 ++
> gcc/c-family/c-common.c | 164 
> 
> gcc/c-family/c-cppbuiltin.c |   5 +-
> gcc/doc/cpp.texi|   4 ++
> gcc/doc/extend.texi |  53 ++
> gcc/doc/tm.texi |   6 ++
> gcc/doc/tm.texi.in  |   2 +
> gcc/target.def  |  20 ++
> gcc/targhooks.c |  69 +++
> gcc/targhooks.h |   3 +
> 12 files changed, 462 insertions(+), 1 deletion(-)
> 
> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>



Re: [PATCH] Implementation of RANDOM_INIT from F2018

2018-01-09 Thread Steve Kargl
On Mon, Jan 08, 2018 at 06:51:06PM -0800, Jerry DeLisle wrote:
> On 01/08/2018 04:58 PM, Steve Kargl wrote:
> > 
> > Boostrapped and regression tested on x86_64-*-freebsd.
> > OK to commit?
> > 
> 
> Yes, Looks good Steve.  So all we need is a run test with actual =lib case.
> 

Just realized that I forgot to update the documentation.
I'll add a RANDOM_INIT section before committing.

-- 
Steve


Go patch committed: Use macro Unordered_map instead of std::unordered_map

2018-01-09 Thread Ian Lance Taylor
This patch by Cherry Zhang fixes a case in the Go frontend that was
using std::unodered_map where it should use the macro Unordered_map
for more portability.  Bootstrapped on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 256366)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-dbc0c7e4329aada2ae3554c20cfb8cfa48041213
+0445dc01fd75325ff99f839cfaab29cb9f2a1f97
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 256366)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -858,7 +858,7 @@ Gogo::analyze_escape()
 
   // Propagate levels across each dst.  This is the flood phase.
  std::set<Node*> dsts = context->dsts();
-  std::unordered_map<Node*, int> escapes;
+  Unordered_map(Node*, int) escapes;
  for (std::set<Node*>::iterator n = dsts.begin();
n != dsts.end();
++n)


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Bernd Edlinger
Richard Earnshaw wrote:
 > Let me give an example, we use the generic code expansion if we
 > encounter the builtin when generating code for Thumb on pre-ARMv7
 > devices.  We don't have instructions in 'thumb1' to guard against
 > speculation and we really want to warn users that they've done this (it
 > might be a mistake in how they're invoking the compiler).

Instead of this warning in one of the unsupported cases, could you use 
the DSB SYS + ISB  barrier as the white paper suggests?


Bernd.


Re: [PATCH 1/5][AArch64] Crypto command line split

2018-01-09 Thread James Greenhalgh
On Wed, Jan 03, 2018 at 05:21:27PM +, Michael Collison wrote:
> Hi all,
> 
> This patch adds two new command line options for the legacy cryptographic
> extensions AES (+aes) and SHA-1/SHA-2 (+sha2). Backward compatibility is
> retained by modifying the +crypto feature modifier to enable +aes and +sha2.
> 
> Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all
> instructions assembly correctly.

I'm a bit confused by this testing statement. aarch64-none-elf is not a
bootstrap target. Was this bootstrapped or only tested with a cross-compiler
(or both)? Was the testsuite also run? I don't see any new instructions in
this patch that would need a new binutils.

The patch is OK, but please clarify how it has been tested.

Thanks,
James



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 09/01/18 17:36, Bernd Edlinger wrote:
> Richard Earnshaw wrote:
>  > Let me give an example, we use the generic code expansion if we
>  > encounter the builtin when generating code for Thumb on pre-ARMv7
>  > devices.  We don't have instructions in 'thumb1' to guard against
>  > speculation and we really want to warn users that they've done this (it
>  > might be a mistake in how they're invoking the compiler).
> 
> Instead of this warning in one of the unsupported cases, could you use
> the DSB SYS + ISB  barrier as the white paper suggests?
> 
> 
> Bernd.

Thumb1 doesn't have those instructions.

R.


Re: [PATCH 2/5][AArch64] Add v8.4 architecture

2018-01-09 Thread James Greenhalgh
On Wed, Jan 03, 2018 at 05:25:17PM +, Michael Collison wrote:
> Hi all,
> 
> This patch adds support for the Arm architecture v8.4. A new command line
> option, -march=armv8.4-a, is added as well as documentation.
> 
> Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all
> instructions assembly correctly.

OK.

Thanks,
James



Re: [PATCH 2/5] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

2018-01-09 Thread Mike Crowe
On Tuesday 09 January 2018 at 13:50:54 +, Jonathan Wakely wrote:
> On 07/01/18 20:55 +, Mike Crowe wrote:
> > The futex system call supports waiting for an absolute time if
> > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT. Doing so provides two
> > benefits:
> > 
> > 1. The call to gettimeofday is not required in order to calculate a
> >   relative timeout.
> > 
> > 2. If someone changes the system clock during the wait then the futex
> >   timeout will correctly expire earlier or later. Currently that only
> >   happens if the clock is changed prior to the call to gettimeofday.
> > 
> > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
> > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25. There is
> > no attempt to detect the kernel version and fall back to the previous
> > method.
> 
> I don't think we can require a specific kernel version just for this.

What is the minimum kernel version that libstdc++ requires? Since it's
already relying on NPTL it can't go back earlier than v2.6, but I suppose
that's a while before v2.6.28.

> What happens if you use those bits on an older kernel, will there be
> an ENOSYS error? Because that would allow us to try the new form, and
> fallback to the old.

Glibc's nptl-init.c calls

 futex(.., FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME | FUTEX_PRIVATE_FLAG, ..)

and sets __have_futex_clock_realtime based on whether it gets ENOSYS back
or not so it looks like it is possible to determine whether support is
available.

The only such check I can find in libstdc++ is in filesystem/std-ops.cc
fs::do_copy_file which can try sendfile(2) on each invocation but then
automatically falls back to copying by steam. In that case, the overhead of
the potentially-failing sendfile system call is small compared to copying
the file.

Doing such a check in _M_futex_wait_until means one system call if
FUTEX_CLOCK_REALTIME is supported, but three system calls if it is not
supported. If we found a way to cache the answer in a thread-safe and cheap
manner then this could be avoided.
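
For illustration, I mean something along these lines (a rough sketch only,
with made-up names, not a patch; it uses the same probe glibc does, passing
an expected value that doesn't match so the call returns immediately):

#include <atomic>
#include <cerrno>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static std::atomic<int> have_clock_realtime{-1};   // -1 unknown, 0 no, 1 yes

static bool
futex_clock_realtime_supported ()
{
  int cached = have_clock_realtime.load (std::memory_order_relaxed);
  if (cached >= 0)
    return cached != 0;
  unsigned word = 0;
  // Expected value (1) differs from *addr (0), so a kernel that supports
  // the flags returns immediately with EAGAIN; an old kernel gives ENOSYS.
  long rc = syscall (SYS_futex, &word,
                     FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG
                     | FUTEX_CLOCK_REALTIME,
                     1, nullptr, nullptr, FUTEX_BITSET_MATCH_ANY);
  bool ok = !(rc == -1 && errno == ENOSYS);
  have_clock_realtime.store (ok ? 1 : 0, std::memory_order_relaxed);
  return ok;
}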

Do you think it's worth trying to cache whether FUTEX_CLOCK_REALTIME is
available?

Thanks.

Mike.


Re: [PATCH 3/5][AArch64] Crypto SM4 Support

2018-01-09 Thread James Greenhalgh
On Wed, Jan 03, 2018 at 05:25:57PM +, Michael Collison wrote:
> Hi All,
> 
> This patch adds support for the SM3/SM4 cryptographic instructions added in
> Armv8.4-a. Support for the new instructions is in the form of new ACLE
> intrinsics. A new command line feature modifier, +sm4, is added to enable the
> support.
> 
> Test cases were added to verify that the ACLE Intrinsics generate the
> appropriate SM3/SM4 assembly instructions.
> 
> Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all
> instructions assembly correctly.
> 
> Okay for trunk?

OK.

Thanks,
James

> 2017-11-10  Michael Collison  
> 
>   * config/aarch64/aarch64-builtins.c:
>   (aarch64_types_quadopu_imm_qualifiers, TYPES_QUADOPUI): New.
>   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
>   (__ARM_FEATURE_SM3): Define if TARGET_SM4 is true.
>   (__ARM_FEATURE_SM4): Define if TARGET_SM4 is true.
>   * config/aarch64/aarch64.h (AARCH64_FL_SM4): New flags.
>   (AARCH64_ISA_SM4): New ISA flag.
>   (TARGET_SM4): New feature flag for sm4.
>   * config/aarch64/aarch64-simd-builtins.def
>   (aarch64_sm3ss1qv4si): Ditto.
>   (aarch64_sm3tt1aq4si): Ditto.
>   (aarch64_sm3tt1bq4si): Ditto.
>   (aarch64_sm3tt2aq4si): Ditto.
>   (aarch64_sm3tt2bq4si): Ditto.
>   (aarch64_sm3partw1qv4si): Ditto.
>   (aarch64_sm3partw2qv4si): Ditto.
>   (aarch64_sm4eqv4si): Ditto.
>   (aarch64_sm4ekeyqv4si): Ditto.
>   * config/aarch64/aarch64-simd.md:
>   (aarch64_sm3ss1qv4si): Ditto.
>   (aarch64_sm3ttqv4si): Ditto.
>   (aarch64_sm3partwqv4si): Ditto.
>   (aarch64_sm4eqv4si): Ditto.
>   (aarch64_sm4ekeyqv4si): Ditto.
>   * config/aarch64/iterators.md (sm3tt_op): New int iterator.
>   (sm3part_op): Ditto.
>   (CRYPTO_SM3TT): Ditto.
>   (CRYPTO_SM3PART): Ditto.
>   (UNSPEC_SM3SS1): New unspec.
>   (UNSPEC_SM3TT1A): Ditto.
>   (UNSPEC_SM3TT1B): Ditto.
>   (UNSPEC_SM3TT2A): Ditto.
>   (UNSPEC_SM3TT2B): Ditto.
>   (UNSPEC_SM3PARTW1): Ditto.
>   (UNSPEC_SM3PARTW2): Ditto.
>   (UNSPEC_SM4E): Ditto.
>   (UNSPEC_SM4EKEY): Ditto.
>   * config/aarch64/constraints.md (Ui2): New constraint.
>   * config/aarch64/predicates.md (aarch64_imm2): New predicate.
>   * config/arm/types.md (crypto_sm3): New type attribute.
>   (crypto_sm4): Ditto.
>   * config/aarch64/arm_neon.h (vsm3ss1q_u32): New intrinsic.
>   (vsm3tt1aq_u32): Ditto.
>   (vsm3tt1bq_u32): Ditto.
>   (vsm3tt2aq_u32): Ditto.
>   (vsm3tt2bq_u32): Ditto.
>   (vsm3partw1q_u32): Ditto.
>   (vsm3partw2q_u32): Ditto.
>   (vsm4eq_u32): Ditto.
>   (vsm4ekeyq_u32): Ditto.
>   (doc/invoke.texi): Document new sm4 option.
>   gcc.target/aarch64/sm3_sm4.c: New testcase.




Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Bernd Edlinger
On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
> On 09/01/18 17:36, Bernd Edlinger wrote:
>> Richard Earnshaw wrote:
>>   > Let me give an example, we use the generic code expansion if we
>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>   > speculation and we really want to warn users that they've done this (it
>>   > might be a mistake in how they're invoking the compiler).
>>
>> Instead of this warning in one of the unsupported cases, could you use
>> the DSB SYS + ISB  barrier as the white paper suggests?
>>
>>
>> Bernd.
>
> Thumb1 doesn't have those instructions.
> 

Ah, well, and in the case mode == TImode ?

Bernd.


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 09/01/18 17:57, Bernd Edlinger wrote:
> On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
>> On 09/01/18 17:36, Bernd Edlinger wrote:
>>> Richard Earnshaw wrote:
>>>   > Let me give an example, we use the generic code expansion if we
>>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>>   > speculation and we really want to warn users that they've done this (it
>>>   > might be a mistake in how they're invoking the compiler).
>>>
>>> Instead of this warning in one of the unsupported cases, could you use
>>> the DSB SYS + ISB  barrier as the white paper suggests?
>>>
>>>
>>> Bernd.
>>
>> Thumb1 doesn't have those instructions.
>> 
> 
> Ah, well, and in the case mode == TImode ?
> 
> Bernd.

I don't think we can ever get a TImode integral type on AArch32.
Perhaps we should just ICE in that case...

R.


Re: [PATCH 4/5][AArch64] Crypto sha512 and sha3

2018-01-09 Thread James Greenhalgh
On Wed, Jan 03, 2018 at 05:30:33PM +, Michael Collison wrote:
> Hi All,
> 
> This patch adds support for the SHA-512 and SHA-3 instructions added in
> Armv8.4-a. Support for the new instructions is in the form of new ACLE
> intrinsics. A new command line feature modifier, +sha3, is added to enable
> the support.
> 
> Test cases were added to verify that the ACLE Intrinsics generate the
> appropriate SHA-512/SHA-3 assembly instructions.
> 
> Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all
> instructions assembly correctly.
> 
> Okay for trunk?

OK.

Thanks,
James

> 
> 2017-11-10  Michael Collison  
> 
>   * config/aarch64/aarch64-builtins.c:
>   (aarch64_types_ternopu_imm_qualifiers, TYPES_TERNOPUI): New.
>   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
>   (__ARM_FEATURE_SHA3): Define if TARGET_SHA3 is true.
>   * config/aarch64/aarch64.h (AARCH64_FL_SHA3): New flags.
>   (AARCH64_ISA_SHA3): New ISA flag.
>   (TARGET_SHA3): New feature flag for sha3.
>   * config/aarch64/iterators.md (sha512_op): New int attribute.
>   (CRYPTO_SHA512): New int iterator.
>   (UNSPEC_SHA512H): New unspec.
>   (UNSPEC_SHA512H2): Ditto.
>   (UNSPEC_SHA512SU0): Ditto.
>   (UNSPEC_SHA512SU1): Ditto.
>   * config/aarch64/aarch64-simd-builtins.def
>   (aarch64_crypto_sha512hqv2di): New builtin.
>   (aarch64_crypto_sha512h2qv2di): Ditto.
>   (aarch64_crypto_sha512su0qv2di): Ditto.
>   (aarch64_crypto_sha512su1qv2di): Ditto.
>   (aarch64_eor3qv8hi): Ditto.
>   (aarch64_rax1qv2di): Ditto.
>   (aarch64_xarqv2di): Ditto.
>   (aarch64_bcaxqv8hi): Ditto.
>   * config/aarch64/aarch64-simd.md:
>   (aarch64_crypto_sha512hqv2di): New pattern.
>   (aarch64_crypto_sha512su0qv2di): Ditto.
>   (aarch64_crypto_sha512su1qv2di): Ditto.
>   (aarch64_eor3qv8hi): Ditto.
>   (aarch64_rax1qv2di): Ditto.
>   (aarch64_xarqv2di): Ditto.
>   (aarch64_bcaxqv8hi): Ditto.
>   * config/aarch64/arm_neon.h (vsha512hq_u64): New intrinsic.
>   (vsha512h2q_u64): Ditto.
>   (vsha512su0q_u64): Ditto.
>   (vsha512su1q_u64): Ditto.
>   (veor3q_u16): Ditto.
>   (vrax1q_u64): Ditto.
>   (vxarq_u64): Ditto.
>   (vbcaxq_u16): Ditto.
>   * config/arm/types.md (crypto_sha512): New type attribute.
>   (crypto_sha3): Ditto.
>   (doc/invoke.texi): Document new sha3 option.
>   gcc.target/aarch64/sha2.h: New shared testcase.
>   gcc.target/aarch64/sha2_1.c: New testcase.
>   gcc.target/aarch64/sha2_2.c: New testcase.
>   gcc.target/aarch64/sha2_3.c: New testcase.
>   gcc.target/aarch64/sha3.h: New shared testcase.
>   gcc.target/aarch64/sha3_1.c: New testcase.
>   gcc.target/aarch64/sha3_2.c: New testcase.
>   gcc.target/aarch64/sha3_3.c: New testcase.




Re: [PATCH][GCC][AArch64] Enable dotproduct by default for Armv8.4-a

2018-01-09 Thread James Greenhalgh
On Tue, Jan 09, 2018 at 10:36:23AM +, Tamar Christina wrote:
> Hi All,
> 
> This patch makes the Dot Product extension mandatory on Armv8.4-A.
> 
> Regtested on aarch64-none-elf and no regressions.

OK.

Thanks,
James

> gcc/
> 2018-01-09  Tamar Christina  
> 
>   * config/aarch64/aarch64.h
>   (AARCH64_FL_FOR_ARCH8_4): Add  AARCH64_FL_DOTPROD.
> 
> gcc/testsuite/
> 2018-01-09  Tamar Christina  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vdot-compile-2.c: New.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> -- 

> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 
> 20cef53079c955733db105331b7f1bd3bdb03005..02546e278ed860f0e42fa1423ceb39f1d4cb5e9b
>  100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -171,7 +171,7 @@ extern unsigned aarch64_architecture_version;
>  #define AARCH64_FL_FOR_ARCH8_3   \
>(AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_V8_3)
>  #define AARCH64_FL_FOR_ARCH8_4   \
> -  (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4)
> +  (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_DOTPROD)
>  
>  /* Macros to test ISA flags.  */
>  
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-compile-2.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-compile-2.c
> new file mode 100644
> index 
> ..7d8d641bcf0f4cc31b9df51b0c0c499bc8dedfce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdot-compile-2.c
> @@ -0,0 +1,73 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-O3 -march=armv8.4-a" } */
> +
> +#include 
> +
> +/* Unsigned Dot Product instructions.  */
> +
> +uint32x2_t ufoo (uint32x2_t r, uint8x8_t x, uint8x8_t y)
> +{
> +  return vdot_u32 (r, x, y);
> +}
> +
> +uint32x4_t ufooq (uint32x4_t r, uint8x16_t x, uint8x16_t y)
> +{
> +  return vdotq_u32 (r, x, y);
> +}
> +
> +uint32x2_t ufoo_lane (uint32x2_t r, uint8x8_t x, uint8x8_t y)
> +{
> +  return vdot_lane_u32 (r, x, y, 0);
> +}
> +
> +uint32x2_t ufoo_laneq (uint32x2_t r, uint8x8_t x, uint8x16_t y)
> +{
> +  return vdot_laneq_u32 (r, x, y, 0);
> +}
> +
> +uint32x4_t ufooq_lane (uint32x4_t r, uint8x16_t x, uint8x8_t y)
> +{
> +  return vdotq_lane_u32 (r, x, y, 0);
> +}
> +
> +uint32x4_t ufooq_laneq (uint32x4_t r, uint8x16_t x, uint8x16_t y)
> +{
> +  return vdotq_laneq_u32 (r, x, y, 0);
> +}
> +
> +/* Signed Dot Product instructions.  */
> +
> +int32x2_t sfoo (int32x2_t r, int8x8_t x, int8x8_t y)
> +{
> +  return vdot_s32 (r, x, y);
> +}
> +
> +int32x4_t sfooq (int32x4_t r, int8x16_t x, int8x16_t y)
> +{
> +  return vdotq_s32 (r, x, y);
> +}
> +
> +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, int8x8_t y)
> +{
> +  return vdot_lane_s32 (r, x, y, 0);
> +}
> +
> +int32x2_t sfoo_laneq (int32x2_t r, int8x8_t x, int8x16_t y)
> +{
> +  return vdot_laneq_s32 (r, x, y, 0);
> +}
> +
> +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, int8x8_t y)
> +{
> +  return vdotq_lane_s32 (r, x, y, 0);
> +}
> +
> +int32x4_t sfooq_laneq (int32x4_t r, int8x16_t x, int8x16_t y)
> +{
> +  return vdotq_laneq_s32 (r, x, y, 0);
> +}
> +
> +/* { dg-final { scan-assembler-times {[us]dot\tv[0-9]+\.2s, v[0-9]+\.8b, 
> v[0-9]+\.8b} 2 } } */
> +/* { dg-final { scan-assembler-times {[us]dot\tv[0-9]+\.2s, v[0-9]+\.8b, 
> v[0-9]+\.4b\[[0-9]+\]}  4 } } */
> +/* { dg-final { scan-assembler-times {[us]dot\tv[0-9]+\.4s, v[0-9]+\.16b, 
> v[0-9]+\.16b}  2 } } */
> +/* { dg-final { scan-assembler-times {[us]dot\tv[0-9]+\.4s, v[0-9]+\.16b, 
> v[0-9]+\.4b\[[0-9]+\]}  4 } } */
> 



Re: [PR/middle-end 81897] make tree-ssa-uninit.c handle longer sequences

2018-01-09 Thread Aldy Hernandez



On 01/07/2018 12:32 AM, Jeff Law wrote:

On 01/06/2018 01:01 AM, Jeff Law wrote:

On 01/05/2018 01:59 PM, Aldy Hernandez wrote:



I went ahead and fixed up the empty block test, and the invalidation
code above.  Bootstrapped and regression tested on x86_64.  Installing
on the trunk.


Thanks.

Aldy


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Richard Earnshaw (lists)
On 09/01/18 18:00, Richard Earnshaw (lists) wrote:
> On 09/01/18 17:57, Bernd Edlinger wrote:
>> On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
>>> On 09/01/18 17:36, Bernd Edlinger wrote:
 Richard Earnshaw wrote:
   > Let me give an example, we use the generic code expansion if we
   > encounter the builtin when generating code for Thumb on pre-ARMv7
   > devices.  We don't have instructions in 'thumb1' to guard against
   > speculation and we really want to warn users that they've done this (it
   > might be a mistake in how they're invoking the compiler).

 Instead of this warning in one of the unsupported cases, could you use
 the DSB SYS + ISB  barrier as the white paper suggests?


 Bernd.
>>>
>>> Thumb1 doesn't have those instructions.
>>> 
>> 
>> Ah, well, and in the case mode == TImode ?
>> 
>> Bernd.
> 
> I don't think we can ever get a TImode integral type on AArch32.
> Perhaps we should just ICE in that case...
> 
> R.

But we probably don't need to.

 int x __attribute__((mode(TI))) = 0;

$ arm-none-elf-gcc timode.c

/tmp/ti.c:1:1: error: unable to emulate ‘TI’
 int x __attribute__((mode(TI))) = 0;
 ^~~


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-09 Thread Bernd Edlinger
On 01/09/18 19:08, Richard Earnshaw (lists) wrote:
> 
> But we probably don't need to.
> 
>   int x __attribute__((mode(TI))) = 0;
> 
> $ arm-none-elf-gcc timode.c
> 
> /tmp/ti.c:1:1: error: unable to emulate ‘TI’
>   int x __attribute__((mode(TI))) = 0;
>   ^~~
> 

Yes, good, then you should probably do this:

+  if (TARGET_THUMB1)
+    return default_inhibit_load_speculation (mode, result, mem, lower_bound,
+                                             upper_bound, fail_result,
+                                             cmpptr);


Bernd.


[PATCH, rs6000] Fix PR83399, ICE During LRA with 2-op rtl pattern for lvx instruction

2018-01-09 Thread Peter Bergner
PR83399 shows a problem where we emit an altivec load using a builtin
that forces us to use a specific altivec load pattern.  The generated
rtl pattern has a use of sfp (the frame pointer), and during LRA we eliminate
its use to the sp (lra-eliminations.c:process_insn_for_elimination).
During this process, we re-recog the insn and end up matching a different
vsx pattern, because it exists earlier in the machine description file.
That vsx pattern uses a "Z" constraint for its address operand, which will
not accept the "special" altivec address we have, but the memory_operand
predicate the pattern uses allows it.  Recog'ing to a different pattern
than we want causes us to ICE later on.
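
For reference, the kind of source that triggers this is along the lines of
the following (an illustration only, not the actual new testcase), compiled
with -maltivec -mvsx:

#include <altivec.h>

/* vec_ld maps to the altivec lvx builtin, whose rtl pattern uses the
   "special" altivec address form with the low 4 bits masked off.  */
vector int
foo (int off, int *p)
{
  return vec_ld (off, p);
}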

The solution here is to tighten the predicate used for the address in the
vsx pattern to use the indexed_or_indirect_operand instead, which will
reject the altivec address our rtl pattern has.

Once this is fixed, we end up hitting another issue in print_operand when
outputting altivec addresses when using -mvsx.  This was fixed by allowing
both ALTIVEC and VSX vector MEMs.

This passed bootstrap and regtesting on powerpc64le-linux with no regressions.
Ok for trunk?

Is this ok for the open release branches too, once testing has completed there?

Peter


gcc/
PR target/83399
* config/rs6000/rs6000.c (print_operand): Use
VECTOR_MEM_ALTIVEC_OR_VSX_P.
* config/rs6000/vsx.md (*vsx_le_perm_load_): Use
indexed_or_indirect_operand predicate.
(*vsx_le_perm_load_): Likewise.
(*vsx_le_perm_load_v8hi): Likewise.
(*vsx_le_perm_load_v8hi): Likewise.
(*vsx_le_perm_load_v16qi): Likewise.
(*vsx_le_perm_store_): Likewise in pattern and splitters.
(*vsx_le_perm_store_): Likewise.
(*vsx_le_perm_store_v8hi): Likewise.
(*vsx_le_perm_store_v16qi): Likewise.

gcc/testsuite/
PR target/83399
* gcc.target/powerpc/pr83399.c: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 256351)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -21671,7 +21671,7 @@ print_operand (FILE *file, rtx x, int co
 
tmp = XEXP (x, 0);
 
-   if (VECTOR_MEM_ALTIVEC_P (GET_MODE (x))
+   if (VECTOR_MEM_ALTIVEC_OR_VSX_P (GET_MODE (x))
&& GET_CODE (tmp) == AND
&& GET_CODE (XEXP (tmp, 1)) == CONST_INT
&& INTVAL (XEXP (tmp, 1)) == -16)
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 256351)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -430,7 +430,7 @@ (define_c_enum "unspec"
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=")
-(match_operand:VSX_D 1 "memory_operand" "Z"))]
+(match_operand:VSX_D 1 "indexed_or_indirect_operand" "Z"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
@@ -453,7 +453,7 @@ (define_insn_and_split "*vsx_le_perm_loa
 
 (define_insn_and_split "*vsx_le_perm_load_"
   [(set (match_operand:VSX_W 0 "vsx_register_operand" "=")
-(match_operand:VSX_W 1 "memory_operand" "Z"))]
+(match_operand:VSX_W 1 "indexed_or_indirect_operand" "Z"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
@@ -478,7 +478,7 @@ (define_insn_and_split "*vsx_le_perm_loa
 
 (define_insn_and_split "*vsx_le_perm_load_v8hi"
   [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
-(match_operand:V8HI 1 "memory_operand" "Z"))]
+(match_operand:V8HI 1 "indexed_or_indirect_operand" "Z"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
@@ -507,7 +507,7 @@ (define_insn_and_split "*vsx_le_perm_loa
 
 (define_insn_and_split "*vsx_le_perm_load_v16qi"
   [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
-(match_operand:V16QI 1 "memory_operand" "Z"))]
+(match_operand:V16QI 1 "indexed_or_indirect_operand" "Z"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
@@ -543,7 +543,7 @@ (define_insn_and_split "*vsx_le_perm_loa
(set_attr "length" "8")])
 
 (define_insn "*vsx_le_perm_store_"
-  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
+  [(set (match_operand:VSX_D 0 "indexed_or_indirect_operand" "=Z")
 (match_operand:VSX_D 1 "vsx_register_operand" "+"))]
   "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR"
   "#"
@@ -551,7 +551,7 @@ (define_insn "*vsx_le_perm_store_"
(set_attr "length" "12")])
 
 (define_split
-  [(set (match_operand:VSX_D 0 "memory_operand" "")
+  [(set (match_operand:VSX_D 0 "indexed_or_indirect_operand" "")
 (match_operand:VSX_D 1 "vsx

RE: [PATCH 1/5][AArch64] Crypto command line split

2018-01-09 Thread Michael Collison
I used a generic statement that applied to all five patches. The patch was 
bootstrapped and the test suite executed together with the other patches. 
As you correctly point out, there are no new instructions, but backward 
compatibility was tested, as existing patterns had their pattern condition 
changed from TARGET_CRYPTO to TARGET_AES.

-Original Message-
From: James Greenhalgh [mailto:james.greenha...@arm.com] 
Sent: Tuesday, January 9, 2018 10:44 AM
To: Michael Collison 
Cc: GCC Patches ; nd 
Subject: Re: [PATCH 1/5][AArch64] Crypto command line split

On Wed, Jan 03, 2018 at 05:21:27PM +, Michael Collison wrote:
> Hi all,
> 
> This patch adds two new command line options for the legacy 
> cryptographic extensions AES (+aes) and SHA-1/SHA-2 (+sha2). Backward 
> compatibility is retained by modifying the +crypto feature modifier to enable 
> +aes and +sha2.
> 
> Bootstrapped on aarch64-none-elf. Tested with new binutils and 
> verified all instructions assembly correctly.

I'm a bit confused by this testing statement. aarch64-none-elf is not a 
bootstrap target. Was this bootstrapped or only tested with a cross-compiler 
(or both)? Was the testsuite also run? I don't see any new instructions in this 
patch that would need a new binutils.

The patch is OK, but please clarify how it has been tested.

Thanks,
James



RE: [PATCH 5/5][AArch64] fp16fml support

2018-01-09 Thread Michael Collison
Patch updated per Richard's comments. Ok for trunk?

-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@linaro.org] 
Sent: Thursday, January 4, 2018 8:02 AM
To: Michael Collison 
Cc: GCC Patches ; nd 
Subject: Re: [PATCH 5/5][AArch64] fp16fml support

Hi Michael,

Not a review of the full patch, just a comment about the patterns:

Michael Collison  writes:
> +(define_expand "aarch64_fmll_lane_lowv2sf"
> +  [(set (match_operand:V2SF 0 "register_operand" "")
> + (unspec:V2SF [(match_operand:V2SF 1 "register_operand" "")
> +(match_operand:V4HF 2 "register_operand" "")
> +(match_operand:V4HF 3 "register_operand" "")
> +(match_operand:SI 4 "aarch64_imm2" "")]
> +  VFMLA16_LOW))]
> +  "TARGET_F16FML"
> +{
> +rtx p1 = aarch64_simd_vect_par_cnst_half (V4HFmode,
> +   GET_MODE_NUNITS (V4HFmode),
> +   false);
> +rtx lane = GEN_INT (ENDIAN_LANE_N (GET_MODE_NUNITS (SImode), INTVAL 
> (operands[4])));

Please use the newly-introduced aarch64_endian_lane_rtx for this.

GET_MODE_NUNITS (SImode) doesn't seem right though, since that's always 1.
Should it be using V4HFmode instead?

Same for the other patterns.

Thanks,
Richard


fp16fml_up_v2.patch
Description: fp16fml_up_v2.patch


Re: [PATCH][RFC] Radically simplify emission of balanced tree for switch statements.

2018-01-09 Thread Jeff Law
On 01/09/2018 07:43 AM, Martin Liška wrote:
> On 09/20/2017 05:00 PM, Jeff Law wrote:
>> On 09/20/2017 01:24 AM, Martin Liška wrote:
>>
>>>
>>> Hello.
>>>
>>> Thank you Jeff for very verbose explanation what's happening. I'm planning 
>>> to do
>>> follow-up of this patch that will include clustering for bit-tests and jump 
>>> tables.
>>> Maybe that will make aforementioned issues even more difficult, but we'll 
>>> see.
>> FWIW, the DOM changes to simplify the conditionals seem to help both
>> cases, trigger reasonably consistently in a bootstrap and for some
>> subset of the triggers actually result in transformations that allow
>> other passes to do a better job in the common (-O2) case.  So my
>> inclination is to polish them a bit further get them on the trunk.
>>
>> My recommendation is to ignore the two regressions for now and focus on
>> the cleanups you're trying to do.
>>
>> jeff
>>
> 
> Hello.
> 
> Some time ago I decided that I'd submit the switch clustering patch in the
> next stage1. However, this patch can be applied as-is in this stage3. Would
> that be possible, or is it too late?
I'll let Richi make the call here.  FWIW, the DOM changes to avoid the
two missed-optimization regressions you ran into are on the trunk, so
that's no longer a blocking issue.

jeff


[PATCH v4 of 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2018-01-09 Thread David Malcolm
On Tue, 2018-01-09 at 15:39 +0100, Jakub Jelinek wrote:
> On Tue, Jan 09, 2018 at 09:36:58AM -0500, Jason Merrill wrote:
> > On 01/09/2018 06:53 AM, David Malcolm wrote:
> > > +case NON_LVALUE_EXPR:
> > > +case VIEW_CONVERT_EXPR:
> > > + {
> > > +   /* Handle location wrappers by substituting the
> > > wrapped node
> > > +  first,*then*  reusing the resulting type.  Doing
> > > the type
> > > +  first ensures that we handle template parameters
> > > and
> > > +  parameter pack expansions.  */
> > > +   gcc_assert (location_wrapper_p (t));
> > > +   tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args,
> > > complain, in_decl);
> > > +   return build1 (code, TREE_TYPE (op0), op0);
> > > + }
> > 
> > Doesn't this lose the location information?
> 
> And the public_flag...
> 
>   Jakub

Ooops, yes.  Thanks.  I'm not convinced we always retain location
information in the tsubst_* calls: although we override input_location
within them, I see lots of pre-existing "build1" calls (as opposed to
"build1_loc"), which as I understand it set any EXPR_LOCATION to be
UNKNOWN_LOCATION.  On the other hand, even if I'm correct, that feels
like a pre-existing issue and orthogonal to this patch kit.

Here's an updated version of the patch which uses maybe_wrap_with_location
in tsubst_copy and tsubst_copy_and_build when copying the wrappers
(thus setting the flag, but hiding it as an implementation detail
within maybe_wrap_with_location).
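
In other words, the wrapper cases in tsubst_copy now take roughly this
shape (a sketch of the assumed form only; the real hunk is in the pt.c
part of the patch):

  case NON_LVALUE_EXPR:
  case VIEW_CONVERT_EXPR:
    {
      /* Substitute the wrapped operand, then re-wrap it so the
         location (and the internal flag) are preserved.  */
      gcc_assert (location_wrapper_p (t));
      tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl);
      return maybe_wrap_with_location (op0, EXPR_LOCATION (t));
    }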

I also updated the assertion as per Jason's other comment
(re NON_LVALUE_EXPR when args is NULL_TREE).

Successfully bootstrapped & regression-tested on x86_64-pc-linux-gnu, as
part of the kit.
Also, manually tested with "make s-selftest-c++" (since we don't run
the selftests for cc1plus by default).

OK for trunk in conjunction with the rest of the kit?

gcc/cp/ChangeLog:
* cp-gimplify.c (cp_fold): Remove the early bailout when
processing_template_decl.
* cp-lang.c (selftest::run_cp_tests): Call
selftest::cp_pt_c_tests.
* cp-tree.h (selftest::cp_pt_c_tests): New decl.
* mangle.c (write_expression): Skip location wrapper nodes.
* parser.c (cp_parser_postfix_expression): Call
maybe_add_location_wrapper on the result for RID_TYPEID. Pass true
for new "wrap_locations_p" param of
cp_parser_parenthesized_expression_list.
(cp_parser_parenthesized_expression_list): Add "wrap_locations_p"
param, defaulting to false.  Convert "expr" to a cp_expr, and call
maybe_add_location_wrapper on it when wrap_locations_p is true.
(cp_parser_unary_expression): Call maybe_add_location_wrapper on
the result for RID_ALIGNOF and RID_SIZEOF.
(cp_parser_builtin_offsetof): Likewise.
* pt.c: Include "selftest.h".
(tsubst_copy): Handle location wrappers.
(tsubst_copy_and_build): Likewise.
(build_non_dependent_expr): Likewise.
(selftest::test_build_non_dependent_expr): New function.
(selftest::cp_pt_c_tests): New function.
---
 gcc/cp/cp-gimplify.c |  5 ++--
 gcc/cp/cp-lang.c |  1 +
 gcc/cp/cp-tree.h | 10 +++
 gcc/cp/mangle.c  |  1 +
 gcc/cp/parser.c  | 25 
 gcc/cp/pt.c  | 83 +++-
 6 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index eda493a..e97247c 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2064,7 +2064,7 @@ clear_fold_cache (void)
 
 /*  This function tries to fold an expression X.
 To avoid combinatorial explosion, folding results are kept in fold_cache.
-If we are processing a template or X is invalid, we don't fold at all.
+If X is invalid, we don't fold at all.
 For performance reasons we don't cache expressions representing a
 declaration or constant.
 Function returns X or its folded variant.  */
@@ -2081,8 +2081,7 @@ cp_fold (tree x)
   if (!x || x == error_mark_node)
 return x;
 
-  if (processing_template_decl
-  || (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE (x) == error_mark_node)))
+  if (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE (x) == error_mark_node))
 return x;
 
   /* Don't bother to cache DECLs or constants.  */
diff --git a/gcc/cp/cp-lang.c b/gcc/cp/cp-lang.c
index 9992bc2..1ce986e 100644
--- a/gcc/cp/cp-lang.c
+++ b/gcc/cp/cp-lang.c
@@ -247,6 +247,7 @@ run_cp_tests (void)
   c_family_tests ();
 
   /* Additional C++-specific tests.  */
+  cp_pt_c_tests ();
 }
 
 } // namespace selftest
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index e0ce14e..901493f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7464,6 +7464,16 @@ named_decl_hash::equal (const value_type existing, 
compare_type candidate)
   return candidate == name;
 }
 
+#if CHECKING_P
+namespace selftest {
+  extern void run_cp_tests (void);
+
+  /* Declarations for specific families of tests within cp,
+ by source file, in alphabetica

Use poly_int rtx accessors instead of hwi accessors

2018-01-09 Thread Richard Sandiford
This patch generalises various places that used hwi rtx accessors
so that they can handle poly_ints instead.  Earlier patches did
this while updating interfaces; this patch just mops up some
left-over pieces that weren't necessary to make things compile,
but that still make sense.
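
As an illustration of the kind of change involved (a hedged sketch, not
a hunk from the patch; OFFSET here is assumed to already be a
poly_int64):

  /* Before: only literal CONST_INTs were recognised.  */
  if (CONST_INT_P (x))
    offset += INTVAL (x);

  /* After: anything representable as a poly_int64 is recognised.  */
  poly_int64 c;
  if (poly_int_rtx_p (x, &c))
    offset += c;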

In many cases these changes are by inspection rather than because
something had shown them to be necessary.

Sorry for not posting this earlier.  I kept holding it back in case
more examples showed up.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64el-linux-gnu.
Also tested by comparing the before-and-after assembly output for at
least one target per CPU directory.  OK to install?

Richard


2018-01-09  Richard Sandiford  

gcc/
* poly-int.h (can_div_trunc_p): Add new overload in which all values
are poly_ints.
* alias.c (get_addr): Extend CONST_INT handling to poly_int_rtx_p.
(memrefs_conflict_p): Likewise.
(init_alias_analysis): Likewise.
* cfgexpand.c (expand_debug_expr): Likewise.
* combine.c (combine_simplify_rtx, force_int_to_mode): Likewise.
* cse.c (fold_rtx): Likewise.
* explow.c (adjust_stack, anti_adjust_stack): Likewise.
* expr.c (emit_block_move_hints): Likewise.
(clear_storage_hints, push_block, emit_push_insn): Likewise.
(store_expr_with_bounds, reduce_to_bit_field_precision): Likewise.
(emit_group_load_1): Use rtx_to_poly_int64 for group offsets.
(emit_group_store): Likewise.
(find_args_size_adjust): Use strip_offset.  Use rtx_to_poly_int64
to read the PRE/POST_MODIFY increment.
* calls.c (store_one_arg): Use strip_offset.
* rtlanal.c (rtx_addr_can_trap_p_1): Extend CONST_INT handling to
poly_int_rtx_p.
(set_noop_p): Use rtx_to_poly_int64 for the elements selected
by a VEC_SELECT.
* simplify-rtx.c (avoid_constant_pool_reference): Use strip_offset.
(simplify_binary_operation_1): Extend CONST_INT handling to
poly_int_rtx_p.
(simplify_plus_minus): Likewise.  Remove use of neg_const_int and
instead use HWI_COMPUTABLE_MODE_P and coeffs_in_range_p to test
whether the negation gives a valid poly_int64.
* var-tracking.c (compute_cfa_pointer): Take a poly_int64 rather
than a HOST_WIDE_INT.
(hard_frame_pointer_adjustment): Change from HOST_WIDE_INT to
poly_int64.
(adjust_mems, add_stores): Update accordingly.
(vt_canonicalize_addr): Track polynomial offsets.
(emit_note_insn_var_location): Likewise.
(vt_add_function_parameter): Likewise.
(vt_initialize): Likewise.

Index: gcc/poly-int.h
===
--- gcc/poly-int.h  2018-01-09 18:26:49.634702370 +
+++ gcc/poly-int.h  2018-01-09 18:26:49.870692974 +
@@ -2346,6 +2346,27 @@ can_div_trunc_p (const poly_int_pod
+inline bool
+can_div_trunc_p (const poly_int_pod<N, Ca> &a,
+                const poly_int_pod<N, Cb> &b,
+                poly_int_pod<N, Cq> *quotient)
+{
+  if (b.is_constant ())
+return can_div_trunc_p (a, b.coeffs[0], quotient);
+  if (!can_div_trunc_p (a, b, &quotient->coeffs[0]))
+return false;
+  for (unsigned int i = 1; i < N; ++i)
+quotient->coeffs[i] = 0;
+  return true;
+}
+
 /* Return true if there is some constant Q and polynomial r such that:
 
  (1) a = b * Q + r
Index: gcc/alias.c
===
--- gcc/alias.c 2018-01-09 18:26:49.634702370 +
+++ gcc/alias.c 2018-01-09 18:26:49.865693173 +
@@ -2247,9 +2247,10 @@ get_addr (rtx x)
  rtx op0 = get_addr (XEXP (x, 0));
  if (op0 != XEXP (x, 0))
{
+ poly_int64 c;
  if (GET_CODE (x) == PLUS
- && GET_CODE (XEXP (x, 1)) == CONST_INT)
-   return plus_constant (GET_MODE (x), op0, INTVAL (XEXP (x, 1)));
+ && poly_int_rtx_p (XEXP (x, 1), &c))
+   return plus_constant (GET_MODE (x), op0, c);
  return simplify_gen_binary (GET_CODE (x), GET_MODE (x),
  op0, XEXP (x, 1));
}
@@ -2536,10 +2537,11 @@ memrefs_conflict_p (poly_int64 xsize, rt
return offset_overlap_p (c, xsize, ysize);
 
  /* Can't properly adjust our sizes.  */
- if (!CONST_INT_P (x1)
- || !can_div_trunc_p (xsize, INTVAL (x1), &xsize)
- || !can_div_trunc_p (ysize, INTVAL (x1), &ysize)
- || !can_div_trunc_p (c, INTVAL (x1), &c))
+ poly_int64 c1;
+ if (!poly_int_rtx_p (x1, &c1)
+ || !can_div_trunc_p (xsize, c1, &xsize)
+ || !can_div_trunc_p (ysize, c1, &ysize)
+ || !can_div_trunc_p (c, c1, &c))
return -1;
  return memrefs_conflict_p (xsize, x0, ysize, y0, c);
}
@@ -3375,6 +3377,7 @@ init_alias_analysis (void)
   

Use poly_int tree accessors

2018-01-09 Thread Richard Sandiford
This patch generalises various places that used hwi tree accessors
so that they can handle poly_ints instead.  Earlier patches did
this while updating interfaces; this patch just mops up some
left-over pieces that weren't necessary to make things compile,
but that still make sense.

In many cases these changes are by inspection rather than because
something had shown them to be necessary.

I think the alias.c part is a minor bug fix: previously we used
fits_uhwi_p for a signed HOST_WIDE_INT (which the caller does
treat as signed rather than unsigned).  We also checked whether
each individual offset overflowed but didn't check whether the
sum did.
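
Illustratively, the fix accumulates in a wide type and range-checks the
total (a simplified sketch of the alias.c hunk below, which also folds
in the DECL_FIELD_BIT_OFFSET term):

  poly_offset_int woffset = wi::to_poly_offset (xoffset) + *offset;
  if (!woffset.to_shwi (offset))
    {
      *known_p = false;
      return;
    }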

Sorry for not posting this earlier.  I kept holding it back in case
more examples showed up.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the before-and-after assembly output for at
least one target per CPU directory.  OK to install?

Richard


2018-01-09  Richard Sandiford  

gcc/
* alias.c (adjust_offset_for_component_ref): Use poly_int_tree_p
and wi::to_poly_offset.  Add the current offset and then check
whether the sum fits, rather than using an unchecked addition of
a checked term.  Check for a shwi rather than a uhwi.
* expr.c (get_bit_range): Use tree_to_poly_uint64.
(store_constructor): Use poly_int_tree_p.
(expand_expr_real_1): Likewise.
* function.c (assign_temp): Likewise.
* fold-const.c (const_binop): Use poly_int_tree_p and
wi::to_poly_offset.
(fold_indirect_ref_1): Likewise.  Use known_in_range_p to test
for an in-range vector access and multiple_p to attempt an exact
division.
* gimplify.c (gimple_add_tmp_var_fn): Use tree_fits_poly_uint64_p.
(gimple_add_tmp_var): Likewise.
* ipa-icf-gimple.c (func_checker::compare_operand): Use
to_poly_offset for MEM offsets.
* ipa-icf.c (sem_variable::equals): Likewise.
* stor-layout.c (compute_record_mode): Use poly_int_tree_p.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
* tree-predcom.c (aff_combination_dr_offset): Use wi::to_poly_widest
rather than wi::to_widest for DR_INITs.
* tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Use
wi::to_poly_offset for BIT_FIELD_REF offsets.
(vn_reference_maybe_forwprop_address): Use poly_int_tree_p and
wi::to_poly_offset.
* tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
wi::to_poly_offset for DR_INIT.
(vect_analyze_data_ref_accesses): Require both DR_INITs to be
INTEGER_CSTs.
(vect_analyze_group_access_1): Note that here.
* var-tracking.c (emit_note_insn_var_location): Use
tree_to_poly_uint64.

Index: gcc/alias.c
===
--- gcc/alias.c 2018-01-09 18:26:49.865693173 +
+++ gcc/alias.c 2018-01-09 18:37:03.806269394 +
@@ -2685,22 +2685,22 @@ adjust_offset_for_component_ref (tree x,
 {
   tree xoffset = component_ref_field_offset (x);
   tree field = TREE_OPERAND (x, 1);
-  if (TREE_CODE (xoffset) != INTEGER_CST)
+  if (!poly_int_tree_p (xoffset))
{
  *known_p = false;
  return;
}
 
-  offset_int woffset
-   = (wi::to_offset (xoffset)
+  poly_offset_int woffset
+   = (wi::to_poly_offset (xoffset)
   + (wi::to_offset (DECL_FIELD_BIT_OFFSET (field))
- >> LOG2_BITS_PER_UNIT));
-  if (!wi::fits_uhwi_p (woffset))
+ >> LOG2_BITS_PER_UNIT)
+  + *offset);
+  if (!woffset.to_shwi (offset))
{
  *known_p = false;
  return;
}
-  *offset += woffset.to_uhwi ();
 
   x = TREE_OPERAND (x, 0);
 }
Index: gcc/expr.c
===
--- gcc/expr.c  2018-01-09 18:26:49.869693013 +
+++ gcc/expr.c  2018-01-09 18:37:03.807269355 +
@@ -4911,7 +4911,7 @@ get_bit_range (poly_uint64_pod *bitstart
   else
 *bitstart = *bitpos - bitoffset;
 
-  *bitend = *bitstart + tree_to_uhwi (DECL_SIZE (repr)) - 1;
+  *bitend = *bitstart + tree_to_poly_uint64 (DECL_SIZE (repr)) - 1;
 }
 
 /* Returns true if ADDR is an ADDR_EXPR of a DECL that does not reside
@@ -6518,12 +6518,10 @@ store_constructor (tree exp, rtx target,
  continue;
 
mode = TYPE_MODE (elttype);
-   if (mode == BLKmode)
- bitsize = (tree_fits_uhwi_p (TYPE_SIZE (elttype))
-? tree_to_uhwi (TYPE_SIZE (elttype))
-: -1);
-   else
+   if (mode != BLKmode)
  bitsize = GET_MODE_BITSIZE (mode);
+   else if (!poly_int_tree_p (TYPE_SIZE (elttype), &bitsize))
+ bitsize = -1;
 
if (index != NULL_TREE && TREE_CODE (index) == RANGE_EXPR)
  {
@@ -10289,11 +10287,11 @@ expand_ex

Re: Add an "early rematerialisation" pass

2018-01-09 Thread Jeff Law
On 01/09/2018 08:34 AM, Richard Sandiford wrote:
> Possible ping: wasn't sure whether this one needed more work or whether
> it was OK to go in.  I've attached the patch with the improved comment
> above early_remat::emit_copy_before.
> 
You answered all the questions I had.  I should have explicitly ack'd at
that point.  Sorry.

OK for the trunk,
Jeff



Re: Smart pointer pretty printers

2018-01-09 Thread Jonathan Wakely

On 09/01/18 15:02 +, Jonathan Wakely wrote:

On 09/01/18 14:59 +, Jonathan Wakely wrote:

On 04/01/18 11:22 +0100, Juraj Oršulić wrote:

Hi Jonathan (and the libstdc++ list). Can we revive this? I sent the
patches for improving the smart pointer pretty printers in March. They
haven't been reviewed.


Thanks for the reminder. I'm testing the attached patch, which has
been rebased on trunk, but I'm getting test failures for the
shared_ptr printers.

I can't see any difference between the expected output and what GDB
prints, so I think it's some DejaGnu or Tcl/Tk oddity. The actual
printer seems to work OK.


No, the problem is that I'm just misreading the results.

I'll finish testing and commit this soon.


I changed the code slightly, so it still works for GDB versions older
than 7.5, which don't have the fix to allow children() to return a
list. There's no easy way to tell if the GDB interpreting the code has
the fix or not.

Thanks very much for the patches, and sorry for the delay reviewing
them.

Tested x86_64-linux, committed to trunk.


commit 4c65ef4dae877443c5372ec504e3881a3922dc18
Author: Jonathan Wakely 
Date:   Tue Jan 9 15:36:13 2018 +

PR libstdc++/59253 Improve pretty printers for smart pointers

PR libstdc++/59253 (partial)
* python/libstdcxx/v6/printers.py (SmartPtrIterator): Common iterator
type for pointer stored by shared_ptr, weak_ptr and unique_ptr.
(SharedPointerPrinter, UniquePointerPrinter): Treat stored values as
children.
* testsuite/libstdc++-prettyprinters/cxx11.cc: Update expected output
of unique_ptr printer.
* testsuite/libstdc++-prettyprinters/shared_ptr.cc: Update expected
output of shared_ptr printer.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 6da8508d944..e9f7359d63f 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -114,12 +114,31 @@ def strip_versioned_namespace(typename):
 return typename.replace(_versioned_namespace, '')
 return typename
 
+class SmartPtrIterator(Iterator):
+"An iterator for smart pointer types with a single 'child' value"
+
+def __init__(self, val):
+self.val = val
+
+def __iter__(self):
+return self
+
+def __next__(self):
+if self.val is None:
+raise StopIteration
+self.val, val = None, self.val
+return ('get()', val)
+
 class SharedPointerPrinter:
 "Print a shared_ptr or weak_ptr"
 
 def __init__ (self, typename, val):
 self.typename = strip_versioned_namespace(typename)
 self.val = val
+self.pointer = val['_M_ptr']
+
+def children (self):
+return SmartPtrIterator(self.pointer)
 
 def to_string (self):
 state = 'empty'
@@ -128,27 +147,29 @@ class SharedPointerPrinter:
 usecount = refcounts['_M_use_count']
 weakcount = refcounts['_M_weak_count']
 if usecount == 0:
-state = 'expired, weak %d' % weakcount
+state = 'expired, weak count %d' % weakcount
 else:
-state = 'count %d, weak %d' % (usecount, weakcount - 1)
-return '%s (%s) %s' % (self.typename, state, self.val['_M_ptr'])
+state = 'use count %d, weak count %d' % (usecount, weakcount - 1)
+return '%s<%s> (%s)' % (self.typename, str(self.pointer.type.target().strip_typedefs()), state)
 
 class UniquePointerPrinter:
 "Print a unique_ptr"
 
 def __init__ (self, typename, val):
 self.val = val
+impl_type = val.type.fields()[0].type.tag
+if is_specialization_of(impl_type, '__uniq_ptr_impl'): # New implementation
+self.pointer = val['_M_t']['_M_t']['_M_head_impl']
+elif is_specialization_of(impl_type, 'tuple'):
+self.pointer = val['_M_t']['_M_head_impl']
+else:
+raise ValueError("Unsupported implementation for unique_ptr: %s" % impl_type)
+
+def children (self):
+return SmartPtrIterator(self.pointer)
 
 def to_string (self):
-impl_type = self.val.type.fields()[0].type.tag
-if is_specialization_of(impl_type, '__uniq_ptr_impl'): # New implementation
-v = self.val['_M_t']['_M_t']['_M_head_impl']
-elif is_specialization_of(impl_type, 'tuple'):
-v = self.val['_M_t']['_M_head_impl']
-else:
-raise ValueError("Unsupported implementation for unique_ptr: %s" % self.val.type.fields()[0].type.tag)
-return 'std::unique_ptr<%s> containing %s' % (str(v.type.target()),
-  str(v))
+return ('std::unique_ptr<%s>' % (str(self.pointer.type.target())))
 
 def get_value_from_aligned_membuf(buf, valtype):
 """Returns the value held in a __gnu_cxx::__aligned_membuf."""
diff --g

Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-09 Thread Jeff Law
On 01/07/2018 05:29 PM, H.J. Lu wrote:
> On Sun, Jan 7, 2018 at 3:36 PM, Jeff Law  wrote:
>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>>> This set of patches for GCC 8 mitigates variant #2 of the speculative 
>>> execution
>>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre. 
>>>  They
>>> convert indirect branches to call and return thunks to avoid speculative 
>>> execution
>>> via indirect call and jmp.
>>>
>>>
>>> H.J. Lu (5):
>>>   x86: Add -mindirect-branch=
>>>   x86: Add -mindirect-branch-loop=
>>>   x86: Add -mfunction-return=
>>>   x86: Add -mindirect-branch-register
>>>   x86: Add 'V' register operand modifier
>>>
>>>  gcc/config/i386/constraints.md |  12 +-
>>>  gcc/config/i386/i386-opts.h|  14 +
>>>  gcc/config/i386/i386-protos.h  |   2 +
>>>  gcc/config/i386/i386.c | 655 
>>> -
>>>  gcc/config/i386/i386.h |  10 +
>>>  gcc/config/i386/i386.md|  51 +-
>>>  gcc/config/i386/i386.opt   |  45 ++
>>>  gcc/config/i386/predicates.md  |  21 +-
>>>  gcc/doc/extend.texi|  22 +
>>>  gcc/doc/invoke.texi|  37 +-
>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>> specific.
>>
>> ISTM we want a target independent mechanism (ie, new standard patterns,
>> options, etc) then an x86/x86_64 implementation using that target
>> independent framework (ie, the actual implementation of those new
>> standard patterns).
>>
> 
> My patch set is implemented with some constraints:
> 
> 1. They need to be backportable to GCC 7/6/5/4.x.
> 2. They should work with all compiler optimizations.
>> 3. They need to generate code sequences that are x86 specific, which can't be
>> changed in any shape or form.  And the generated code is quite the opposite
>> of what a good optimizing compiler should generate.
> 
> Given that these conditions, I kept existing indirect call, jump and
> return patterns.
> I generated different code sequences for these patterns during the final pass
> when generating assembly codes.
> 
> I guess that I could add a late target independent RTL pass to convert
> indirect call, jump and return patterns to something else.  But I am not sure
> if that is what you are looking for.
I don't see how those constraints are incompatible with doing most of
this work at a higher level in the compiler.  You just surround the
resulting RTL bits with appropriate barriers to prevent them from
getting mucked up by the optimizers.  Actually I think you're going to
need one barrier in the middle of the sequence to keep bbro at bay
within the sequence as a whole.

It's really just a couple of new primitives to emit a jump as a call and
one to slam in a new return address.  Given those I think you can do the
entire implementation as RTL at expansion time and you've got a damn
good shot at protecting most architectures from these kinds of attacks.
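
To make that concrete, the expansion-time shape could be roughly the
following (a sketch with an invented pattern name; nothing below is
from the patch):

  /* Bracket the protected sequence with scheduling barriers so later
     passes cannot reorder or duplicate it.  gen_blockage () already
     exists on many targets; gen_speculation_safe_call () is invented
     here purely for illustration.  */
  emit_insn (gen_blockage ());
  emit_call_insn (gen_speculation_safe_call (addr));
  emit_insn (gen_blockage ());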

jeff


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-09 Thread Jeff Law
On 01/08/2018 03:01 AM, Martin Liška wrote:
> On 01/08/2018 01:29 AM, H.J. Lu wrote:
>> 1. They need to be backportable to GCC 7/6/5/4.x.
> 
> I must admit this is a very important constraint. To be honest, we're planning
> to backport the patchset to GCC 4.3.
Red Hat would likely be backporting a ways as well.  But I don't think
doing this at expand rather than in the targets would affect
backportability all that much, nor do I think that backporting to
10-year-old compilers (as SuSE and Red Hat do) should drive the design
decisions :-)

jeff


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-09 Thread Jeff Law
On 01/08/2018 03:10 AM, Martin Liška wrote:
> On 01/07/2018 11:59 PM, H.J. Lu wrote:
>> +static void
>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
>> +{
>> +  char name[32];
>> +  tree decl;
>> +
>> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
>> +  indirect_thunk_name (name, regno, need_bnd_p);
>> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>> + get_identifier (name),
>> + build_function_type_list (void_type_node, NULL_TREE));
>> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
>> +   NULL_TREE, void_type_node);
>> +  TREE_PUBLIC (decl) = 1;
>> +  TREE_STATIC (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 1;
>> +
>> +#if TARGET_MACHO
>> +  if (TARGET_MACHO)
>> +{
>> +  switch_to_section (darwin_sections[picbase_thunk_section]);
>> +  fputs ("\t.weak_definition\t", asm_out_file);
>> +  assemble_name (asm_out_file, name);
>> +  fputs ("\n\t.private_extern\t", asm_out_file);
>> +  assemble_name (asm_out_file, name);
>> +  putc ('\n', asm_out_file);
>> +  ASM_OUTPUT_LABEL (asm_out_file, name);
>> +  DECL_WEAK (decl) = 1;
>> +}
>> +  else
>> +#endif
>> +if (USE_HIDDEN_LINKONCE)
>> +  {
>> +cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME 
>> (decl));
>> +
>> +targetm.asm_out.unique_section (decl, 0);
>> +switch_to_section (get_named_section (decl, NULL, 0));
>> +
>> +targetm.asm_out.globalize_label (asm_out_file, name);
>> +fputs ("\t.hidden\t", asm_out_file);
>> +assemble_name (asm_out_file, name);
>> +putc ('\n', asm_out_file);
>> +ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
>> +  }
>> +else
>> +  {
>> +switch_to_section (text_section);
>> +ASM_OUTPUT_LABEL (asm_out_file, name);
>> +  }
>> +
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  current_function_decl = decl;
>> +  allocate_struct_function (decl, false);
>> +  init_function_start (decl);
>> +  /* We're about to hide the function body from callees of final_* by
>> + emitting it directly; tell them we're a thunk, if they care.  */
>> +  cfun->is_thunk = true;
>> +  first_function_block_is_cold = false;
>> +  /* Make sure unwind info is emitted for the thunk if needed.  */
>> +  final_start_function (emit_barrier (), asm_out_file, 1);
>> +
>> +  output_indirect_thunk (need_bnd_p, regno);
>> +
>> +  final_end_function ();
>> +  init_insn_lengths ();
>> +  free_after_compilation (cfun);
>> +  set_cfun (NULL);
>> +  current_function_decl = NULL;
>> +}
>> +
> 
> I'm wondering whether thunk creation can be a good target-independent 
> generalization? I guess
> we can emit the function declaration without direct writes to asm_out_file? 
> And the emission
> of function body can be potentially a target hook?
> 
> What about emitting body of the function with RTL instructions instead of 
> direct assembly write?
> My knowledge of RTL is quite small, but maybe it can bring some 
> generalization and reusability
> for other targets?
That's the key point I'm trying to make.  We should be looking at
generalizing this stuff where it makes sense.
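
One purely hypothetical shape for that generalization (the hook name,
signature and documentation below are invented for illustration and are
not part of the patch): generic code would create the thunk decl,
section and label, and only the body emission would remain
target-specific:

  /* target.def sketch -- hypothetical.  */
  DEFHOOK
  (output_indirect_thunk,
   "Output the body of an indirect-branch thunk that branches to the\
 address in register number @var{regno}, or to the address on the top\
 of the stack if @var{regno} is negative.",
   void, (FILE *file, int regno),
   NULL)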

jeff


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-09 Thread H.J. Lu
On Tue, Jan 9, 2018 at 10:55 AM, Jeff Law  wrote:
> On 01/08/2018 03:10 AM, Martin Liška wrote:
>> On 01/07/2018 11:59 PM, H.J. Lu wrote:
>>> +static void
>>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
>>> +{
>>> +  char name[32];
>>> +  tree decl;
>>> +
>>> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
>>> +  indirect_thunk_name (name, regno, need_bnd_p);
>>> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>>> + get_identifier (name),
>>> + build_function_type_list (void_type_node, NULL_TREE));
>>> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
>>> +   NULL_TREE, void_type_node);
>>> +  TREE_PUBLIC (decl) = 1;
>>> +  TREE_STATIC (decl) = 1;
>>> +  DECL_IGNORED_P (decl) = 1;
>>> +
>>> +#if TARGET_MACHO
>>> +  if (TARGET_MACHO)
>>> +{
>>> +  switch_to_section (darwin_sections[picbase_thunk_section]);
>>> +  fputs ("\t.weak_definition\t", asm_out_file);
>>> +  assemble_name (asm_out_file, name);
>>> +  fputs ("\n\t.private_extern\t", asm_out_file);
>>> +  assemble_name (asm_out_file, name);
>>> +  putc ('\n', asm_out_file);
>>> +  ASM_OUTPUT_LABEL (asm_out_file, name);
>>> +  DECL_WEAK (decl) = 1;
>>> +}
>>> +  else
>>> +#endif
>>> +if (USE_HIDDEN_LINKONCE)
>>> +  {
>>> +cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME 
>>> (decl));
>>> +
>>> +targetm.asm_out.unique_section (decl, 0);
>>> +switch_to_section (get_named_section (decl, NULL, 0));
>>> +
>>> +targetm.asm_out.globalize_label (asm_out_file, name);
>>> +fputs ("\t.hidden\t", asm_out_file);
>>> +assemble_name (asm_out_file, name);
>>> +putc ('\n', asm_out_file);
>>> +ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
>>> +  }
>>> +else
>>> +  {
>>> +switch_to_section (text_section);
>>> +ASM_OUTPUT_LABEL (asm_out_file, name);
>>> +  }
>>> +
>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>> +  current_function_decl = decl;
>>> +  allocate_struct_function (decl, false);
>>> +  init_function_start (decl);
>>> +  /* We're about to hide the function body from callees of final_* by
>>> + emitting it directly; tell them we're a thunk, if they care.  */
>>> +  cfun->is_thunk = true;
>>> +  first_function_block_is_cold = false;
>>> +  /* Make sure unwind info is emitted for the thunk if needed.  */
>>> +  final_start_function (emit_barrier (), asm_out_file, 1);
>>> +
>>> +  output_indirect_thunk (need_bnd_p, regno);
>>> +
>>> +  final_end_function ();
>>> +  init_insn_lengths ();
>>> +  free_after_compilation (cfun);
>>> +  set_cfun (NULL);
>>> +  current_function_decl = NULL;
>>> +}
>>> +
>>
>> I'm wondering whether thunk creation can be a good target-independent 
>> generalization? I guess
>> we can emit the function declaration without direct writes to asm_out_file? 
>> And the emission
>> of function body can be potentially a target hook?
>>
>> What about emitting body of the function with RTL instructions instead of 
>> direct assembly write?
>> My knowledge of RTL is quite small, but maybe it can bring some 
>> generalization and reusability
>> for other targets?
> That's the key point I'm trying to make.  We should be looking at
> generalizing this stuff where it makes sense.
>

Thunks are x86 specific and they are created the same way as 32-bit PIC thunks.
I don't see how a target hook is used.

-- 
H.J.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-09 Thread H.J. Lu
On Tue, Jan 9, 2018 at 10:52 AM, Jeff Law  wrote:
> On 01/07/2018 05:29 PM, H.J. Lu wrote:
>> On Sun, Jan 7, 2018 at 3:36 PM, Jeff Law  wrote:
>>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
 This set of patches for GCC 8 mitigates variant #2 of the speculative 
 execution
 vulnerabilities on x86 processors identified by CVE-2017-5715, aka 
 Spectre.  They
 convert indirect branches to call and return thunks to avoid speculative 
 execution
 via indirect call and jmp.


 H.J. Lu (5):
   x86: Add -mindirect-branch=
   x86: Add -mindirect-branch-loop=
   x86: Add -mfunction-return=
   x86: Add -mindirect-branch-register
   x86: Add 'V' register operand modifier

  gcc/config/i386/constraints.md |  12 +-
  gcc/config/i386/i386-opts.h|  14 +
  gcc/config/i386/i386-protos.h  |   2 +
  gcc/config/i386/i386.c | 655 
 -
  gcc/config/i386/i386.h |  10 +
  gcc/config/i386/i386.md|  51 +-
  gcc/config/i386/i386.opt   |  45 ++
  gcc/config/i386/predicates.md  |  21 +-
  gcc/doc/extend.texi|  22 +
  gcc/doc/invoke.texi|  37 +-
>>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>>> specific.
>>>
>>> ISTM we want a target independent mechanism (ie, new standard patterns,
>>> options, etc) then an x86/x86_64 implementation using that target
>>> independent framework (ie, the actual implementation of those new
>>> standard patterns).
>>>
>>
>> My patch set is implemented with some constraints:
>>
>> 1. They need to be backportable to GCC 7/6/5/4.x.
>> 2. They should work with all compiler optimizations.
>> 3. They need to generate code sequences that are x86 specific, which can't be
>> changed in any shape or form.  And the generated code is quite the opposite
>> of what a good optimizing compiler should generate.
>>
>> Given that these conditions, I kept existing indirect call, jump and
>> return patterns.
>> I generated different code sequences for these patterns during the final pass
>> when generating assembly codes.
>>
>> I guess that I could add a late target independent RTL pass to convert
>> indirect call, jump and return patterns to something else.  But I am not sure
>> if that is what you are looking for.
> I don't see how those constraints are incompatible with doing most of
> this work at a higher level in the compiler.  You just surround the
> resulting RTL bits with appropriate barriers to prevent them from
> getting mucked up by the optimizers.  Actually I think you're going to
> need one barrier in the middle of the sequence to keep bbro at bay
> within the sequence as a whole.
>
> It's really just a couple of new primitives to emit a jump as a call and
> one to slam in a new return address.  Given those I think you can do the
> entire implementation as RTL at expansion time and you've got a damn
> good shot at protecting most architectures from these kinds of attacks.

Doing this at RTL expansion time may not work well with RTL optimization
passes or IRA/LRA.  We may wind up undoing all kinds of optimizations
as well as code sequences.


-- 
H.J.


Re: [PATCH v4 of 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2018-01-09 Thread Jason Merrill

On 01/09/2018 01:35 PM, David Malcolm wrote:

On Tue, 2018-01-09 at 15:39 +0100, Jakub Jelinek wrote:

On Tue, Jan 09, 2018 at 09:36:58AM -0500, Jason Merrill wrote:

On 01/09/2018 06:53 AM, David Malcolm wrote:

+case NON_LVALUE_EXPR:
+case VIEW_CONVERT_EXPR:
+   {
+ /* Handle location wrappers by substituting the
wrapped node
+first,*then*  reusing the resulting type.  Doing
the type
+first ensures that we handle template parameters
and
+parameter pack expansions.  */
+ gcc_assert (location_wrapper_p (t));
+ tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args,
complain, in_decl);
+ return build1 (code, TREE_TYPE (op0), op0);
+   }


Doesn't this lose the location information?


And the public_flag...

Jakub


Ooops, yes.  Thanks.  I'm not convinced we always retain location
information in the tsubst_* calls: although we override input_location
within them, I see lots of pre-existing "build1" calls (as opposed to
"build1_loc"), which as I understand it set any EXPR_LOCATION to be
UNKNOWN_LOCATION.  On the other hand, even if I'm correct, that feels
like a pre-existing issue and orthogonal to this patch kit.

Here's an updated version of the patch which uses maybe_wrap_with_location
in tsubst_copy and tsubst_copy_and_build when copying the wrappers
(thus setting the flag, but hiding it as an implementation detail
within maybe_wrap_with_location).

I also updated the assertion as per Jason's other comment
(re NON_LVALUE_EXPR when args is NULL_TREE).

Successfully bootstrapped & regression-tested on x86_64-pc-linux-gnu, as
part of the kit.
Also, manually tested with "make s-selftest-c++" (since we don't run
the selftests for cc1plus by default).

OK for trunk in conjunction with the rest of the kit?


OK.

Jason


