Re: [PATCH] Ensure that dump calls are guarded with dump_enabled_p

2018-11-12 Thread Richard Biener
On Sat, 10 Nov 2018, David Malcolm wrote:

> On Mon, 2018-10-22 at 16:08 +0200, Richard Biener wrote:
> > On Mon, 22 Oct 2018, David Malcolm wrote:
> > 
> > > On Mon, 2018-10-22 at 15:56 +0200, Richard Biener wrote:
> > > [...snip...]
> > > 
> > > > This is what I finally applied for the original patch after
> > > > fixing
> > > > the above issue.
> > > > 
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > > > 
> > > > Richard.
> > > > 
> > > > 2018-10-22  Richard Biener  
> > > > 
> > > >  * gimple-ssa-evrp-analyze.c
> > > >  (evrp_range_analyzer::record_ranges_from_incoming_edge): Be
> > > >  smarter about what ranges to use.
> > > >  * tree-vrp.c (add_assert_info): Dump here.
> > > >  (register_edge_assert_for_2): Instead of here at multiple but
> > > >  not all places.
> > > 
> > > [...snip...]
> > >   
> > > > Index: gcc/tree-vrp.c
> > > > ===================================================================
> > > > --- gcc/tree-vrp.c  (revision 265381)
> > > > +++ gcc/tree-vrp.c  (working copy)
> > > > @@ -2299,6 +2299,9 @@ add_assert_info (vec &asser
> > > > info.val = val;
> > > > info.expr = expr;
> > > > asserts.safe_push (info);
> > > > +  dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
> > > > +  "Adding assert for %T from %T %s %T\n",
> > > > +  name, expr, op_symbol_code (comp_code), val);
> > > >   }
> > > 
> > > I think this dump_printf call needs to be wrapped in:
> > >if (dump_enabled_p ())
> > > since otherwise it does non-trivial work, which is then discarded
> > > for
> > > the common case where dumping is disabled.
> > > 
> > > Alternatively, should dump_printf and dump_printf_loc have an
> > > early-reject test internally for that?
> > 
> > Oh, I thought it had one - at least the "old" implementation
> > did nothing expensive so if (dump_enabled_p ()) was just used
> > to guard multiple printf stmts, avoiding multiple no-op calls.
> > 
> > Did you check that all existing dump_* calls are wrapped inside
> > a dump_enabled_p () region?  If so I can properly guard the above.
> > Otherwise I think we should restore previous expectation?
> > 
> > Richard.
> 
> Here's a patch to address the above.
> 
> If called when !dump_enabled_p, the dump_* functions effectively do
> nothing, but as of r263178 this doing "nothing" involves non-trivial
> work internally.
> 
> I wasn't sure whether the dump_* functions should assert that
>   dump_enabled_p ()
> is true when they're called, or whether they should bail out immediately
> in that case, so in this patch I implemented both: with assertions
> enabled we get an assertion failure, and with assertions disabled we
> simply bail out when !dump_enabled_p.
> 
> Alternatively, we could remove the assertion, and simply have the
> dump_* functions immediately bail out.
> 
> Richard, do you have a preference?

I like the VERIFY_DUMP_ENABLED_P way.  Given that dump_enabled_p ()
only tests a single global bool, inlining that check makes sense.
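For reference, here is what the two pieces discussed above look like in a
minimal sketch (illustration only, not the committed dumpfile.c code):

  /* Caller side: guard the non-trivial dump work.  */
  if (dump_enabled_p ())
    dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
                 "Adding assert for %T from %T %s %T\n",
                 name, expr, op_symbol_code (comp_code), val);

  /* Callee side (shape assumed; the actual macro in the patch may differ):
     assert in checking builds, otherwise bail out quietly.  */
  #define VERIFY_DUMP_ENABLED_P                   \
    do                                            \
      {                                           \
        gcc_checking_assert (dump_enabled_p ());  \
        if (!dump_enabled_p ())                   \
          return;                                 \
      }                                           \
    while (0)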

> The patch also fixes all of the places I found during testing
> (on x86_64-pc-linux-gnu) that call into dump_* but which
> weren't guarded by
>   if (dump_enabled_p ())
> The patch adds such conditionals.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?

Thus OK for trunk as-is.

Thanks,
Richard.

> gcc/ChangeLog:
>   * dumpfile.c (VERIFY_DUMP_ENABLED_P): New macro.
>   (dump_gimple_stmt): Use it.
>   (dump_gimple_stmt_loc): Likewise.
>   (dump_gimple_expr): Likewise.
>   (dump_gimple_expr_loc): Likewise.
>   (dump_generic_expr): Likewise.
>   (dump_generic_expr_loc): Likewise.
>   (dump_printf): Likewise.
>   (dump_printf_loc): Likewise.
>   (dump_dec): Likewise.
>   (dump_dec): Likewise.
>   (dump_hex): Likewise.
>   (dump_symtab_node): Likewise.
> 
> gcc/ChangeLog:
>   * gimple-loop-interchange.cc (tree_loop_interchange::interchange):
>   Guard dump call with dump_enabled_p.
>   * graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
>   * graphite-optimize-isl.c (optimize_isl): Likewise.
>   * graphite.c (graphite_transform_loops): Likewise.
>   * tree-loop-distribution.c (pass_loop_distribution::execute): Likewise.
>   * tree-parloops.c (parallelize_loops): Likewise.
>   * tree-ssa-loop-niter.c (number_of_iterations_exit): Likewise.
>   * tree-vect-data-refs.c (vect_analyze_group_access_1): Likewise.
>   (vect_prune_runtime_alias_test_list): Likewise.
>   * tree-vect-loop.c (vect_update_vf_for_slp): Likewise.
>   (vect_estimate_min_profitable_iters): Likewise.
>   * tree-vect-slp.c (vect_record_max_nunits): Likewise.
>   (vect_build_slp_tree_2): Likewise.
>   (vect_supported_load_permutation_p): Likewise.
>   (vect_slp_analyze_operations): Likewise.
>   (vect_slp_analyze_bb_1): Likewise.
>   (vect_slp_bb): Likewise.
>   * tree-vect-stmts

Re: [RFC] support --with-multilib-list=@/path/name

2018-11-12 Thread Alexandre Oliva
On Nov  9, 2018, "Richard Earnshaw (lists)"  wrote:

> - I'm not sure if we really want to allow combinations of an arbitrary
> multilib config with the builtin configurations.  Those are tricky
> enough to get right as it is, and requests to support additional libs as
> well as those with changes to these makefile fragments might be an
> issue.  As such, I think I'd want to say that you either use the builtin
> lists *or* you supply your own fragment.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f1363c41f989..20c2765d186f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3991,6 +3991,7 @@ case "${target}" in
 
# Add extra multilibs
if test "x$with_multilib_list" != x; then
+   ml=
arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
if test "x${arm_multilibs}" != xdefault ; then
for arm_multilib in ${arm_multilibs}; do
@@ -4031,6 +4032,9 @@ case "${target}" in
|| test "x$with_mode" != x ; then
echo "Error: You cannot use any of 
--with-arch/cpu/fpu/float/mode with --with-multilib-list=${with_multilib_list}" 
1>&2
exit 1
+   elif test "x$ml" != x ; then
+   echo "Error: You cannot use builtin 
multilib profiles along with custom ones" 1>&2
+   exit 1
fi
# But pass the default value for float-abi
# through to the multilib selector


> - I'd also be concerned about implying that this interface into the
> compiler build system is in any way stable, so I think we'd want to
> document explicitly that makefile fragments supplied this way may have
> to be tailored to a specific release of GCC.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index fd19fc590ec8..925a120ae7f4 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1087,12 +1087,23 @@ the multilib profile for the architecture targetted.  
The special value
 @code{default} is also accepted and is equivalent to omitting the
 option, ie. only the default run-time library will be enabled.
 
-@var{list} may also contain @code{@@/path/name}, to use the multilib
+@var{list} may instead contain @code{@@/path/name}, to use the multilib
 configuration Makefile fragment @file{/path/name}.  Such files enable
 custom, user-chosen multilib lists to be configured.  Whether multiple
 such files can be used together depends on the contents of the supplied
-files.  See @file{gcc/config/arm/t-*profile} for examples of what such
-Makefile fragments ought to look like.
+files.  See @file{gcc/config/arm/t-multilib} and
+@file{gcc/config/arm/t-*profile} for examples of what such Makefile
+fragments might look like for this version of GCC.  The macros expected
+to be defined in these fragments are not stable across GCC releases, so
+make sure they define the @code{MULTILIB}-related macros expected by
+the version of GCC you are building.
+@ifnothtml
+@xref{Target Fragment,, Target Makefile Fragments, gccint, GNU Compiler
+Collection (GCC) Internals}.
+@end ifnothtml
+@ifhtml
+See ``Target Makefile Fragments'' in the internals manual.
+@end ifhtml
 
 The table below gives the combination of ISAs, architectures, FPUs and
 floating-point ABIs for which multilibs are built for each accepted value.


> Given the second point, there's nothing to stop a user copying the
> internal makefile fragments into their own fragment and then adjusting
> it to work with their additional features; so I don't think the first
> restriction is too onerous.

*nod*.  I'm having second thoughts on specifying external files with a
full pathname, though, as it might lead to accidental GPL violations.
Having the fragments in the source tree won't guarantee they are
included in corresponding sources, but at least for some workflows it
won't require significant procedural changes to get them included in
corresponding sources.


for  gcc/ChangeLog

* config.gcc (tmake_file): Add /path/name to tmake_file for
each @/path/name in --with-multilib-list on arm-*-* targets.
* configure.ac: Accept full pathnames in tmake_file.
* configure: Rebuilt.
* doc/install.texi (with-multilib-list): Document it.
---
 gcc/config.gcc   |   17 +
 gcc/configure|9 ++---
 gcc/configure.ac |5 -
 gcc/doc/install.texi |   33 ++---
 4 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7578ff03825e..20c2765d186f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3991,6 +3991,7 @@ case "${target}" in
 
# Add extra multilibs
if test "x$with_multilib_list" != x; then
+

Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-12 Thread Martin Liška
On 11/12/18 3:28 AM, Qing Zhao wrote:
> Hi,
> 
> 
>> On Nov 10, 2018, at 2:51 AM, Martin Liška  wrote:
>>
>> On 11/9/18 6:43 PM, Qing Zhao wrote:
>>> Hi, Martin,
>>>
>>> thanks a lot for the previous two new options for live-patching. 
>>>
>>>
>>> I have two more questions below:
>>
>> Hello.
>>
>>>
>>> 1. do we still need new options to disable the following:
>>>   A. unreachable code/variable removal? 
>>
>> I hope it's guarded with newly added option -fipa-reference-addressable. 
>> Correct me
>> if I'm wrong.
> 
> The followings are some previous discussions on this:
> 
> “

 We perform discovery of functions/variables with no address taken and
 optimizations that are not valid otherwise, such as duplicating them
 or skipping them for alias analysis (no flag to disable)
>>>
>>> Can you be please more verbose here? What optimizations do you mean?
> 
> See ipa_discover_readonly_nonaddressable_vars. If addressable bit is
> cleared we start analyzing uses of the variable via ipa_reference or so.
> If writeonly bit is set, we start removing writes to the variable and if
> readonly bit is set we skip any analysis about whether the variable changed.
> “
> 
> the new -fipa-reference-addressable is to control the above; it does not seem
> to cover the unreachable code/variable removal?
> 
>>
>>>   B. Visibility changes with -flto and/or -fwhole-program?
>>
>> The options are not used in linux kernel, thus I didn't consider.
> 
> Okay, I see.
> 
>>
>>>
>>> 2. for this new patch, could you please explain a little bit more on the 
>>> problem?
>>
>> We want to enable a single option that will disable all possible (and 
>> future) optimizations
>> that influence live patching.
> 
> Okay, I see.
> 
> I am also working on a similar option as yours, but make the -flive-patching 
> as two level control:
> 
> +flive-patching
> +Common RejectNegative Alias(flive-patching=,inline-clone)
> +
> +flive-patching=
> +Common Report Joined RejectNegative Enum(live_patching_level) Var(flag_live_patching) Init(LIVE_NONE)
> +-flive-patching=[inline-only-static|inline-clone]  Control optimizations to provide a safe compilation for live-patching purpose.
> 
> the implementation for -flive-patching=inline-clone (the default) is exactly
> as yours; the new level -flive-patching=inline-only-static only enables
> inlining of static functions for live patching, which is important for
> multi-process live patching to control memory consumption.
> 
> (please see my 2nd version of the -flive-patching proposal).
> 
> I will send out my complete patch in another email.

Hi, sure, works for me. Let's make 2 level option.

Martin

> 
> thanks.
> 
> Qing
> 
> 
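For reference, the two-level control described above maps onto an option enum
roughly like the following.  This is a hypothetical sketch: only LIVE_NONE
appears in the quoted option definition, the other enumerator names are
assumed here for illustration.

  /* Hypothetical sketch of the -flive-patching levels.  */
  enum live_patching_level
  {
    LIVE_NONE,                          /* Live patching disabled.  */
    LIVE_PATCHING_INLINE_ONLY_STATIC,   /* Only inline static functions.  */
    LIVE_PATCHING_INLINE_CLONE          /* Allow inlining and cloning.  */
  };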



Re: Stack alignment on Darwin (PR78444)

2018-11-12 Thread Iain Sandoe
Appending Uros’ comments from a second thread on Stack alignment.

Note that as per the new analysis in pr78444 this can affect other x86-64 
targets too.

> On 15 Aug 2018, at 16:57, H.J. Lu  wrote:
> 
> On Wed, Aug 15, 2018 at 8:41 AM, Iain Sandoe  wrote:
>> Hi HJ,
>> 
>> I am trying to track down a misalignment of the stack on Darwin (pr78444).
>> 
>> In r163971 you applied this:
>> 
>> --- gcc/config/i386/darwin.h(revision 163970)
>> +++ gcc/config/i386/darwin.h(revision 163971)
>> @@ -79,7 +79,9 @@
>>Failure to ensure this will lead to a crash in the system libraries
>>or dynamic loader.  */
>> #undef STACK_BOUNDARY
>> -#define STACK_BOUNDARY 128
>> +#define STACK_BOUNDARY \
>> + ((profile_flag || (TARGET_64BIT && ix86_abi == MS_ABI)) \
>> +  ? 128 : BITS_PER_WORD)
>> 
>> #undef MAIN_STACK_BOUNDARY
>> #define MAIN_STACK_BOUNDARY 128
>> @@ -91,7 +93,7 @@
>>it's below the minimum.  */
>> #undef PREFERRED_STACK_BOUNDARY
>> #define PREFERRED_STACK_BOUNDARY   \
>> -  MAX (STACK_BOUNDARY, ix86_preferred_stack_boundary)
>> +  MAX (128, ix86_preferred_stack_boundary)
>> 
>> ===
>> 
>> I realise it’s a long time ago …
>> .. but, have you any pointer to the reasoning here or what problem was being 
>> solved?
>> (perhaps mail list discussion?)
> 
> Please see PR target/36502, PR target/42313 and PR target/44651.
> 
>> ===
>> 
>> Darwin’s 32b ABI mandates 128b alignment at function calls:
>> 
>> "The function calling conventions used in the IA-32 environment are the same 
>> as those used in the System V IA-32 ABI, with the following exceptions:
>> ■ Different rules for returning structures
>> ■ The stack is 16-byte aligned at the point of function calls
>> “
>> 
>> Darwin’s 64b ABI refers directly to the SysV document, which also mandates 
>> [section 3.2.2] 128b (or 256b when __m256 is passed).
>> 
>> ===
>> 
>> The following patch resolves pr78444 - but it’s not clear if it’s a correct
>> fix - or whether we should be looking for an alternate solution to whatever
>> r163971 was intending to achieve.
>> 
>> thanks,
>> Iain
>> 
>> [PATCH] Fix for PR78444.
>> 
>> maybe.
>> ---
>> gcc/config/i386/i386.c | 9 +
>> 1 file changed, 9 insertions(+)
>> 
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 163682bdff..405bfd082b 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -11530,6 +11530,15 @@ ix86_compute_frame_layout (void)
>>   crtl->preferred_stack_boundary = 128;
>>   crtl->stack_alignment_needed = 128;
>> }
>> +  else if (TARGET_MACHO && crtl->preferred_stack_boundary < 128
>> +  && !crtl->is_leaf)
>> +{
>> +  /* Darwin's ABI specifies 128b alignment for both 32 and
>> +64 bit variants at call sites.  So we apply this if the
>> +current function is not a leaf.  */
>> +  crtl->preferred_stack_boundary = 128;
>> +  crtl->stack_alignment_needed = 128;
>> +}
>> 
> 
> Can you change ix86_update_stack_boundary instead?

Uros writes:

"You can't use crtl->is_leaf in ix86_update_stack_boundary, since that
function gets called from cfgexpand, while crtl->is_leaf is set only
in IRA pass.

I *think* the fix should be along the lines of TARGET_64BIT_MS_ABI
fixup in ix86_compute_frame_layout (BTW: the existing fixup is strange
by itself, since TARGET_64BIT_MS_ABI declares STACK_BOUNDARY to 128,
and I can't see how leaf functions with crtl->preferred_stack_boundary
< 128 survive "gcc_assert (preferred_alignment >= STACK_BOUNDARY /
BITS_PER_UNIT);" a couple of lines below).

So, I think that fixup you proposed in the patch is in the right
direction. What happens if you add TARGET_MACHO to the existing fixup?
“
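For concreteness, folding Darwin into that existing fixup might look roughly
like this (sketch only; the real TARGET_64BIT_MS_ABI guard in
ix86_compute_frame_layout has additional conditions not reproduced here):

  if ((TARGET_64BIT_MS_ABI || TARGET_MACHO)
      && crtl->preferred_stack_boundary < 128
      && !crtl->is_leaf)
    {
      crtl->preferred_stack_boundary = 128;
      crtl->stack_alignment_needed = 128;
    }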
I will test that suggestion and re-post - although, if the problem is not 
specific to Darwin, maybe a more general fix is needed.

Iain



[wwwdocs] Add powerpcspe line to backends.html

2018-11-12 Thread Eric Botcazou
Now that the port has been spun off from the rs6000 port.  Applied.

-- 
Eric Botcazou

Index: htdocs/backends.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/backends.html,v
retrieving revision 1.81
diff -u -r1.81 backends.html
--- htdocs/backends.html	30 Sep 2018 14:38:45 -	1.81
+++ htdocs/backends.html	12 Nov 2018 09:13:55 -
@@ -102,6 +102,7 @@
 nvptx  |   S Q   Cq mg   e
 pa |   ? Q   CBD  qr  b   i  e
 pdp11  |L   ICqr  b  e
+powerpcspe | Q   Cqr pb   ia
 riscv  | Q   Cqr gia
 rl78   |L  F l   gs
 rs6000 | Q   Cqr pb   ia


Re: [PATCH] Remove unreachable nodes before IPA profile pass (PR ipa/87706).

2018-11-12 Thread Martin Liška
On 11/8/18 12:46 PM, Jan Hubicka wrote:
>> On Thu, Nov 8, 2018 at 12:39 PM Martin Liška  wrote:
>>>
>>> On 11/8/18 12:19 PM, Jan Hubicka wrote:
> Hi.
>
> In order to fix the warnings mentioned in the PR, we need
> to run remove_unreachable_nodes after early tree passes. That's
> however possible only within an IPA pass. Thus I'm calling that
> before the profile PASS.
>
> Patch survives regression tests on ppc64le-linux-gnu and majority
> of warnings are gone in profiledbootstrap.
>
> Ready for trunk?

 I think we want to do that even with no -fprofile-generate because the
 unreachable code otherwise goes into all other IPA passes for no good
 reason.  So perhaps adding it as todo after the early optimization
 metapass?
>>>
>>> That fails due to gcc_assert.
>>> So one needs:
>>>
>>> diff --git a/gcc/passes.c b/gcc/passes.c
>>> index d838d909941..be92a2f3be3 100644
>>> --- a/gcc/passes.c
>>> +++ b/gcc/passes.c
>>> @@ -485,7 +485,7 @@ const pass_data pass_data_all_early_optimizations =
>>>0, /* properties_provided */
>>>0, /* properties_destroyed */
>>>0, /* todo_flags_start */
>>> -  0, /* todo_flags_finish */
>>> +  TODO_remove_functions | TODO_rebuild_cgraph_edges, /* todo_flags_finish */
>>>  };
>>>
>>>  class pass_all_early_optimizations : public gimple_opt_pass
>>> @@ -1989,7 +1989,8 @@ execute_todo (unsigned int flags)
>>>   of IPA pass queue.  */
>>>if (flags & TODO_remove_functions)
>>>  {
>>> -  gcc_assert (!cfun);
>>> +  gcc_assert (!cfun
>>> + || strcmp (current_pass->name, "early_optimizations") == 0);
>>>symtab->remove_unreachable_nodes (dump_file);
>>>  }
>>>
>>>
>>> Or do you prefer a new pass_remove_functions pass that will be added
>>> after pass_local_optimization_passes?
>>
>> Can you make it todo_flags_start of pass_ipa_tree_profile instead?
> 
> It fails because all_early_optimizations is now a gimple pass, so it
> should be a TODO after pass_local_optimization_passes?

Unfortunately it does not work. The following file can't be compiled:

./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/torture/inline-2.c -O0
/usr/lib64/gcc/x86_64-suse-linux/8/../../../../x86_64-suse-linux/bin/ld: 
/tmp/ccCVFrPv.o: in function `bar1':
inline-2.c:(.text+0x5): undefined reference to `foo2'
/usr/lib64/gcc/x86_64-suse-linux/8/../../../../x86_64-suse-linux/bin/ld: 
/tmp/ccCVFrPv.o: in function `bar2':
inline-2.c:(.text+0x11): undefined reference to `foo1'
collect2: error: ld returned 1 exit status

So it's some interference with einline. Honza?

Thus I'm suggesting to add a new IPA pass.

That survives regression tests on x86_64-linux-gnu and bootstrap works.
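For reference, the new pass needs essentially no body of its own; registering
it and letting a TODO_remove_functions finish flag do the work is enough,
because execute_todo already handles that flag.  A minimal sketch (the
passes.def spelling and position are assumptions based on the ChangeLog in
the attached patch):

  /* passes.def (position assumed):  */
  NEXT_PASS (pass_remove_functions);

  /* passes.c:execute_todo, which runs after the pass, already does the
     actual work:  */
  if (flags & TODO_remove_functions)
    {
      gcc_assert (!cfun);
      symtab->remove_unreachable_nodes (dump_file);
    }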
Martin

> 
> Honza
>>
>>> Martin
>>>

 Honza
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2018-11-08  Martin Liska  
>
>  * tree-profile.c: Run TODO_remove_functions before "profile"
>  pass in order to remove dead functions that will trigger
>  -Wmissing-profile.
> ---
>  gcc/tree-profile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
>

> diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
> index d8f2a3b1ba4..c14ebc556a6 100644
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -776,7 +776,7 @@ const pass_data pass_data_ipa_tree_profile =
>0, /* properties_required */
>0, /* properties_provided */
>0, /* properties_destroyed */
> -  0, /* todo_flags_start */
> +  TODO_remove_functions, /* todo_flags_start */
>TODO_dump_symtab, /* todo_flags_finish */
>  };
>
>

>>>

From 00fd2e0870860d5e1b4e599e1f88292982a03efb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 7 Nov 2018 13:10:57 +0100
Subject: [PATCH] Remove unreachable nodes before IPA profile pass (PR
 ipa/87706).

gcc/ChangeLog:

2018-11-12  Martin Liska  

	PR ipa/87706
	* cgraphbuild.c (class pass_ipa_remove_functions): New pass.
	(make_pass_remove_functions): Likewise.
	* passes.def: Declare the new pass.
	* tree-pass.h (make_pass_remove_functions): New.
---
 gcc/cgraphbuild.c | 44 
 gcc/passes.def|  1 +
 gcc/tree-pass.h   |  1 +
 3 files changed, 46 insertions(+)

diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c
index c2ad5cf2ef7..f903df38c31 100644
--- a/gcc/cgraphbuild.c
+++ b/gcc/cgraphbuild.c
@@ -547,3 +547,47 @@ make_pass_remove_cgraph_callee_edges (gcc::context *ctxt)
 {
   return new pass_remove_cgraph_callee_edges (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_ipa_remove_functions =
+{
+  IPA_PASS, /* type */
+  "*remove_functions", /* name */
+  OPTGROUP_INLINE, /* optinfo_flags */
+  TV_IPA_FNSUMMARY, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_remove_functions /* todo_flags_finish */
+};
+
+class pas

Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-12 Thread Martin Liška
On 11/10/18 6:03 PM, Jan Hubicka wrote:
>> On 11/9/18 6:43 PM, Qing Zhao wrote:
>>> Hi, Martin,
>>>
>>> thanks a lot for the previous two new options for live-patching. 
>>>
>>>
>>> I have two more questions below:
>>
>> Hello.
>>
>>>
>>> 1. do we still need new options to disable the following:
>>>A. unreachable code/variable removal? 
>>
>> I hope it's guarded with newly added option -fipa-reference-addressable. 
>> Correct me
>> if I'm wrong.
> 
> No, unreachable code removal is still independent (we track all
> references and can remove a variable even with its address taken).
> If you really want to keep all the symbols, you probably can mark
> everything as force_output, the way flag_keep_inline_functions is
> implemented.  I am not sure how practical it would be though.

I see, that's probably not practical to start using a function/variable that
wasn't used before a live patch.
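For illustration, Honza's force_output suggestion would amount to roughly the
following (a sketch only, not a proposed patch):

  /* Pin every function and variable so unreachable-node removal keeps
     them; analogous in spirit to what -fkeep-inline-functions does for
     inlines.  */
  cgraph_node *fnode;
  varpool_node *vnode;
  FOR_EACH_FUNCTION (fnode)
    fnode->force_output = true;
  FOR_EACH_VARIABLE (vnode)
    vnode->force_output = true;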

Martin

> 
> Honza
>>
>>>B. Visibility changes with -flto and/or -fwhole-program?
>>
>> The options are not used in linux kernel, thus I didn't consider.
>>
>>>
>>> 2. for this new patch, could you please explain a little bit more on the 
>>> problem?
>>
>> We want to enable a single option that will disable all possible (and 
>> future) optimizations
>> that influence live patching.
>>
>> Martin
>>
>>>
>>> Thanks.
>>>
>>> Qing
>>>
 On Nov 9, 2018, at 9:33 AM, Martin Liška  wrote:

 Hi.

 After I added 2 new options, I would like to include a new master option.
 It's minimal version which only disables optimizations that we are aware of
 and can potentially cause problems for live-patching.

 Martin
 <0001-Come-up-with-fvectorized-functions.patch>
>>>
>>



Re: [PATCH] Come up with htab_hash_string_vptr and use string-specific if possible.

2018-11-12 Thread Martin Liška
On 11/9/18 2:28 PM, Michael Matz wrote:
> Hi,
> 
> On Thu, 8 Nov 2018, Martin Liška wrote:
> 
>>> That seems better.  But still, why declare this in system.h?  It seems 
>>> hash-table.h seems more appropriate.
>>
>> I need to declare it before I'll poison it. As system.h is included very 
>> early, one can guarantee that there will be no usage before the 
>> poisoning happens.
> 
> Yes but it's also included everywhere, so adding anything to it comes at a 
> cost, and conceptually it simply doesn't belong there.

Agree.

> 
> There's no fundamental reason why we can't poison identifiers in other 
> headers.  Indeed we do in vec.h.  So move the whole thing including 
> poisoning to hash-table.h?

That's not feasible, as files such as gcc/genhooks.c use the function and
we don't want to include hash-table.h in the generator files.
So second candidate can be gcc/hash-traits.h, but it's also not working:
/home/marxin/Programming/gcc/gcc/hash-traits.h:270:17: error: 
‘gt_pointer_operator’ has not been declared
   pch_nx (T &p, gt_pointer_operator op, void *cookie)
 ^~~

so we should eventually come up with "hash.h" and include it in many places,
as there's the following usage in hash-traits.h:

   212  inline hashval_t
   213  string_hash::hash (const char *id)
   214  {
   215return hash_string (id);
   216  }

So the question is whether it's worth doing that?
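For concreteness, the declare-then-poison pattern being discussed looks
roughly like this (purely illustrative; the signature shown for the new
function is an assumption, and where to put the declaration is exactly the
open question):

  /* Declare the replacement before any poisoning takes effect ...  */
  extern hashval_t htab_hash_string_vptr (const void *); /* signature assumed */

  /* ... then forbid further uses of the old identifier.  */
  #pragma GCC poison htab_hash_string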

Martin

> 
> 
> Ciao,
> Michael.
> 



Re: [PATCH] minor FDO profile related fixes

2018-11-12 Thread Martin Liška
On 11/8/18 1:49 AM, Indu Bhagat wrote:
> I have been looking at -fdump-ipa-profile dump with an intention to sanitize
> bits of information so that one may use it to judge the "quality of a profile"
> in FDO.
> 
> The overall question I want to address is - are there ways to know which
> functions were not run in the training run, i.e. have ZERO profile ?
> (This patch corrects some dumped info; in a subsequent patch I would like to 
> add
> some more explicit information/diagnostics.)
> 
> Towards that end, I noticed that there are a couple of misleading bits of
> information (so I think) in the symbol table dump listing all functions in the
> compilation unit :
>    --- "globally 0" appears even when profile data has not been fed by 
> feedback
> profile (not the intent as the documentation of 
> profile_guessed_global0
>  in profile-count.h suggests).
>    --- "unlikely_executed" should appear only when there is profile feedback 
> or
>    a function attribute is specified (as per documentation of
>    node_frequency in coretypes.h). "unlikely_executed" in case of STALE or
>    NO profile is misleading in my opinion.
> 
> Summary of changes :
> 
> 1. This patch makes some adjustments around how x_profile_status of a function
> is set - x_profile_status should be set to PROFILE_READ only when there is a
> profile for a function read from the .gcda file. So, instead of relying on
> profile_info (set whenever the gcda feedback file is present, even if the
> function does not have a profile available in the file), use exec_counts
> (non null when function has a profile (the latter may or may not be zero)). In
> essence, x_profile_status and profile_count::m_quality are set
> consistently with the stated intent (in the code comments).
> 
> 2. A minor change in coverage.c is for more precise location of the message
> 
> Following -fdump-ipa-profile dump excerpts show the effect :
> 
> 
>  -O1, -O2, -O3
> 
> 
> 0. APPLICABLE PROFILE
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
> 
> 1. STALE PROFILE
> (i.e., those cases covered by Wcoverage-mismatch; when control flow changes
>  between profile-generate and profile-use)
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
> 
> 2. NO PROFILE
> (i.e., those cases covered by Wmissing-profile; when function has no profile
>  available in the .gcda file)
> Trunk (missing .gcda file) : Function flags: count:1073741824 (estimated 
> locally) body
> Trunk (missing function) : Function flags: count: 1073741824 (estimated 
> locally, globally 0) body unlikely_executed
> After Patch (missing .gcda file) : Function flags: count:1073741824 
> (estimated locally) body
> After Patch (missing function) : Function flags: count:1073741824 (estimated 
> locally) body
> 
> 3. ZERO PROFILE (functions not run in training run)
> Trunk : Function flags: count: 1073741824 (estimated locally, globally 0) 
> body unlikely_executed
> After Patch (remains the same) : count: 1073741824 (estimated locally, 
> globally 0) body unlikely_executed
> 
> --
> O0
> --
> In O0, flag_guess_branch_prob is not set. This makes the profile_quality set 
> to
> (precise) for most of the above cases.
> 
> 0. APPLICABLE PROFILE
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
> 
> 1. STALE PROFILE
> (i.e., those cases covered by Wcoverage-mismatch; when control flow changes
>  between profile-generate and profile-use)
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
> 
> 2. NO PROFILE
> (i.e., those cases covered by Wmissing-profile; when function has no profile
>  available in the .gcda file)
> Trunk (missing file) : Function flags: body
> Trunk (missing function) : Function flags: count:0 body unlikely_executed
> After Patch (missing file) :  Function flags: body
> *** After Patch (missing function) : Function flags: count:0 (precise) body
> (*** This remains misleading, and I do not have a solution for this; as use 
> of heuristics
>  to guess branch probability is not allowed in O0)
> 
> 3. ZERO PROFILE (functions not run in training run)
> Trunk : Function flags: count:0 body unlikely_executed
> After Patch : Function flags: count:0 (precise) body
> 
> --
> 
> make check-gcc on x86_64 shows no new failures.
> 
> (A related PR was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86957 where we 
> added diagnostics for the NO PROFILE case.)

Hi.

Thanks for the patch. I'm not a maintainer, but the idea of the patch looks 
correct to me.
One question about adding "(precise)", 

Re: cleanups and unification of value_range dumping code

2018-11-12 Thread Aldy Hernandez
I have rebased my value_range dumping patch after your value_range_base 
changes.


I know you are not a fan of the gimple-pretty-print.c chunk, but I still 
think having one set of dumping code is preferable to catering to 
possible GC corruption while debugging.  If you're strongly opposed (as, 
I'm putting my foot down), I can remove it as well as the relevant 
pretty_printer stuff.


The patch looks bigger than it is because I moved all the dump routines 
into one place.


OK?

p.s. After your changes, perhaps get_range_info(const_tree, value_range 
&) should take a value_range_base instead?
gcc/

	* gimple-pretty-print.c (dump_ssaname_info): Use value_range
	dumping infrastructure.
	* ipa-cp.c (ipcp_vr_lattice::print): Call overloaded
	dump_value_range.
	* tree-vrp.c (value_range_base::dump): Rewrite to use
	pretty_printer.  Dump type.  Do not display -INF/+INF if precision
	is 1.
	Move all dumping routines into one spot.
	* tree-vrp.h (value_range_base): Add pretty_printer variant.
	(value_range): Same.
	(dump_value_range_base): Rename to overloaded dump_value_range.

gcc/testsuite/

	* gcc.dg/tree-ssa/pr64130.c: Adjust for new value_range pretty
	printer.
	* gcc.dg/tree-ssa/vrp92.c: Same.

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 276e5798bac..e69683f174e 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -2151,21 +2151,11 @@ dump_ssaname_info (pretty_printer *buffer, tree node, int spc)
   if (!POINTER_TYPE_P (TREE_TYPE (node))
   && SSA_NAME_RANGE_INFO (node))
 {
-  wide_int min, max, nonzero_bits;
-  value_range_kind range_type = get_range_info (node, &min, &max);
+  value_range vr;
 
-  if (range_type == VR_VARYING)
-	pp_printf (buffer, "# RANGE VR_VARYING");
-  else if (range_type == VR_RANGE || range_type == VR_ANTI_RANGE)
-	{
-	  pp_printf (buffer, "# RANGE ");
-	  pp_printf (buffer, "%s[", range_type == VR_RANGE ? "" : "~");
-	  pp_wide_int (buffer, min, TYPE_SIGN (TREE_TYPE (node)));
-	  pp_printf (buffer, ", ");
-	  pp_wide_int (buffer, max, TYPE_SIGN (TREE_TYPE (node)));
-	  pp_printf (buffer, "]");
-	}
-  nonzero_bits = get_nonzero_bits (node);
+  get_range_info (node, vr);
+  vr.dump (buffer);
+  wide_int nonzero_bits = get_nonzero_bits (node);
   if (nonzero_bits != -1)
 	{
 	  pp_string (buffer, " NONZERO ");
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4f147eb37cc..882c8975ff4 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -522,7 +522,7 @@ ipcp_bits_lattice::print (FILE *f)
 void
 ipcp_vr_lattice::print (FILE * f)
 {
-  dump_value_range_base (f, &m_vr);
+  dump_value_range (f, &m_vr);
 }
 
 /* Print all ipcp_lattices of all functions to F.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
index e068765e2fc..28ffbb76da8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
@@ -15,6 +15,6 @@ int funsigned2 (uint32_t a)
   return (-1 * 0x1L) / a == 0;
 }
 
-/* { dg-final { scan-tree-dump ": \\\[2, 8589934591\\\]" "evrp" } } */
-/* { dg-final { scan-tree-dump ": \\\[-8589934591, -2\\\]" "evrp" } } */
+/* { dg-final { scan-tree-dump "int \\\[2, 8589934591\\\]" "evrp" } } */
+/* { dg-final { scan-tree-dump "int \\\[-8589934591, -2\\\]" "evrp" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
index 5a2dbf0108a..66d74e9b5e9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
@@ -18,5 +18,5 @@ int foo (int i, int j)
   return j;
 }
 
-/* { dg-final { scan-tree-dump "res_.: \\\[1, 1\\\]" "vrp1" } } */
+/* { dg-final { scan-tree-dump "res_.: int \\\[1, 1\\\]" "vrp1" } } */
 /* { dg-final { scan-tree-dump-not "Threaded" "vrp1" } } */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 3ef676bb71b..0081821985c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -358,70 +358,130 @@ value_range_base::type () const
   return TREE_TYPE (min ());
 }
 
-/* Dump value range to FILE.  */
+/* Value range dumping functions.  */
 
 void
-value_range_base::dump (FILE *file) const
+value_range_base::dump (pretty_printer *buffer) const
 {
   if (undefined_p ())
-fprintf (file, "UNDEFINED");
+pp_string (buffer, "UNDEFINED");
   else if (m_kind == VR_RANGE || m_kind == VR_ANTI_RANGE)
 {
-  tree type = TREE_TYPE (min ());
+  tree ttype = type ();
+
+  dump_generic_node (buffer, ttype, 0, TDF_SLIM, false);
+  pp_character (buffer, ' ');
 
-  fprintf (file, "%s[", (m_kind == VR_ANTI_RANGE) ? "~" : "");
+  pp_printf (buffer, "%s[", (m_kind == VR_ANTI_RANGE) ? "~" : "");
 
-  if (INTEGRAL_TYPE_P (type)
-	  && !TYPE_UNSIGNED (type)
-	  && vrp_val_is_min (min ()))
-	fprintf (file, "-INF");
+  if (INTEGRAL_TYPE_P (ttype)
+	  && !TYPE_UNSIGNED (ttype)
+	  && vrp_val_is_min (min ())
+	  && TYPE_PRECISION (ttype) != 1)
+	pp_printf (buffer, "-INF");
   else
-	print_generic_expr (file

Re: [PATCH] Add value_range_base (w/o equivs)

2018-11-12 Thread Aldy Hernandez




On 11/11/18 3:53 AM, Richard Biener wrote:

On Fri, 9 Nov 2018, Aldy Hernandez wrote:


On 11/9/18 9:19 AM, Richard Biener wrote:


This adds value_range_base, a base class of class value_range
with all members but the m_equiv one.


First of all, thanks so much for doing this!



I have looked into the sole GC user, IPA propagation, and replaced
the value_range use there with value_range_base since it also
asserted the equiv member is always NULL.

This in turn means I have written down that GC users only can
use value_range_base (and fixed the accessability issue with
adding a bunch of friends).



+
  /* Range of values that can be associated with an SSA_NAME after VRP
-   has executed.  */
-class GTY((for_user)) value_range
+   has executed.  Note you may only use value_range_base with GC memory.
*/
+class GTY((for_user)) value_range_base
+{


GC users cannot use the derived value_range?  Either way could you document
the "why" this is the case above?


I've changed the comment as it was said to be confusing.  The reason is
the marking isn't implemented.


Ah, I see.  In which case, shouldn't you then remove the GTY() markers 
from the derived class?


/* Note value_range cannot currently be used with GC memory, only
   value_range_base is fully set up for this.  */
class GTY((user)) value_range : public value_range_base


[PATCH] Add C++ runtime support for new 128-bit long double format

2018-11-12 Thread Jonathan Wakely

This adds support for the new 128-bit long double format on powerpc64,
see https://fedoraproject.org/wiki/Changes/PPC64LE_Float128_Transition
for more details.

Most of the required changes are to the locale facets that parse and
print long doubles, as used by iostreams for reading/writing numbers.

I followed the same design as is used for the existing
-mlong-double-64 compatibility, i.e. adding extra virtual functions to
the facet classes so that they are capable of handling the old and new
formats, using different virtual functions. For example, std::num_get
only handles 64-bit long double, and has only the virtual functions
described by the standard. For floating point types these are the
following:

iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, float& v ) const;

iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, double& v ) const;

iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, long double& v ) const;


The __gnu_cxx_ldbl128::num_get class has an extra virtual function:

iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, float& v ) const;

iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, double& v ) const;

// Handles 64-bit long double, which has the same representation as
// double:
iter_type __do_get( iter_type in, iter_type end, std::ios_base& str,
   std::ios_base::iostate& err, double& v ) const;

// Handles 128-bit long double:
iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, long double& v ) const;


The __gnu_cxx_ieee128::num_get class added by this patch has another
extra virtual function:

iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, float& v ) const;

iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, double& v ) const;

// Handles 64-bit long double, which has the same representation as
// double:
iter_type __do_get( iter_type in, iter_type end, std::ios_base& str,
   std::ios_base::iostate& err, double& v ) const;

// Handles old IBM 128-bit long double:
iter_type __do_get( iter_type in, iter_type end, std::ios_base& str,
   std::ios_base::iostate& err, __ibm128& v ) const;

// Handles new IEEE 128-bit long double:
iter_type do_get( iter_type in, iter_type end, std::ios_base& str,
 std::ios_base::iostate& err, long double& v ) const;

However, I'm not really sure if we want to do this. I don't understand
the purpose of having a new facet type that has virtual functions for
handling alternative long double formats. The facet can never be used
because std::use_facet will resolve to std::num_get when
-mlong-double-64 is used, and that facet isn't installed in any
locales by default (only __gnu_cxx_ldbl128::num_get is).

Maybe I should not have bothered to copy this technique this for the
new long double formats. I'm not sure.
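As a concrete (simplified) illustration of why the virtual function layout
matters: ordinary iostream extraction reaches whichever do_get override the
installed facet provides, via std::use_facet.  A small sketch, independent of
the patch:

  #include <locale>
  #include <sstream>
  #include <string>

  long double
  read_long_double (const std::string &s)
  {
    std::istringstream in (s);
    long double v = 0.0L;
    // operator>> ends up calling
    //   std::use_facet<std::num_get<char>>(in.getloc ()).get (...)
    // which dispatches to the facet's virtual do_get for long double,
    // i.e. to whichever of the overrides described above is installed
    // in the locale.
    in >> v;
    return v;
  }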

Currently this patch causes a few regressions, due to the facets for
the IBM 128-bit long double format no longer being installed in the
locales. That means that 100% conforming behaviour is only possible
for the new IEEE 128-bit long double. I'm not sure what the migration
plan is for that type, and whether both formats need to work
equivalently. That would require installing both the
__gnu_cxx_ldbl128::num_get and __gnu_cxx_ieee128::num_get facets (and
similarly for num_put, money_get and money_put).

So this patch probably isn't in its final form, but I'm posting it now
to make the stage 1 deadline, and will complete it ASAP.

* config.h.in: Regenerate.
* config/abi/pre/gnu.ver: Make patterns less greedy. Add CXXABI_1.3.12
symbol version.
* config/os/gnu-linux/ldbl-ieee128-extra.ver: New file with patterns
for IEEE128 long double symbols.
* configure: Regenerate.
* configure.ac: Enable alternative 128-bit long double format on
powerpc64*-*-linux*.
* doc/Makefile.in: Regenerate.
* fragment.am: Regenerate.
* include/Makefile.am: Set _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT.
* include/Makefile.in: Regenerate.
* include/bits/c++config: Define inline namespace for new long double
symbols. Don't define _GLIBCXX_USE_FLOAT128 when it's the same type
as long double.
* include/bits/locale_classes.h [_GLIBCXX_LONG_DOUBLE_ALT128_COMPAT]
(locale::_Impl::_M_init_extra_ldbl128): Declare new member function.
* include/bits/locale_facets.h (_GLIBCXX_NUM_FACETS): Simplify by
only counting narrow character facets.
(_GLIBCXX_NUM_CXX11_FACETS): Likewise.
(_

Re: [PATCH] Add value_range_base (w/o equivs)

2018-11-12 Thread Richard Biener
On Mon, 12 Nov 2018, Aldy Hernandez wrote:

> 
> 
> On 11/11/18 3:53 AM, Richard Biener wrote:
> > On Fri, 9 Nov 2018, Aldy Hernandez wrote:
> > 
> > > On 11/9/18 9:19 AM, Richard Biener wrote:
> > > > 
> > > > This adds value_range_base, a base class of class value_range
> > > > with all members but the m_equiv one.
> > > 
> > > First of all, thanks so much for doing this!
> > > 
> > > > 
> > > > I have looked into the sole GC user, IPA propagation, and replaced
> > > > the value_range use there with value_range_base since it also
> > > > asserted the equiv member is always NULL.
> > > > 
> > > > This in turn means I have written down that GC users only can
> > > > use value_range_base (and fixed the accessability issue with
> > > > adding a bunch of friends).
> > > 
> > > > +
> > > >   /* Range of values that can be associated with an SSA_NAME after VRP
> > > > -   has executed.  */
> > > > -class GTY((for_user)) value_range
> > > > +   has executed.  Note you may only use value_range_base with GC
> > > > memory.
> > > > */
> > > > +class GTY((for_user)) value_range_base
> > > > +{
> > > 
> > > GC users cannot use the derived value_range?  Either way could you
> > > document
> > > the "why" this is the case above?
> > 
> > I've changed the comment as it was said to be confusing.  The reason is
> > the marking isn't implemented.
> 
> Ah, I see.  In which case, shouldn't you then remove the GTY() markers from
> the derived class?
> 
> /* Note value_range cannot currently be used with GC memory, only
>value_range_base is fully set up for this.  */
> class GTY((user)) value_range : public value_range_base

It's required to make gengtype happy...

Richard.


[PATCH] Change set_value_range_to_[non]null to not preserve equivs

2018-11-12 Thread Richard Biener


This is a semantic change but AFAICS it shouldn't result in any 
pessimization.  The behavior of the API is non-obvious.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-11-12  Richard Biener  

* tree-vrp.c (set_value_range_to_nonnull): Clear equiv.
(set_value_range_to_null): Likewise.
* vr-values.c (vr_values::extract_range_from_comparison):
Clear equiv for constant singleton ranges.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 266026)
+++ gcc/tree-vrp.c  (working copy)
@@ -767,7 +767,7 @@ void
 set_value_range_to_nonnull (value_range *vr, tree type)
 {
   tree zero = build_int_cst (type, 0);
-  vr->update (VR_ANTI_RANGE, zero, zero);
+  set_value_range (vr, VR_ANTI_RANGE, zero, zero, NULL);
 }
 
 
@@ -776,7 +776,7 @@ set_value_range_to_nonnull (value_range
 void
 set_value_range_to_null (value_range *vr, tree type)
 {
-  set_value_range_to_value (vr, build_int_cst (type, 0), vr->equiv ());
+  set_value_range_to_value (vr, build_int_cst (type, 0), NULL);
 }
 
 /* Return true, if VAL1 and VAL2 are equal values for VRP purposes.  */
Index: gcc/vr-values.c
===
--- gcc/vr-values.c (revision 266026)
+++ gcc/vr-values.c (working copy)
@@ -896,7 +896,7 @@ vr_values::extract_range_from_comparison
 type.  */
   val = fold_convert (type, val);
   if (is_gimple_min_invariant (val))
-   set_value_range_to_value (vr, val, vr->equiv ());
+   set_value_range_to_value (vr, val, NULL);
   else
vr->update (VR_RANGE, val, val);
 }
@@ -1672,7 +1672,7 @@ vr_values::adjust_range_with_scev (value
   /* Like in PR19590, scev can return a constant function.  */
   if (is_gimple_min_invariant (chrec))
 {
-  set_value_range_to_value (vr, chrec, vr->equiv ());
+  set_value_range_to_value (vr, chrec, NULL);
   return;
 }
 


Re: [PATCH] Add value_range_base (w/o equivs)

2018-11-12 Thread Aldy Hernandez
Ug ok.

On Mon, Nov 12, 2018, 12:10 Richard Biener wrote:

> On Mon, 12 Nov 2018, Aldy Hernandez wrote:
>
> >
> >
> > On 11/11/18 3:53 AM, Richard Biener wrote:
> > > On Fri, 9 Nov 2018, Aldy Hernandez wrote:
> > >
> > > > On 11/9/18 9:19 AM, Richard Biener wrote:
> > > > >
> > > > > This adds value_range_base, a base class of class value_range
> > > > > with all members but the m_equiv one.
> > > >
> > > > First of all, thanks so much for doing this!
> > > >
> > > > >
> > > > > I have looked into the sole GC user, IPA propagation, and replaced
> > > > > the value_range use there with value_range_base since it also
> > > > > asserted the equiv member is always NULL.
> > > > >
> > > > > This in turn means I have written down that GC users only can
> > > > > use value_range_base (and fixed the accessability issue with
> > > > > adding a bunch of friends).
> > > >
> > > > > +
> > > > >   /* Range of values that can be associated with an SSA_NAME after
> VRP
> > > > > -   has executed.  */
> > > > > -class GTY((for_user)) value_range
> > > > > +   has executed.  Note you may only use value_range_base with GC
> > > > > memory.
> > > > > */
> > > > > +class GTY((for_user)) value_range_base
> > > > > +{
> > > >
> > > > GC users cannot use the derived value_range?  Either way could you
> > > > document
> > > > the "why" this is the case above?
> > >
> > > I've changed the comment as it was said to be confusing.  The reason is
> > > the marking isn't implemented.
> >
> > Ah, I see.  In which case, shouldn't you then remove the GTY() markers
> from
> > the derived class?
> >
> > /* Note value_range cannot currently be used with GC memory, only
> >value_range_base is fully set up for this.  */
> > class GTY((user)) value_range : public value_range_base
>
> It's required to make gengtype happy...
>
> Richard.
>


[PATCH] Move more stuff to value-range-base

2018-11-12 Thread Richard Biener


The simple stuff.  I have some more stuff queued.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-11-12  Richard Biener  

* tree-vrp.h (value_range_base::symbolic_p,
value_range_base::constant_p, value_range_base::zero_p,
value_range_base::singleton_p): Move from value_range.
(value_range::dump): Add.
* gimple-ssa-evrp-analyze.c
(evrp_range_analyzer::record_ranges_from_phis): Use set_varying.
* ipa-cp.c (ipcp_vr_lattice::print): Use dump_value_range.
* tree-ssa-threadedge.c (record_temporary_equivalences_from_phis):
Use set_varying.
* tree-vrp.c (value_range::symbolic_p): Move to value_range_base.
(value_range::constant_p): Likewise.
(value_range::singleton_p): Likewise.
(value_range_base::dump): Add.
(set_value_range_to_undefined): Remove.
(set_value_range_to_varying): Likewise.
(range_int_cst_p): Take value_range_base argument.
(range_int_cst_singleton_p): Likewise.
(value_range_constant_singleton): Likewise.
(vrp_set_zero_nonzero_bits): Likewise.
(extract_range_from_multiplicative_op): Use set_varying.
(extract_range_from_binary_expr_1): Likewise. Use set_undefined.
(extract_range_from_unary_expr): Likewise.
(dump_value_range_base): Change to overload of dump_value_range.
(vrp_prop::vrp_initialize): Use set_varying and set_undefined.
(vrp_prop::visit_stmt): Likewise.
(value_range::intersect_helper): Likewise.
(value_range::union_helper): Likewise.
(determine_value_range_1): Likewise.

diff --git a/gcc/gimple-ssa-evrp-analyze.c b/gcc/gimple-ssa-evrp-analyze.c
index 3e5287b1b0b..1cd13dda7b6 100644
--- a/gcc/gimple-ssa-evrp-analyze.c
+++ b/gcc/gimple-ssa-evrp-analyze.c
@@ -252,7 +252,7 @@ evrp_range_analyzer::record_ranges_from_phis (basic_block 
bb)
vr_values->extract_range_from_phi_node (phi, &vr_result);
   else
{
- set_value_range_to_varying (&vr_result);
+ vr_result.set_varying ();
  /* When we have an unvisited executable predecessor we can't
 use PHI arg ranges which may be still UNDEFINED but have
 to use VARYING for them.  But we can still resort to
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4f147eb37cc..882c8975ff4 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -522,7 +522,7 @@ ipcp_bits_lattice::print (FILE *f)
 void
 ipcp_vr_lattice::print (FILE * f)
 {
-  dump_value_range_base (f, &m_vr);
+  dump_value_range (f, &m_vr);
 }
 
 /* Print all ipcp_lattices of all functions to F.  */
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 330ba153e37..3494ee90b58 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -183,7 +183,7 @@ record_temporary_equivalences_from_phis (edge e,
  else if (TREE_CODE (src) == INTEGER_CST)
set_value_range_to_value (new_vr, src,  NULL);
  else
-   set_value_range_to_varying (new_vr);
+   new_vr->set_varying ();
 
  /* This is a temporary range for DST, so push it.  */
  evrp_range_analyzer->push_value_range (dst, new_vr);
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 3ef676bb71b..25eea61ca80 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -237,7 +237,7 @@ value_range::operator!= (const value_range &other) const
 /* Return TRUE if this is a symbolic range.  */
 
 bool
-value_range::symbolic_p () const
+value_range_base::symbolic_p () const
 {
   return (!varying_p ()
  && !undefined_p ()
@@ -251,7 +251,7 @@ value_range::symbolic_p () const
constants would be represented as [-MIN, +MAX].  */
 
 bool
-value_range::constant_p () const
+value_range_base::constant_p () const
 {
   return (!varying_p ()
  && !undefined_p ()
@@ -336,7 +336,7 @@ value_range::equiv_add (const_tree var,
So, [&x, &x] counts as a singleton.  */
 
 bool
-value_range::singleton_p (tree *result) const
+value_range_base::singleton_p (tree *result) const
 {
   if (m_kind == VR_RANGE
   && vrp_operand_equal_p (min (), max ())
@@ -418,6 +418,13 @@ value_range::dump (FILE *file) const
 }
 
 void
+value_range_base::dump () const
+{
+  dump_value_range (stderr, this);
+  fprintf (stderr, "\n");
+}
+
+void
 value_range::dump () const
 {
   dump_value_range (stderr, this);
@@ -591,22 +598,6 @@ intersect_range_with_nonzero_bits (enum value_range_kind 
vr_type,
   return vr_type;
 }
 
-/* Set value range VR to VR_UNDEFINED.  */
-
-static inline void
-set_value_range_to_undefined (value_range *vr)
-{
-  vr->set_undefined ();
-}
-
-/* Set value range VR to VR_VARYING.  */
-
-void
-set_value_range_to_varying (value_range *vr)
-{
-  vr->set_varying ();
-}
-
 /* Set value range VR to {T, MIN, MAX, EQUIV}.  */
 
 void
@@ -823,7 +814,7 @@ range_is_nonnull (const value_range *vr)
a singleton.  */
 
 bool
-range_int_cst_p (const value_range *vr)
+range_int_

[PATCH 0/3] [ARC] Glibc required patches

2018-11-12 Thread Claudiu Zissulescu
Hi Andrew,

The attached three patches are required to reduce/enable glibc
builds. Although not all of them are glibc related they are found when
porting this library to ARC.

OK to apply?
Claudiu

Claudiu Zissulescu (3):
  [ARC] Update EH code.
  [ARC] Do not emit ZOL in the presence of text jump tables.
  [ARC] Add support for profiling in glibc.

 gcc/config/arc/arc-protos.h   |  2 +-
 gcc/config/arc/arc.c  | 25 +--
 gcc/config/arc/arc.h  | 14 +++--
 gcc/config/arc/arc.md | 15 ++
 gcc/config/arc/elf.h  |  9 
 gcc/config/arc/linux.h| 10 +
 gcc/testsuite/gcc.target/arc/builtin_eh.c | 22 
 7 files changed, 79 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/builtin_eh.c

-- 
2.19.1



[PATCH 2/3] [ARC] Do not emit ZOL in the presence of text jump tables.

2018-11-12 Thread Claudiu Zissulescu
Avoid emitting the lp instruction when its ZOL body contains jump table data
placed in the text section.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.c (hwloop_optimize): Bailout when detecting a
jump table data in the text section.
---
 gcc/config/arc/arc.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index a92456b457d..9eab4c27284 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7791,7 +7791,17 @@ hwloop_optimize (hwloop_info loop)
   for (insn = loop->start_label;
insn && insn != loop->loop_end;
insn = NEXT_INSN (insn))
-length += NONDEBUG_INSN_P (insn) ? get_attr_length (insn) : 0;
+{
+  length += NONDEBUG_INSN_P (insn) ? get_attr_length (insn) : 0;
+  if (JUMP_TABLES_IN_TEXT_SECTION
+ && JUMP_TABLE_DATA_P (insn))
+   {
+ if (dump_file)
+   fprintf (dump_file, ";; loop %d has a jump table\n",
+loop->loop_no);
+ return false;
+   }
+}
 
   if (!insn)
 {
-- 
2.19.1



[PATCH 1/3] [ARC] Update EH code.

2018-11-12 Thread Claudiu Zissulescu
Our ABI says the blink is pushed first on stack followed by an unknown
number of register saves, and finally by fp.  Hence we cannot use the
EH_RETURN_ADDRESS macro as the stack is not finalized at that moment.
The alternative is to use the eh_return pattern and to initialize all
the bits after register allocation when the stack layout is finalized.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.c (arc_eh_return_address_location): Repurpose it
to fit the eh_return pattern.
* config/arc/arc.md (eh_return): Define.
(VUNSPEC_ARC_EH_RETURN): Likewise.
* config/arc/arc-protos.h (arc_eh_return_address_location): Match
new implementation.
* config/arc/arc.h (EH_RETURN_HANDLER_RTX): Remove it.

testsuite/
-xx-xx  Claudiu Zissulescu  

* gcc.target/arc/builtin_eh.c: New test.
---
 gcc/config/arc/arc-protos.h   |  2 +-
 gcc/config/arc/arc.c  | 13 -
 gcc/config/arc/arc.h  |  2 --
 gcc/config/arc/arc.md | 15 +++
 gcc/testsuite/gcc.target/arc/builtin_eh.c | 22 ++
 5 files changed, 46 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/builtin_eh.c

diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
index 6450b6a014e..4f72a06e3dc 100644
--- a/gcc/config/arc/arc-protos.h
+++ b/gcc/config/arc/arc-protos.h
@@ -110,7 +110,7 @@ extern bool arc_legitimize_reload_address (rtx *, 
machine_mode, int, int);
 extern void arc_secondary_reload_conv (rtx, rtx, rtx, bool);
 extern void arc_cpu_cpp_builtins (cpp_reader *);
 extern bool arc_store_addr_hazard_p (rtx_insn *, rtx_insn *);
-extern rtx arc_eh_return_address_location (void);
+extern void arc_eh_return_address_location (rtx);
 extern bool arc_is_jli_call_p (rtx);
 extern void arc_file_end (void);
 extern bool arc_is_secure_call_p (rtx);
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 6802ca66554..a92456b457d 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -3858,10 +3858,13 @@ arc_check_multi (rtx op, bool push_p)
 /* Return rtx for the location of the return address on the stack,
suitable for use in __builtin_eh_return.  The new return address
will be written to this location in order to redirect the return to
-   the exception handler.  */
+   the exception handler.  Our ABI says the blink is pushed first on
+   stack followed by an unknown number of register saves, and finally
+   by fp.  Hence we cannot use the EH_RETURN_ADDRESS macro as the
+   stack is not finalized.  */
 
-rtx
-arc_eh_return_address_location (void)
+void
+arc_eh_return_address_location (rtx source)
 {
   rtx mem;
   int offset;
@@ -3889,8 +3892,8 @@ arc_eh_return_address_location (void)
  remove this store seems perfectly sensible.  Marking the memory
  address as volatile obviously has the effect of preventing DSE
  from removing the store.  */
-  MEM_VOLATILE_P (mem) = 1;
-  return mem;
+  MEM_VOLATILE_P (mem) = true;
+  emit_move_insn (mem, source);
 }
 
 /* PIC */
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index afd6d7681cf..a0a84900917 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1355,8 +1355,6 @@ do { \
 
 #define EH_RETURN_STACKADJ_RTX   gen_rtx_REG (Pmode, 2)
 
-#define EH_RETURN_HANDLER_RTXarc_eh_return_address_location ()
-
 /* Turn off splitting of long stabs.  */
 #define DBX_CONTIN_LENGTH 0
 
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index a28c67ac184..a6bac0e8bee 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -163,6 +163,7 @@
   VUNSPEC_ARC_SC
   VUNSPEC_ARC_LL
   VUNSPEC_ARC_BLOCKAGE
+  VUNSPEC_ARC_EH_RETURN
   ])
 
 (define_constants
@@ -6627,6 +6628,20 @@ core_3, archs4x, archs4xd, archs4xd_slow"
   [(set_attr "type" "call_no_delay_slot")
(set_attr "length" "2")])
 
+;; Patterns for exception handling
+(define_insn_and_split "eh_return"
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")]
+   VUNSPEC_ARC_EH_RETURN)]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+  "
+  {
+arc_eh_return_address_location (operands[0]);
+DONE;
+  }"
+)
 ;; include the arc-FPX instructions
 (include "fpx.md")
 
diff --git a/gcc/testsuite/gcc.target/arc/builtin_eh.c 
b/gcc/testsuite/gcc.target/arc/builtin_eh.c
new file mode 100644
index 000..717a54bb084
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/builtin_eh.c
@@ -0,0 +1,22 @@
+/* Check if we have the right offset for @bar function.  */
+/* { dg-options "-O1" } */
+
+void bar (void);
+
+void
+foo (int x)
+{
+  __builtin_unwind_init ();
+  __builtin_eh_return (x, bar);
+}
+
+/* { dg-final { scan-assembler "r24" } } */
+/* { dg-final { scan-assembler "r22" } } */
+/* { dg-final { scan-assembler "r20" } } */
+/* { dg-final { scan-assembler "r18" } } */
+/* { dg-final { scan-assembler "r16" } } */
+/* { dg-final { scan-assembler "r14" } } */
+

[PATCH 3/3] [ARC] Add support for profiling in glibc.

2018-11-12 Thread Claudiu Zissulescu
Use PROFILE_HOOK to add mcount library calls in each toolchain.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.h (FUNCTION_PROFILER): Redefine to empty.
* config/arc/elf.h (PROFILE_HOOK): Define.
* config/arc/linux.h (PROFILE_HOOK): Likewise.
---
 gcc/config/arc/arc.h   | 12 +++-
 gcc/config/arc/elf.h   |  9 +
 gcc/config/arc/linux.h | 10 ++
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index a0a84900917..f75c273691c 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -775,15 +775,9 @@ extern int arc_initial_elimination_offset(int from, int 
to);
 #define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET)\
   (OFFSET) = arc_initial_elimination_offset ((FROM), (TO))
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
-#define FUNCTION_PROFILER(FILE, LABELNO)   \
-  do { \
-  if (flag_pic)\
-fprintf (FILE, "\tbl\t__mcount@plt\n");\
-  else \
-fprintf (FILE, "\tbl\t__mcount\n");\
-  } while (0)
+/* All the work done in PROFILE_HOOK, but still required.  */
+#undef FUNCTION_PROFILER
+#define FUNCTION_PROFILER(STREAM, LABELNO) do { } while (0)
 
 #define NO_PROFILE_COUNTERS  1
 
diff --git a/gcc/config/arc/elf.h b/gcc/config/arc/elf.h
index 3472fd2e418..3aabcf8c9e6 100644
--- a/gcc/config/arc/elf.h
+++ b/gcc/config/arc/elf.h
@@ -78,3 +78,12 @@ along with GCC; see the file COPYING3.  If not see
 #undef LINK_GCC_C_SEQUENCE_SPEC
 #define LINK_GCC_C_SEQUENCE_SPEC   \
   "--start-group %G %{!specs=*:%{!nolibc:-lc -lnosys}} --end-group"
+
+/* Emit rtl for profiling.  Output assembler code to FILE
+   to call "_mcount" for profiling a function entry.  */
+#define PROFILE_HOOK(LABEL)\
+  {\
+rtx fun;   \
+fun = gen_rtx_SYMBOL_REF (Pmode, "__mcount");  \
+emit_library_call (fun, LCT_NORMAL, VOIDmode); \
+  }
diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
index 62ebe4de0fc..993f445d2a0 100644
--- a/gcc/config/arc/linux.h
+++ b/gcc/config/arc/linux.h
@@ -123,3 +123,13 @@ along with GCC; see the file COPYING3.  If not see
: "=r" (_beg)   \
: "0" (_beg), "r" (_end), "r" (_xtr), "r" (_scno)); \
 }
+
+/* Emit rtl for profiling.  Output assembler code to FILE
+   to call "_mcount" for profiling a function entry.  */
+#define PROFILE_HOOK(LABEL)\
+  {\
+   rtx fun, rt;\
+   rt = get_hard_reg_initial_val (Pmode, RETURN_ADDR_REGNUM);  \
+   fun = gen_rtx_SYMBOL_REF (Pmode, "_mcount");\
+   emit_library_call (fun, LCT_NORMAL, VOIDmode, rt, Pmode);   \
+  }
-- 
2.19.1



[RS6000] Use config/linux.h for powerpc*-*-linux*

2018-11-12 Thread Alan Modra
Using the macros in config/linux.h rather than duplicating them helps
stop future bitrot, and repairs existing bitrot (4 choices for libc in
linux.h, fewer in the rs6000 files, not that it matters much).  Also
fixes the fact that __gnu_linux__ was always defined rather than just
when glibc was the libc of choice.

Bootstrapped etc. powerpc-linux and powerpc64le-linux.

* config.gcc (powerpc*-*-linux*): Add linux.h to tm_file.
* config/rs6000/linux.h (TARGET_OS_CPP_BUILTINS): Use
GNU_USER_TARGET_OS_CPP_BUILTINS.
(RS6000_ABI_NAME): Define.
* config/rs6000/linux64.h (TARGET_OS_CPP_BUILTINS): Use
GNU_USER_TARGET_OS_CPP_BUILTINS.
(MUSL_DYNAMIC_LINKER32): Undef before defining.
(UCLIBC_DYNAMIC_LINKER32, UCLIBC_DYNAMIC_LINKER64): Don't define.
(CHOOSE_DYNAMIC_LINKER): Don't define.
(GNU_USER_DYNAMIC_LINKER32, GNU_USER_DYNAMIC_LINKER64): Don't define.
* config/rs6000/sysv4.h (MUSL_DYNAMIC_LINKER): Undef before defining.
(CHOOSE_DYNAMIC_LINKER, GNU_USER_DYNAMIC_LINKER): Only define when
not already defined.
(CPP_OS_LINUX_SPEC): Remove defines and asserts handled by
TARGET_OS_CPP_BUILTINS.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b960de34e54..75ff2f5658e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2643,7 +2643,7 @@ powerpc*-*-linux*spe*)
tm_file="${tm_file} powerpcspe/linuxspe.h powerpcspe/e500.h"
;;
 powerpc*-*-linux*)
-   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h freebsd-spec.h 
rs6000/sysv4.h"
+   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h freebsd-spec.h 
rs6000/sysv4.h"
extra_options="${extra_options} rs6000/sysv4.opt"
tmake_file="${tmake_file} rs6000/t-fprules rs6000/t-ppccomm"
extra_objs="$extra_objs rs6000-linux.o"
diff --git a/gcc/config/rs6000/linux.h b/gcc/config/rs6000/linux.h
index fd06b14d837..29653c13455 100644
--- a/gcc/config/rs6000/linux.h
+++ b/gcc/config/rs6000/linux.h
@@ -46,15 +46,17 @@
 #define TARGET_LIBC_HAS_FUNCTION linux_libc_has_function
 
 #undef  TARGET_OS_CPP_BUILTINS
-#define TARGET_OS_CPP_BUILTINS()   \
-  do   \
-{  \
-  builtin_define_std ("PPC");  \
-  builtin_define_std ("powerpc");  \
-  builtin_assert ("cpu=powerpc");  \
-  builtin_assert ("machine=powerpc");  \
-  TARGET_OS_SYSV_CPP_BUILTINS ();  \
-}  \
+#define TARGET_OS_CPP_BUILTINS()   \
+  do   \
+{  \
+  if (strcmp (rs6000_abi_name, "linux") == 0)  \
+   GNU_USER_TARGET_OS_CPP_BUILTINS();  \
+  builtin_define_std ("PPC");  \
+  builtin_define_std ("powerpc");  \
+  builtin_assert ("cpu=powerpc");  \
+  builtin_assert ("machine=powerpc");  \
+  TARGET_OS_SYSV_CPP_BUILTINS ();  \
+}  \
   while (0)
 
 #define GNU_USER_TARGET_D_OS_VERSIONS()\
@@ -126,6 +128,9 @@
 #define RELOCATABLE_NEEDS_FIXUP \
   (rs6000_isa_flags & rs6000_isa_flags_explicit & OPTION_MASK_RELOCATABLE)
 
+#undef RS6000_ABI_NAME
+#defineRS6000_ABI_NAME "linux"
+
 #ifdef TARGET_LIBC_PROVIDES_SSP
 /* ppc32 glibc provides __stack_chk_guard in -0x7008(2).  */
 #define TARGET_THREAD_SSP_OFFSET   -0x7008
diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index 0d8e164a598..b1818b43cf4 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -369,6 +369,8 @@ extern int dot_symbols;
 #define TARGET_OS_CPP_BUILTINS()   \
   do   \
 {  \
+  if (strcmp (rs6000_abi_name, "linux") == 0)  \
+   GNU_USER_TARGET_OS_CPP_BUILTINS();  \
   if (TARGET_64BIT)\
{   \
  builtin_define ("__PPC__");   \
@@ -438,32 +440,13 @@ extern int dot_symbols;
 ":%(dynamic_linker_prefix)/lib64/ld64.so.1}"
 #endif
 
+#undef MUSL_DYNAMIC_LINKER32
 #define MUSL_DYNAMIC_LINKER32 \
   "/lib/ld-musl-powerpc" MUSL_DYNAMIC_LINKER_E "%{msoft-float:-sf}.so.1"
+#undef MUSL_DYNAMIC_LINKER64
 #define MUSL_DYNAMIC_LINKER64 \
   "/lib/ld-musl-powerpc64" MUSL_DYNAMIC_LINKER_E "%{msoft-float:-sf}.so.1"
 
-#define UCLIBC_DYNAMIC_LINKER32 "/lib/ld-uClibc.so.0"
-#define UCLIBC_DYNAMIC_LINKER64 "/lib/ld64-uClibc.so.0"
-#if DEFAULT_LIBC == LIBC_UCLIBC
-#define CHOOSE_DYNAMIC_LINKER(G, U, M) \
-  "%{mglibc:" G ";:%{mmusl:" M ";:" U "}}"
-#elif DEFAULT_LIBC == LIBC_GLIBC
-#define CHOOSE_DYNAMIC_LINKER(

Delete !HAVE_LD_PIE variants of startfile/endfile specs

2018-11-12 Thread Alan Modra
This patch is a small cleanup.

The HAVE_LD_PIE variant doesn't contain anything that will break
linking when !HAVE_LD_PIE that isn't already broken if you choose to
build PIEs with a linker that doesn't support PIE.  All this
HAVE_LD_PIE protects is the choice of different crt files, which is
more about libc capability than linker capability.

Bootstrapped etc. powerpc64le-linux and x86_64-linux.  OK?

* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Delete
!HAVE_LD_PIE variant.
(GNU_USER_TARGET_ENDFILE_SPEC): Likewise.

diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index 5b48fb21514..75f6a6f0a7d 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -45,7 +45,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
provides part of the support for getting C++ file-scope static
object constructed before entering `main'.  */
 
-#if defined HAVE_LD_PIE
 #define GNU_USER_TARGET_STARTFILE_SPEC \
   "%{shared:; \
  pg|p|profile:%{static-pie:grcrt1.o%s;:gcrt1.o%s}; \
@@ -59,22 +58,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
  :crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
- fvtable-verify=std:vtv_start.o%s} \
-   " CRTOFFLOADBEGIN
-#else
-#define GNU_USER_TARGET_STARTFILE_SPEC \
-  "%{shared:; \
- pg|p|profile:gcrt1.o%s; \
- :crt1.o%s} \
-   crti.o%s \
-   %{static:crtbeginT.o%s; \
- shared|pie|static-pie:crtbeginS.o%s; \
- :crtbegin.o%s} \
-   %{fvtable-verify=none:%s; \
- fvtable-verify=preinit:vtv_start_preinit.o%s; \
- fvtable-verify=std:vtv_start.o%s} \
-   " CRTOFFLOADBEGIN
-#endif
+ fvtable-verify=std:vtv_start.o%s} " \
+   CRTOFFLOADBEGIN
 #undef  STARTFILE_SPEC
 #define STARTFILE_SPEC GNU_USER_TARGET_STARTFILE_SPEC
 
@@ -84,7 +69,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
object constructed before entering `main', followed by a normal
GNU userspace "finalizer" file, `crtn.o'.  */
 
-#if defined HAVE_LD_PIE
 #define GNU_USER_TARGET_ENDFILE_SPEC \
   "%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_end_preinit.o%s; \
@@ -92,19 +76,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
%{static:crtend.o%s; \
  shared|static-pie|" PIE_SPEC ":crtendS.o%s; \
  :crtend.o%s} \
-   crtn.o%s \
-   " CRTOFFLOADEND
-#else
-#define GNU_USER_TARGET_ENDFILE_SPEC \
-  "%{fvtable-verify=none:%s; \
- fvtable-verify=preinit:vtv_end_preinit.o%s; \
- fvtable-verify=std:vtv_end.o%s} \
-   %{static:crtend.o%s; \
- shared|pie|static-pie:crtendS.o%s; \
- :crtend.o%s} \
-   crtn.o%s \
-   " CRTOFFLOADEND
-#endif
+   crtn.o%s " \
+   CRTOFFLOADEND
 #undef  ENDFILE_SPEC
 #define ENDFILE_SPEC GNU_USER_TARGET_ENDFILE_SPEC
 

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] [ARC] Cleanup, fix and set LRA default.

2018-11-12 Thread Claudiu Zissulescu
From: claziss 

Hi Andrew,

This is a patch which fixes and sets LRA by default.

OK to apply?
Claudiu

  Commit message 

The LP_COUNT register cannot be freely allocated by the compiler, as its
size and/or content may change depending on the ARC hardware
configuration. Thus, make this register fixed.

Remove register classes and unused constraint letters.

Clean up the implementation of the conditional_register_usage hook by using
macros instead of magic constants and removing all references to
reg_class_contents, which bring so much grief when LRA is enabled.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.h (reg_class): Reorder registers classes, remove
unused register classes.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(FIXED_REGISTERS): Make lp_count fixed.
(BASE_REG_CLASS): Remove ACC16_BASE_REGS reference.
(PROGRAM_COUNTER_REGNO): Remove.
* config/arc/arc.c (arc_conditional_register_usage): Remove unused
register classes, use constants for register numbers, remove
reg_class_contents references.
(arc_process_double_reg_moves): Add asserts.
(arc_secondary_reload): Remove LPCOUNT_REG reference, use
lra_in_progress predicate.
(arc_init_reg_tables): Remove unused register classes.
(arc_register_move_cost): Likewise.
(arc_preferred_reload_class): Likewise.
(hwloop_optimize): Update rtx patterns involving lp_count
register.
(arc_return_address_register): Rename ILINK1, INLINK2 regnums
macros.
* config/arc/constraints.md ("c"): Choose between GENERAL_REGS and
CHEAP_CORE_REGS.  Former one will be used for LRA.
("Rac"): Choose between GENERAL_REGS and ALL_CORE_REGS.  Former
one will be used for LRA.
("w"): Choose between GENERAL_REGS and WRITABLE_CORE_REGS.  Former
one will be used for LRA.
("W"): Choose between GENERAL_REGS and MPY_WRITABLE_CORE_REGS.
Former one will be used for LRA.
("f"): Delete constraint.
("k"): Likewise.
("e"): Likewise.
(movqi_insn): Remove unsed lp_count constraints.
(movhi_insn): Likewise.
(movsi_insn): Update pattern.
(arc_lp): Likewise.
(dbnz): Likewise.
("l"): Change it from register constraint to constraint.
(stack_tie): Remove 'b' constraint letter.
(R4_REG): Define.
(R9_REG, R15_REG, R16_REG, R25_REG): Likewise.
(R32_REG, R40_REG, R41_REG, R42_REG, R43_REG, R44_REG): Likewise.
(R57_REG, R59_REG, PCL_REG): Likewise.
(ILINK1_REGNUM): Renamed to ILINK1_REG.
(ILINK2_REGNUM): Renamed to ILINK2_REG.
(Rgp): Remove.
(SP_REGS): Likewise.
(Rcw): Remove unused reg classes.
* config/arc/predicates.md (dest_reg_operand): Just default on
register_operand predicate.
(mpy_dest_reg_operand): Likewise.
(move_dest_operand): Use macros instead of constants.
---
 gcc/config/arc/arc.c  | 331 +-
 gcc/config/arc/arc.h  | 106 ---
 gcc/config/arc/arc.md |  57 --
 gcc/config/arc/arc.opt|   7 +-
 gcc/config/arc/constraints.md |  45 ++---
 gcc/config/arc/predicates.md  |  28 +--
 6 files changed, 222 insertions(+), 352 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 75c2384eede..6802ca66554 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -734,11 +734,6 @@ arc_secondary_reload (bool in_p,
   if (cl == DOUBLE_REGS)
 return GENERAL_REGS;
 
-  /* The loop counter register can be stored, but not loaded directly.  */
-  if ((cl == LPCOUNT_REG || cl == WRITABLE_CORE_REGS)
-  && in_p && MEM_P (x))
-return GENERAL_REGS;
-
  /* If we have a subreg (reg), where reg is a pseudo (that will end in
 a memory location), then we may need a scratch register to handle
 the fp/sp+largeoffset address.  */
@@ -756,8 +751,9 @@ arc_secondary_reload (bool in_p,
  if (regno != -1)
return NO_REGS;
 
- /* It is a pseudo that ends in a stack location.  */
- if (reg_equiv_mem (REGNO (x)))
+ /* It is a pseudo that ends in a stack location.  This
+procedure only works with the old reload step.  */
+ if (reg_equiv_mem (REGNO (x)) && !lra_in_progress)
{
  /* Get the equivalent address and check the range of the
 offset.  */
@@ -1659,8 +1655,6 @@ enum reg_class arc_regno_reg_class[FIRST_PSEUDO_REGISTER];
 enum reg_class
 arc_preferred_reload_class (rtx, enum reg_class cl)
 {
-  if ((cl) == CHEAP_CORE_REGS  || (cl) == WRITABLE_CORE_REGS)
-return GENERAL_REGS;
   return cl;
 }
 
@@ -1758,25 +1752,21 @@ arc_conditional_register_usage (void)
   strcpy (rname29, "ilink");
   strcpy (rname30, "r30");
 
-  if (!TEST_HARD_REG_BIT (overrideregs, 30))
+  if (

Allow target to override gnu-user.h crti and crtn

2018-11-12 Thread Alan Modra
Also give target access to the gnu-user.h LINK_GCC_C_SEQUENCE_SPEC.
In preparation for using gnu-user.h in rs6000/.

Bootstrapped etc. powerpc64le-linux.  OK?

* config/gnu-user.h (GNU_USER_TARGET_CRTI): Define.
(GNU_USER_TARGET_STARTFILE_SPEC): Use it here.
(GNU_USER_TARGET_CRTN): Define.
(GNU_USER_TARGET_ENDFILE_SPEC): Use it here.
(GNU_USER_TARGET_LINK_GCC_C_SEQUENCE_SPEC): Define.

diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index 75f6a6f0a7d..09170d416b8 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -40,6 +40,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #define CRTOFFLOADEND ""
 #endif
 
+#define GNU_USER_TARGET_CRTI "crti.o%s"
+#define GNU_USER_TARGET_CRTN "crtn.o%s"
+
 /* Provide a STARTFILE_SPEC appropriate for GNU userspace.  Here we add
the GNU userspace magical crtbegin.o file (see crtstuff.c) which
provides part of the support for getting C++ file-scope static
@@ -51,8 +54,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
  static:crt1.o%s; \
  static-pie:rcrt1.o%s; \
  " PIE_SPEC ":Scrt1.o%s; \
- :crt1.o%s} \
-   crti.o%s \
+ :crt1.o%s} " \
+   GNU_USER_TARGET_CRTI " \
%{static:crtbeginT.o%s; \
  shared|static-pie|" PIE_SPEC ":crtbeginS.o%s; \
  :crtbegin.o%s} \
@@ -75,8 +78,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
  fvtable-verify=std:vtv_end.o%s} \
%{static:crtend.o%s; \
  shared|static-pie|" PIE_SPEC ":crtendS.o%s; \
- :crtend.o%s} \
-   crtn.o%s " \
+ :crtend.o%s} " \
+   GNU_USER_TARGET_CRTN " " \
CRTOFFLOADEND
 #undef  ENDFILE_SPEC
 #define ENDFILE_SPEC GNU_USER_TARGET_ENDFILE_SPEC
@@ -106,11 +109,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #define LINK_EH_SPEC "%{!static|static-pie:--eh-frame-hdr} "
 #endif
 
-#undef LINK_GCC_C_SEQUENCE_SPEC
-#define LINK_GCC_C_SEQUENCE_SPEC \
+#define GNU_USER_TARGET_LINK_GCC_C_SEQUENCE_SPEC \
   "%{static|static-pie:--start-group} %G %{!nolibc:%L} \
%{static|static-pie:--end-group}%{!static:%{!static-pie:%G}}"
 
+#undef LINK_GCC_C_SEQUENCE_SPEC
+#define LINK_GCC_C_SEQUENCE_SPEC GNU_USER_TARGET_LINK_GCC_C_SEQUENCE_SPEC
+
 /* Use --as-needed -lgcc_s for eh support.  */
 #ifdef HAVE_LD_AS_NEEDED
 #define USE_LD_AS_NEEDED 1

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 4/6] [ARC] Add peephole rules to combine store/loads into double store/loads

2018-11-12 Thread claziss
PING.

On Wed, 2018-10-31 at 10:33 +0200, claz...@gmail.com wrote:
> Thank you for your review. Please find attached a new respin patch
> with
> your feedback in.
> 
> Please let me know if it is ok,
> Claudiu 



rs6000/sysv4.h using gnu-user.h

2018-11-12 Thread Alan Modra
This patch removes some duplication in rs6000/sysv4.h of macros found
in gnu-user.h that we want for linux.  Including gnu-user.h will mean
powerpc doesn't miss updates to that file.

Requires https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00917.html and
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00919.html

Bootstrapped etc. powerpc-linux and powerpc64le-linux.

* config.gcc (powerpc*-*-freebsd*, powerpc-*-netbsd*),
(powerpc-*-eabisimaltivec*, powerpc-*-eabisim*, powerpc-*-elf*),
(powerpc-*-eabialtivec*, powerpc-*-eabi*, powerpc-*-rtems*),
(powerpc-wrs-vxworks*, powerpc-*-lynxos*, powerpcle-*-elf*),
(powerpcle-*-eabisim*, powerpcle-*-eabi*): Add gnu-user.h to tm_file.
* config/rs6000/freebsd.h (CPLUSPLUS_CPP_SPEC),
(LINK_GCC_C_SEQUENCE_SPEC): Undef.
(ASM_APP_ON, ASM_APP_OFF): Don't define.
* config/rs6000/freebsd64.h (ASM_APP_ON, ASM_APP_OFF): Don't define.
* config/rs6000/lynx.h (ASM_APP_ON, ASM_APP_OFF): Don't define.
* config/rs6000/linux64.h (LINK_GCC_C_SEQUENCE_SPEC): Define.
* config/rs6000/netbsd.h (CPLUSPLUS_CPP_SPEC),
(LINK_GCC_C_SEQUENCE_SPEC): Undef.
* config/rs6000/rtems.h (LINK_GCC_C_SEQUENCE_SPEC): Define.
* config/rs6000/sysv4.h (GNU_USER_TARGET_CRTI): Redefine.
(GNU_USER_TARGET_CRTN): Redefine.
(CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC.
(LIB_LINUX_SPEC): Use GNU_USER_TARGET_LIB_SPEC.
(CRTOFFLOADBEGIN, CRTOFFLOADEND): Don't define.
(STARTFILE_LINUX_SPEC): Define as GNU_USER_TARGET_STARTFILE_SPEC.
(ENDFILE_LINUX_SPEC): Define as GNU_USER_TARGET_ENDFILE_SPEC.
(UCLIBC_DYNAMIC_LINKER, CHOOSE_DYNAMIC_LINKER): Don't define.
(LINK_EH_SPEC): Don't define.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 75ff2f5658e..6aea55207ca 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2567,7 +2567,7 @@ powerpc64-*-darwin*)
extra_headers=altivec.h
;;
 powerpc*-*-freebsd*)
-   tm_file="${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} rs6000/sysv4.h"
+   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h ${fbsd_tm_file} 
rs6000/sysv4.h"
extra_options="${extra_options} rs6000/sysv4.opt"
tmake_file="rs6000/t-fprules rs6000/t-ppcos ${tmake_file} 
rs6000/t-ppccomm"
case ${target} in
@@ -2582,7 +2582,7 @@ powerpc*-*-freebsd*)
esac
;;
 powerpc-*-netbsd*)
-   tm_file="${tm_file} dbxelf.h elfos.h ${nbsd_tm_file} freebsd-spec.h 
rs6000/sysv4.h rs6000/netbsd.h"
+   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h ${nbsd_tm_file} 
freebsd-spec.h rs6000/sysv4.h rs6000/netbsd.h"
extra_options="${extra_options} netbsd.opt netbsd-elf.opt"
tmake_file="${tmake_file} rs6000/t-netbsd"
extra_options="${extra_options} rs6000/sysv4.opt"
@@ -2594,30 +2594,30 @@ powerpc-*-eabispe*)
use_gcc_stdint=wrap
;;
 powerpc-*-eabisimaltivec*)
-   tm_file="${tm_file} dbxelf.h elfos.h freebsd-spec.h newlib-stdint.h 
rs6000/sysv4.h rs6000/eabi.h rs6000/eabisim.h rs6000/eabialtivec.h"
+   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h rs6000/eabi.h rs6000/eabisim.h 
rs6000/eabialtivec.h"
extra_options="${extra_options} rs6000/sysv4.opt"
tmake_file="rs6000/t-fprules rs6000/t-ppcendian rs6000/t-ppccomm"
use_gcc_stdint=wrap
;;
 powerpc-*-eabisim*)
-   tm_file="${tm_file} dbxelf.h elfos.h usegas.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h rs6000/eabi.h rs6000/eabisim.h"
+   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h usegas.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h rs6000/eabi.h rs6000/eabisim.h"
extra_options="${extra_options} rs6000/sysv4.opt"
tmake_file="rs6000/t-fprules rs6000/t-ppcgas rs6000/t-ppccomm"
use_gcc_stdint=wrap
;;
 powerpc-*-elf*)
-   tm_file="${tm_file} dbxelf.h elfos.h usegas.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h"
+   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h usegas.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h"
extra_options="${extra_options} rs6000/sysv4.opt"
tmake_file="rs6000/t-fprules rs6000/t-ppcgas rs6000/t-ppccomm"
;;
 powerpc-*-eabialtivec*)
-   tm_file="${tm_file} dbxelf.h elfos.h freebsd-spec.h newlib-stdint.h 
rs6000/sysv4.h rs6000/eabi.h rs6000/eabialtivec.h"
+   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h rs6000/eabi.h rs6000/eabialtivec.h"
extra_options="${extra_options} rs6000/sysv4.opt"
tmake_file="rs6000/t-fprules rs6000/t-ppcendian rs6000/t-ppccomm"
use_gcc_stdint=wrap
;;
 powerpc-*-eabi*)
-   tm_file="${tm_file} dbxelf.h elfos.h usegas.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h rs6000/eabi.h"
+   tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h usegas.h freebsd-spec.h 
newlib-stdint.h rs6000/sysv4.h rs6000/eabi.h"
extra_opt

Re: [PATCH] [ARC] Cleanup, fix and set LRA default.

2018-11-12 Thread Eric Botcazou
> This is a patch which fixes and sets LRA by default.

You'll need to update htdocs/backends.html of wwwdocs once this is done:
  https://gcc.gnu.org/backends.html

-- 
Eric Botcazou


Re: [C++ Patch] Fix two grokdeclarator locations

2018-11-12 Thread Paolo Carlini

Hi again,

On 08/11/18 10:26, Paolo Carlini wrote:

Hi,

two additional grokdeclarator locations that we can easily fix by 
using declarator->id_loc. Slightly more interesting, testing revealed 
a latent issue in the make_id_declarator uses: 
cp_parser_member_declaration wasn't setting declarator->id_loc, thus I 
decided to add a location_t parameter to make_id_declarator itself and 
adjust all the callers. Tested x86_64-linux.


PS: In my local tree I have the cp_parser_objc_class_ivars change using 
token->location instead of UNKNOWN_LOCATION, thus all the 
make_id_declarator calls should be completely fine location-wise.


Paolo.



[RS6000] PowerPC -mcpu=native support

2018-11-12 Thread Alan Modra
The -mcpu=native support has bit-rotted a little, in particular the
fallback when the native cpu couldn't be determined.  This patch fixes
the bit-rot and reorganizes ASM_CPU_SPEC so that it should be a little
easier to keep the -mcpu=native data up to date.

The patch also changes the fix for PR63177 (-mpower9-vector being
passed by the user when the default is -mpower8) to also apply when
-mcpu=powerpc64le and -mcpu=native is given.  I'll note that the hack
for PR63177 should probably be extended to lots of other options, if
we're going to continue supporting all those sub-architecture options
(-mpower9-vector, -mpower8-vector, -mcrypto, -mdirect-move, -mhtm,
-mvsx and others) in the positive sense.  I think those should have
only been supported in their -mno- variants.

Bootstrapped etc. powerpc64le-linux.

Note that there is a small change to the AIX default, with -maltivec
now selecting -m970 both with and without -maix64, whereas before
you got -mppc64 for -maix64.  That seems correct to me; the old
behaviour was likely an oversight due to handling the -maix64 default
cpu in the wrong place.

* config/rs6000/aix71.h (ASM_SPEC): Don't select default -maix64
cpu here.
(ASM_CPU_SPEC): Do so here.  Rewrite using if .. else if .. specs
form.  Error on missing -mcpu case.
* config/rs6000/driver-rs6000.c (asm_names <_AIX>): Update NULL case.
(asm_names ): Add missing cpus.  Update NULL case.  Apply
PR63177 fix for -mcpu=power8 and -mcpu=powerpc64le.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Rewrite using if ..
else if .. specs form.  Error on missing -mcpu case.  Don't output
duplicate -maltivec.  Apply PR63177 fix for -mcpu=powerpc64le.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 8150552ebf3..2398ed64baa 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -59,7 +59,7 @@ do {  
\
 } while (0)
 
 #undef ASM_SPEC
-#define ASM_SPEC "-u %{maix64:-a64 %{!mcpu*:-mppc64}} %(asm_cpu)"
+#define ASM_SPEC "-u %{maix64:-a64} %(asm_cpu)"
 
 /* Common ASM definitions used by ASM_SPEC amongst the various targets for
handling -mcpu=xxx switches.  There is a parallel list in driver-rs6000.c to
@@ -67,31 +67,29 @@ do {
\
you make changes here, make them there also.  */
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
-"%{!mcpu*: %{!maix64: \
-  %{mpowerpc64: -mppc64} \
-  %{maltivec: -m970} \
-  %{!maltivec: %{!mpowerpc64: %(asm_default) \
-%{mcpu=native: %(asm_cpu_native)} \
-%{mcpu=power3: -m620} \
-%{mcpu=power4: -mpwr4} \
-%{mcpu=power5: -mpwr5} \
-%{mcpu=power5+: -mpwr5x} \
-%{mcpu=power6: -mpwr6} \
-%{mcpu=power6x: -mpwr6} \
-%{mcpu=power7: -mpwr7} \
-%{mcpu=power8: -mpwr8} \
-%{mcpu=power9: -mpwr9} \
-%{mcpu=powerpc: -mppc} \
-%{mcpu=rs64a: -mppc} \
-%{mcpu=603: -m603} \
-%{mcpu=603e: -m603} \
-%{mcpu=604: -m604} \
-%{mcpu=604e: -m604} \
-%{mcpu=620: -m620} \
-%{mcpu=630: -m620} \
-%{mcpu=970: -m970} \
-%{mcpu=G5: -m970} \
-%{mvsx: %{!mcpu*: -mpwr6}} \
+"%{mcpu=native: %(asm_cpu_native); \
+  mcpu=power9: -mpwr9; \
+  mcpu=power8: -mpwr8; \
+  mcpu=power7: -mpwr7; \
+  mcpu=power6x|mcpu=power6: -mpwr6; \
+  mcpu=power5+: -mpwr5x; \
+  mcpu=power5: -mpwr5; \
+  mcpu=power4: -mpwr4; \
+  mcpu=power3: -m620; \
+  mcpu=powerpc: -mppc; \
+  mcpu=rs64a: -mppc; \
+  mcpu=603: -m603; \
+  mcpu=603e: -m603; \
+  mcpu=604: -m604; \
+  mcpu=604e: -m604; \
+  mcpu=620: -m620; \
+  mcpu=630: -m620; \
+  mcpu=970|mcpu=G5: -m970; \
+  !mcpu*: %{mvsx: -mpwr6; \
+   maltivec: -m970; \
+   maix64|mpowerpc64: -mppc64; \
+   : %(asm_default)}; \
+  :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
 -many"
 
 #undef ASM_DEFAULT_SPEC
diff --git a/gcc/config/rs6000/driver-rs6000.c b/gcc/config/rs6000/driver-rs6000.c
index 94b4c79..0a48d46d658 100644
--- a/gcc/config/rs6000/driver-rs6000.c
+++ b/gcc/config/rs6000/driver-rs6000.c
@@ -459,10 +459,10 @@ static const struct asm_name asm_names[] = {
   { "970", "-m970" },
   { "G5",  "-m970" },
   { NULL,  "\
-%{!maix64: \
-%{mpowerpc64: -mppc64} \
-%{maltivec: -m970} \
-%{!maltivec: %{!mpowerpc64: %(asm_default)}}}" },
+  %{mvsx: -mpwr6; \
+maltivec: -m970; \
+maix64|mpowerpc64: -mppc64; \
+: %(asm_default)}" },
 
 #else
   { "cell","-mcell" },
@@ -470,12 +470,14 @@ static const struct asm_name asm_names[] = {
   { "power4",  "-mpower4" },
   { "power5",  "-mpower5" },
   { "power5+", "-mpower5" },
-  { "power6",  "-mpower6 -maltivec" },
-  { "power6x", "-mpower6 -maltivec" },
+  { "power6",  "-mpower6 %{!mvsx:%{!maltivec:-maltivec}}" },
+  { "power6x", "-mpower6 %{!mvsx:%{!maltivec:-maltivec}}" },
   { "power7",  "-mpower7" },
-  { "power8",  "-mpower8" },
+  { "power8",  "%{mpower9-vector:-mpower9;:-mpower8}" },
   { "power9",  "-mpower9" },
+  { "a2",  "-ma2" },
   { "powerpc", "-mppc" },

[PowerPC] libgcc cfi

2018-11-12 Thread Alan Modra
There are a few places in libgcc assembly where we don't emit call
frame information for functions, potentially breaking unwinding from
asynchronous signal handlers.  This patch fixes most of them.  Although I
patch tramp.S, there is no attempt made to provide CFI for the actual
trampoline on the stack.  Doing that would require generating CFI at
run time and both registering and deregistering it, which is probably
not worth doing since it would significantly slow down the call.

Note that the out-of-line register save/restore functions do not
need CFI in the assembly.  CFI is added for them by the rs6000.c
prologue and epilogue code.

Bootstrapped etc. powerpc64le-linux.

* config/rs6000/morestack.S (__stack_split_initialize),
(__morestack_get_guard, __morestack_set_guard),
(__morestack_make_guard): Provide CFI covering these functions.
* config/rs6000/tramp.S (__trampoline_setup): Likewise.

diff --git a/libgcc/config/rs6000/morestack.S b/libgcc/config/rs6000/morestack.S
index a0fee4037e4..936051eab33 100644
--- a/libgcc/config/rs6000/morestack.S
+++ b/libgcc/config/rs6000/morestack.S
@@ -304,12 +304,15 @@ DW.ref.__gcc_personality_v0:
 # new thread starts.  This is called from a constructor.
 # void __stack_split_initialize (void)
 ENTRY(__stack_split_initialize)
+   .cfi_startproc
addi %r3,%r1,-0x4000# We should have at least 16K.
std %r3,-0x7000-64(%r13)# tcbhead_t.__private_ss
# void __generic_morestack_set_initial_sp (void *sp, size_t len)
mr %r3,%r1
li %r4, 0x4000
b __generic_morestack_set_initial_sp
+# The lack of .cfi_endproc here is deliberate.  This function and the
+# following ones can all use the default FDE.
SIZE (__stack_split_initialize)
 
 
@@ -335,6 +338,7 @@ ENTRY0(__morestack_make_guard)
sub %r3,%r3,%r4
addi %r3,%r3,BACKOFF
blr
+   .cfi_endproc
SIZE (__morestack_make_guard)
 
 
diff --git a/libgcc/config/rs6000/tramp.S b/libgcc/config/rs6000/tramp.S
index 19ea57838fc..637f4510146 100644
--- a/libgcc/config/rs6000/tramp.S
+++ b/libgcc/config/rs6000/tramp.S
@@ -56,8 +56,10 @@ trampoline_size = .-trampoline_initial
 /* R6 = static chain */
 
 FUNC_START(__trampoline_setup)
+   .cfi_startproc
mflrr0  /* save return address */
 bcl20,31,.LCF0 /* load up __trampoline_initial into r7 */
+   .cfi_register lr,r0
 .LCF0:
 mflr   r11
 addi   r7,r11,trampoline_initial-4-.LCF0 /* trampoline address -4 */
@@ -112,6 +114,7 @@ FUNC_START(__trampoline_setup)
addir30,r30,_GLOBAL_OFFSET_TABLE_-1b@l
 #endif
bl  JUMP_TARGET(abort)
+   .cfi_endproc
 FUNC_END(__trampoline_setup)
 
 #endif
@@ -144,6 +147,7 @@ trampoline_size = .-trampoline_initial
.popsection
 
 FUNC_START(__trampoline_setup)
+   .cfi_startproc
addis 7,2,.LC0@toc@ha
ld 7,.LC0@toc@l(7)  /* trampoline address -8 */
 
@@ -180,6 +184,7 @@ FUNC_START(__trampoline_setup)
 .Labort:
bl  JUMP_TARGET(abort)
nop
+   .cfi_endproc
 FUNC_END(__trampoline_setup)
 
 #endif

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH][DOCS] Fix documentation of __builtin_cpu_is and __builtin_cpu_supports for x86.

2018-11-12 Thread Martin Liška
Hi.

The patch adds the missing values for the aforementioned built-ins.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2018-11-12  Martin Liska  

* doc/extend.texi: Add missing values for __builtin_cpu_is and
__builtin_cpu_supports for x86 target.
---
 gcc/doc/extend.texi | 100 +++-
 1 file changed, 98 insertions(+), 2 deletions(-)


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ebdc0cec789..04a069fc366 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20422,12 +20422,18 @@ is of type @var{cpuname}
 and returns @code{0} otherwise. The following CPU names can be detected:
 
 @table @samp
+@item amd
+AMD CPU.
+
 @item intel
 Intel CPU.
 
 @item atom
 Intel Atom CPU.
 
+@item slm
+Intel Silvermont CPU.
+
 @item core2
 Intel Core 2 CPU.
 
@@ -20443,8 +20449,50 @@ Intel Core i7 Westmere CPU.
 @item sandybridge
 Intel Core i7 Sandy Bridge CPU.
 
-@item amd
-AMD CPU.
+@item ivybridge
+Intel Core i7 Ivy Bridge CPU.
+
+@item haswell
+Intel Core i7 Haswell CPU.
+
+@item broadwell
+Intel Core i7 Broadwell CPU.
+
+@item skylake
+Intel Core i7 Skylake CPU.
+
+@item skylake-avx512
+Intel Core i7 Skylake AVX512 CPU.
+
+@item cannonlake
+Intel Core i7 Cannon Lake CPU.
+
+@item icelake-client
+Intel Core i7 Ice Lake Client CPU.
+
+@item icelake-server
+Intel Core i7 Ice Lake Server CPU.
+
+@item bonnell
+Intel Atom Bonnell CPU.
+
+@item silvermont
+Intel Atom Silvermont CPU.
+
+@item goldmont
+Intel Atom Goldmont CPU.
+
+@item goldmont-plus
+Intel Atom Goldmont Plus CPU.
+
+@item tremont
+Intel Atom Tremont CPU.
+
+@item knl
+Intel Knights Landing CPU.
+
+@item knm
+Intel Knights Mill CPU.
 
 @item amdfam10h
 AMD Family 10h CPU.
@@ -20530,8 +20578,56 @@ SSE4.2 instructions.
 AVX instructions.
 @item avx2
 AVX2 instructions.
+@item sse4a
+SSE4A instructions.
+@item fma4
+FMA4 instructions.
+@item xop
+XOP instructions.
+@item fma
+FMA instructions.
 @item avx512f
 AVX512F instructions.
+@item bmi
+BMI instructions.
+@item bmi2
+BMI2 instructions.
+@item aes
+AES instructions.
+@item pclmul
+PCLMUL instructions.
+@item avx512vl
+AVX512VL instructions.
+@item avx512bw
+AVX512BW instructions.
+@item avx512dq
+AVX512DQ instructions.
+@item avx512cd
+AVX512CD instructions.
+@item avx512er
+AVX512ER instructions.
+@item avx512pf
+AVX512PF instructions.
+@item avx512vbmi
+AVX512VBMI instructions.
+@item avx512ifma
+AVX512IFMA instructions.
+@item avx5124vnniw
+AVX5124VNNIW instructions.
+@item avx5124fmaps
+AVX5124FMAPS instructions.
+@item avx512vpopcntdq
+AVX512VPOPCNTDQ instructions.
+@item avx512vbmi2
+AVX512VBMI2 instructions.
+@item gfni
+GFNI instructions.
+@item vpclmulqdq
+VPCLMULQDQ instructions.
+@item avx512vnni
+AVX512VNNI instructions.
+@item avx512bitalg
+AVX512BITALG instructions.
 @end table
 
 Here is an example:


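For illustration, a minimal usage sketch of these built-ins (not taken
from the patch; the CPU and feature names below are just two of the
values documented above):

  #include <stdio.h>

  int
  main (void)
  {
    __builtin_cpu_init ();
    if (__builtin_cpu_is ("skylake"))
      printf ("Running on a Skylake CPU\n");
    if (__builtin_cpu_supports ("avx512f"))
      printf ("AVX512F instructions are available\n");
    return 0;
  }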

[RS6000] Don't pass -many to the assembler

2018-11-12 Thread Alan Modra
I'd like to remove -many from the options passed by default to the
assembler, on the grounds that a GCC bug in instruction selection (e.g.
emitting a power9 insn for -mcpu=power8) is better found at assembly
time than run time.

This might annoy people for a while, fixing user asm that we didn't
diagnose previously, but I believe this is the right direction to go.
Of course, -Wa,-many is available for anyone who just wants their
dodgy old code to work.

Bootstrapped etc. powerpc64le-linux.  OK?

* config/rs6000/rs6000.h (ASM_CPU_SPEC): Remove -many.
* config/rs6000/aix61.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Likewise.
* testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c: Don't use
power mnemonics.

diff --git a/gcc/config/rs6000/aix61.h b/gcc/config/rs6000/aix61.h
index 353e5d6cfeb..a7a8246bfe3 100644
--- a/gcc/config/rs6000/aix61.h
+++ b/gcc/config/rs6000/aix61.h
@@ -91,8 +91,7 @@ do {  
\
 %{mcpu=630: -m620} \
 %{mcpu=970: -m970} \
 %{mcpu=G5: -m970} \
-%{mvsx: %{!mcpu*: -mpwr6}} \
--many"
+%{mvsx: %{!mcpu*: -mpwr6}}"
 
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpwr4"
diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 2398ed64baa..d2ca8dc275d 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -89,8 +89,7 @@ do {  
\
maltivec: -m970; \
maix64|mpowerpc64: -mppc64; \
: %(asm_default)}; \
-  :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
--many"
+  :%eMissing -mcpu option in ASM_SPEC_CPU?\n}"
 
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpwr4"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index d75137cf8f5..9d78173a680 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -137,8 +137,7 @@
mvsx: -mpower7; \
mpowerpc64: -mppc64;: %(asm_default)}; \
   :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
-%{mvsx: -mvsx -maltivec; maltivec: -maltivec} \
--many"
+%{mvsx: -mvsx -maltivec; maltivec: -maltivec}"
 
 #define CPP_DEFAULT_SPEC ""
 
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c b/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
index 14908dba690..eea7f6ffc2e 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
@@ -45,14 +45,14 @@ __asm__ ("\t.globl\t" #NAME "_asm\n\t"  
\
 #NAME "_asm:\n\t"  \
 "lis 11,gparms@ha\n\t" \
 "la 11,gparms@l(11)\n\t"   \
-"st 3,0(11)\n\t"   \
-"st 4,4(11)\n\t"   \
-"st 5,8(11)\n\t"   \
-"st 6,12(11)\n\t"  \
-"st 7,16(11)\n\t"  \
-"st 8,20(11)\n\t"  \
-"st 9,24(11)\n\t"  \
-"st 10,28(11)\n\t" \
+"stw 3,0(11)\n\t"  \
+"stw 4,4(11)\n\t"  \
+"stw 5,8(11)\n\t"  \
+"stw 6,12(11)\n\t" \
+"stw 7,16(11)\n\t" \
+"stw 8,20(11)\n\t" \
+"stw 9,24(11)\n\t" \
+"stw 10,28(11)\n\t"\
 "stfd 1,32(11)\n\t"\
 "stfd 2,40(11)\n\t"\
 "stfd 3,48(11)\n\t"\

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-12 Thread Sam Tebbs


On 11/08/2018 08:34 PM, Segher Boessenkool wrote:
> On Thu, Nov 08, 2018 at 03:44:44PM +, Sam Tebbs wrote:
>> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
>> looking at the issue as well and don't want to do any overlapping work.
> I don't know.  Well, there are no incorrect code issues I know of at all
> now; but you mean that it is taking an instruction more than you would
> like to see, I suppose?
>
>
> Segher

Yes, I am referring to the extra instruction generated, which in my
opinion is incorrect code generation since it shouldn't be emitted.


[mcore] Remove duplicate preprocessor definition

2018-11-12 Thread Eric Botcazou
The same definition, with a comment, is present a few lines above.

Applied on the mainline as obvious.


2018-11-12  Eric Botcazou  

* config/mcore/mcore.h (WORD_REGISTER_OPERATIONS): Remove duplicate.

-- 
Eric Botcazou

Index: config/mcore/mcore.h
===
--- config/mcore/mcore.h	(revision 265948)
+++ config/mcore/mcore.h	(working copy)
@@ -552,8 +552,6 @@ extern const enum reg_class regno_reg_cl
and another.  All register moves are cheap.  */
 #define REGISTER_MOVE_COST(MODE, SRCCLASS, DSTCLASS) 2
 
-#define WORD_REGISTER_OPERATIONS 1
-
 /* Assembler output control.  */
 #define ASM_COMMENT_START "\t//"
 


[PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-12 Thread Alexandre Oliva
gnattools build machinery uses just-built xgcc and xg++ as $(CC) and
$(CXX) in native builds.  However, if C and C++ languages are not
enabled, it won't find them.  So, enable C and C++ if Ada is enabled.
Most of the time, this is probably no big deal: C is always enabled
anyway, and C++ is already enabled for bootstraps.

We need not enable those for cross builds, however.  At first I just
took the logic from gnattools/configure, but found it to be lacking:
it would use the just-built tools even in cross-back settings, where
the tools just built for the host would not run on the build machine.  So
I've narrowed down the test to rely on autoconf-detected cross-ness
(build->host only), but also to ensure that host matches build, and
that target matches host.

I've considered sourcing ada/config-lang.in from within
gnattools/configure, and testing lang_requires as set by it, so as to
avoid a duplication of tests that ought to remain in sync, but decided
it would be too fragile, as ada/config-lang.in does not expect srcdir
to refer to gnattools.


Please let me know if there are objections to this change in the next
few days, e.g., if enabling C and C++ for an Ada-only build is too
onerous.  It is certainly possible to rework gnattools build machinery
so that it uses CC and CXX as detected by the top-level configure if we
can't find xgcc and xg++ in ../gcc.  At least in cross builds, we
already require build-time Ada tools to have the same version as that
we're cross-building, so we might as well use preexisting gcc and g++
under the same requirements.


for  gcc/ada/gcc-interface/ChangeLog

PR ada/81878
* config-lang.in (lang_requires): Set to "c c++" when
gnattools wants it.

for  gnattools/ChangeLog

PR ada/81878
* configure.ac (default_gnattools_target): Do not mistake
just-built host tools as native in cross-back toolchains.
* configure: Rebuilt.
---
 gcc/ada/gcc-interface/config-lang.in |9 +
 gnattools/configure  |   32 ++--
 gnattools/configure.ac   |   30 +-
 3 files changed, 52 insertions(+), 19 deletions(-)

diff --git a/gcc/ada/gcc-interface/config-lang.in b/gcc/ada/gcc-interface/config-lang.in
index 5dc77df282ce..8eacf7bb870e 100644
--- a/gcc/ada/gcc-interface/config-lang.in
+++ b/gcc/ada/gcc-interface/config-lang.in
@@ -34,6 +34,15 @@ gtfiles="\$(srcdir)/ada/gcc-interface/ada-tree.h 
\$(srcdir)/ada/gcc-interface/gi
 
 outputs="ada/gcc-interface/Makefile ada/Makefile"
 
+# gnattools native builds use both $(CC) and $(CXX), see PR81878.
+# This is not too onerous: C is always enabled anyway, and C++ is
+# always enabled for bootstrapping.  Use here the same logic used in
+# gnattools/configure to decide whether to use -native or -cross tools
+# for the build.
+if test "x$cross_compiling/$build/$host" = "xno/$host/$target" ; then
+  lang_requires="c c++"
+fi
+
 target_libs="target-libada"
 lang_dirs="gnattools"
 
diff --git a/gnattools/configure b/gnattools/configure
index ccb512e39b6b..c2d755b723a9 100755
--- a/gnattools/configure
+++ b/gnattools/configure
@@ -584,6 +584,7 @@ PACKAGE_URL=
 ac_unique_file="Makefile.in"
 ac_subst_vars='LTLIBOBJS
 LIBOBJS
+default_gnattools_target
 warn_cflags
 OBJEXT
 EXEEXT
@@ -595,7 +596,6 @@ CC
 ADA_CFLAGS
 EXTRA_GNATTOOLS
 TOOLS_TARGET_PAIRS
-default_gnattools_target
 LN_S
 target_noncanonical
 host_noncanonical
@@ -2050,15 +2050,6 @@ $as_echo "no, using $LN_S" >&6; }
 fi
 
 
-# Determine what to build for 'gnattools'
-if test $build = $target ; then
-  # Note that build=target is almost certainly the wrong test; FIXME
-  default_gnattools_target="gnattools-native"
-else
-  default_gnattools_target="gnattools-cross"
-fi
-
-
 # Target-specific stuff (defaults)
 TOOLS_TARGET_PAIRS=
 
@@ -2134,6 +2125,8 @@ esac
 # From user or toplevel makefile.
 
 
+# This is testing the CC passed from the toplevel Makefile, not the
+# one we will select below.
 ac_ext=c
 ac_cpp='$CPP $CPPFLAGS'
 ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
@@ -2929,6 +2922,25 @@ if test "x$GCC" = "xyes"; then
 fi
 
 
+# Determine what to build for 'gnattools'.  Test after the above,
+# because testing for CC sets the final value of cross_compiling, even
+# if we end up using a different CC.  We want to build
+# gnattools-native when: (a) this is a native build, i.e.,
+# cross_compiling=no, otherwise we know we cannot run binaries
+# produced by the toolchain used for the build, not even the binaries
+# created within ../gcc/; (b) build and host are the same, otherwise
+# this is to be regarded as a cross build environment even if it seems
+# that we can run host binaries; (c) host and target are the same,
+# otherwise the tools in ../gcc/ generate code for a different
+# platform.  If you change this test, be sure to adjust
+# ../gcc/ada/gcc-interface/config-lang.in as well.
+if test "x$cross_compiling/$build/$host" = "xno/$host/$target"

Re: [PATCH 20/25] GCN libgcc.

2018-11-12 Thread Andrew Stubbs

On 09/11/2018 18:48, Jeff Law wrote:

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 0c5b264..6f68257 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -429,9 +429,11 @@ LIB2ADD += enable-execute-stack.c
  # While emutls.c has nothing to do with EH, it is in LIB2ADDEH*
  # instead of LIB2ADD because that's the way to be sure on some targets
  # (e.g. *-*-darwin*) only one copy of it is linked.
+ifneq ($(enable_emutls),no)
  LIB2ADDEH += $(srcdir)/emutls.c
  LIB2ADDEHSTATIC += $(srcdir)/emutls.c
  LIB2ADDEHSHARED += $(srcdir)/emutls.c
+endif

Why is this needed? Are you just trying to cut out stuff you don't need
in the quest for smaller code or does this cause a more direct problem?


This dates back to when that code wouldn't compile. It also surprised me 
that --disable-emutls didn't do it (but this stuff is long ago now, so I 
don't recall the details of that).


Anyway, the code compiles now, so I can remove this hunk.


+/* Provide an entry point symbol to silence a linker warning.  */
+void _start() {}

This seems wrong.   I realize you're trying to quiet a linker warning
here, but for the case where you're creating GCN executables (testing?)
this should probably be handled by the C-runtime or linker script.


We're using an LLVM linker, so I'd rather fix things here than there.

Anyway, I plan to make this a proper kernel and use it to run static 
constructors, one day. Possibly it should be in Newlib, but then the 
"ctors" code is found in crtstuff and libgcc2, so I don't know?



diff --git a/libgcc/config/gcn/gomp_print.c b/libgcc/config/gcn/gomp_print.c
new file mode 100644
index 000..41f50c3
--- /dev/null
+++ b/libgcc/config/gcn/gomp_print.c

[ ... ]
Would this be better in libgomp?  Oh, you addressed that in the
prologue.  Feels like libgomp would be better to me, but I can
understand the rationale behind wanting it in libgcc.


Now that printf works, possibly it should be moved back. There's no 
debugger for this target, so these routines are my usual means for 
debugging stuff, and libgomp isn't built in the config used to run the 
testsuite.



I won't comment on the static sizes since this apparently has to match
something already in existence.


Yeah, this is basically a shared memory interface. I plan to implement a 
proper circular buffer, etc., etc., etc., but there's a lot to do.



+
+void
+gomp_print_string (const char *msg, const char *value)
+{
+  struct printf_data *output = reserve_print_slot ();
+  output->type = 2; /* String.  */
+
+  strncpy (output->msg, msg, 127);
+  output->msg[127] = '\0';
+  strncpy (output->text, value, 127);
+  output->text[127] = '\0';
+
+  asm ("" ::: "memory");
+  output->written = 1;
+}

I'm not familiar with the GCN memory model, but your asm is really just
a barrier for the compiler.  Do you need any kind of hardware fencing
here?  Similarly for other instances.


As long as the compiler doesn't reorder the write instructions, this
is fine as is.  The architecture does not reorder writes in hardware.


That said, actually the updated version I'm preparing has additional L1 
cache flushes to make absolutely sure the data are written to the L2 
cache memory in order. I did this when investigating a problem to make 
sure I wasn't losing debug output, and even though I found that I was 
not, I kept the patch anyway.
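For readers following this exchange, a small sketch of the distinction
being discussed (illustrative only, not code from the patch):

  /* Compiler barrier: forbids the compiler from moving memory accesses
     across this point, but emits no instruction.  */
  __asm__ ("" ::: "memory");

  /* Hardware fence: additionally orders the accesses in hardware; only
     needed on targets that reorder them, and per the above GCN does not
     reorder writes.  */
  __atomic_thread_fence (__ATOMIC_RELEASE);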



All these functions probably need a little comment on their purpose and
arguments.


Understood.


Note some of the divmod stuff recently changed.  You may need minor
updates as a result of those patches.  See:


OK, thanks for the heads up.

And thanks for the review. I'm planning to post a somewhat-updated V2 
patch set any week now.


Andrew


[PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-12 Thread Martin Liška
Hi.

The patch rejects usage of the mentioned options.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2018-11-12  Martin Liska  

PR sanitizer/87930
* config/i386/i386.c (ix86_option_override_internal): Error
about usage -mabi=ms and -fsanitize={,kernel-}address.
---
 gcc/config/i386/i386.c | 5 +
 1 file changed, 5 insertions(+)


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 711bec0cc9d..b3e0807b894 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3546,6 +3546,11 @@ ix86_option_override_internal (bool main_args_p,
 error ("-mabi=ms not supported with X32 ABI");
   gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
 
+  if ((opts->x_flag_sanitize & SANITIZE_USER_ADDRESS) && opts->x_ix86_abi == MS_ABI)
+error ("%<-mabi=ms%> not supported with %<-fsanitize=address%>");
+  if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_ix86_abi == MS_ABI)
+error ("%<-mabi=ms%> not supported with %<-fsanitize=kernel-address%>");
+
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
  option.  */



Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-12 Thread Arnaud Charlet
> I've considered sourcing ada/config-lang.in from within
> gnattools/configure, and testing lang_requires as set by it, so as to
> avoid a duplication of tests that ought to remain in sync, but decided
> it would be too fragile, as ada/config-lang.in does not expect srcdir
> to refer to gnattools.
> 
> Please let me know if there are objections to this change in the next
> few days, e.g., if enabling C and C++ for an Ada-only build is too
> onerous.  It is certainly possible to rework gnattools build machinery
> so that it uses CC and CXX as detected by the top-level configure if we
> can't find xgcc and xg++ in ../gcc.  At least in cross builds, we
> already require build-time Ada tools to have the same version as that
> we're cross-building, so we might as well use preexisting gcc and g++
> under the same requirements.

No objection from me assuming it doesn't again break some builds in subtle
ways :-) (In which case we'll have to revert again).

Arno


Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-12 Thread Jakub Jelinek
On Mon, Nov 12, 2018 at 01:03:41PM +0100, Martin Liška wrote:
> The patch reject usage of the mentioned options.
> 
> Ready for trunk?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-11-12  Martin Liska  
> 
>   PR sanitizer/87930
>   * config/i386/i386.c (ix86_option_override_internal): Error
>   about usage -mabi=ms and -fsanitize={,kernel-}address.

Please add testcases for this.  Can this be changed through an attribute too?
If so, a test for that should be there too.
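For reference, the per-function attribute the question refers to looks
like this (an illustrative sketch, not part of the patch):

  /* Force the Microsoft ABI for a single function; the open question is
     whether the new -fsanitize=address diagnostic should fire here too.  */
  int __attribute__ ((ms_abi))
  callee (int x)
  {
    return x + 1;
  }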

> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3546,6 +3546,11 @@ ix86_option_override_internal (bool main_args_p,
>  error ("-mabi=ms not supported with X32 ABI");
>gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
>  
> +  if ((opts->x_flag_sanitize & SANITIZE_USER_ADDRESS) && opts->x_ix86_abi == 
> MS_ABI)
> +error ("%<-mabi=ms%> not supported with %<-fsanitize=address%>");
> +  if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_ix86_abi 
> == MS_ABI)
> +error ("%<-mabi=ms%> not supported with %<-fsanitize=kernel-address%>");
> +
>/* For targets using ms ABI enable ms-extensions, if not
>   explicit turned off.  For non-ms ABI we turn off this
>   option.  */
> 


Jakub


Re: [GCC][AArch64] [middle-end][docs] Document the xorsign optab

2018-11-12 Thread Tamar Christina
Hi Sandra,

> > Ok for trunk?
> > 
> > +@cindex @code{xorsign@var{m}3} instruction pattern
> > +@item @samp{xorsign@var{m}3}
> > +Target supports an efficient expansion of x * copysign (1.0, y)
> > +as xorsign (x, y).  Store a value with the magnitude of operand 1
> > +and the sign of operand 2 into operand 0.  All operands have mode
> > +@var{m}, which is a scalar or vector floating-point mode.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> 
> Hmmm, needs markup, plus it's a little confusing.  How about describing 
> it as
> 
> Equivalent to @samp{op0 = op1 * copysign (1.0, op2)}: store a value with
> the magnitude of operand 1 and the sign of operand 2 into operand 0.
> All operands have mode @var{m}, which is a scalar or vector
> floating-point mode.
> 
> This pattern is not allowed to @code{FAIL}.

That works for me, updated patch attached.

OK for trunk?

Thanks,
Tamar

> 
> -Sandra

-- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 360b36b862f7eb13964e60ff53b04e1274f89fe4..cf779b5eb16812c244ec7eb032a33680149bba85 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5997,6 +5997,15 @@ vector floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{xorsign@var{m}3} instruction pattern
+@item @samp{xorsign@var{m}3}
+Equivalent to @samp{op0 = op1 * copysign (1.0, op2)}: store a value with
+the magnitude of operand 1 and the sign of operand 2 into operand 0.
+All operands have mode @var{m}, which is a scalar or vector
+floating-point mode.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit



[PATCH] More value_range API cleanup

2018-11-12 Thread Richard Biener


This mainly tries to rectify the workaround I put in place for ipa-cp.c
needing to build value_range instead of value_range_base for calling
extract_range_from_unary_expr.

To make this easier I moved more set_* functions to methods.

Then for some reason I chose to fix the rathole of equiv bitmap sharing
after finding at least one real bug 
(vr_values::extract_range_from_phi_node modifying a shared bitmap).
So the goal was to make copy construction and assignment deleted
(they are now not implemented) and force users to use explicit
deep_copy or move or set functions.  In most places fixing existing
uses was easy but some places needed more surgery and now use
indirection and a temporary scratch object.

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.

This is my last change in this area unless somebody points out
sth obvious.

Richard.

>From 9f141edae4e4eeb91448f9ea5638765b56d5780b Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Mon, 12 Nov 2018 12:43:34 +0100
Subject: [PATCH] vrp-changes-2

* tree-vrp.h (value_range[_base]::set): Make public.  Provide
overload for single value.
(value_range[_base]::set_nonnull): New.
(value_range[_base]::set_null): Likewise.
(value_range): Document bitmap copying behavior, mark
copy constructor and assignment operator deleted.
(value_range::move): New.
(value_range::set_and_canonicalize): Default bitmap to zero.
(set_value_range_to_nonnull): Remove.
(set_value_range_to_null): Likewise.
(set_value_range): Likewise.
(set_value_range_to_value): Likewise.
(extract_range_from_unary_expr): Work on value_range_base.
(extract_range_from_binary_expr_1): Likewise.  Rename to...
(extract_range_from_binary_expr): ... this.
* tree-vrp.c (value_range::update): Clear equiv bitmap
if required.
(value_range::move): New, move equiv bitmap.
(value_range_base::set_undefined): Avoid assignment.
(value_range::set_undefined): Likewise.
(value_range_base::set_varying): Likewise.
(value_range::set_varying): Likewise.
(set_value_range): Remove.
(value_range_base::set): New overload for value.
(value_range::set): Likewise.
(set_value_range_to_nonnull): Remove.
(value_range_base::set_nonnull): New.
(value_range::set_nonnull): Likewise.
(set_value_range_to_null): Remove.
(value_range_base::set_null): New.
(value_range::set_null): Likewise.
(range_is_null): Work on value_range_base.
(range_is_nonnull): Likewise.
(ranges_from_anti_range): Likewise.
(extract_range_into_wide_ints): Likewise.
(extract_range_from_multiplicative_op): Likewise.
(extract_range_from_binary_expr): Likewise.  Update for API changes.
(extract_range_from_unary_expr): Likewise.  Remove OBJ_TYPE_REF
handling.
(value_range::intersect_helper): Avoid copy and assignment.
(value_range::union_helper): Likewise.
(determine_value_range_1): Adjust.
* gimple-ssa-evrp-analyze.c (evrp_range_analyzer::try_find_new_range):
Avoid assignment by using move.
(evrp_range_analyzer::record_ranges_from_stmt): Avoid assignment.
* tree-ssa-threadedge.c (record_temporary_equivalences_from_phis):
Likewise.
* tree-ssanames.c (get_range_info): Likewise.
* vr-values.h (vr_values::get_vr_for_comparison): Adjust API.
* vr-values.c (vr_values::get_value_range): Adjust.
(vr_values::update_value_range): Likewise.
(symbolic_range_based_on_p): Work on value_range_base.
(vr_values::extract_range_from_binary_expr): Use value_range_base.
(vr_values::extract_range_from_unary_expr): Likewise.
(vr_values::extract_range_from_cond_expr): Avoid assignment.
(vr_values::extract_range_from_comparison): Adjust.
(vr_values::check_for_binary_op_overflow): Use value_range_base.
(vr_values::extract_range_basic): Adjust.
(vr_values::adjust_range_with_scev): Likewise.
(vr_values::vrp_visit_assignment_or_call): Likewise.
(vr_values::get_vr_for_comparison): Change API to avoid
assignment and copy construction.
(vr_values::compare_name_with_value): Adjust accordingly.
(vr_values::compare_names): Likewise.
(vr_values::extract_range_from_phi_node): Avoid assignment and
bogus in-place modify of equiv bitmap.
(vr_values::simplify_bit_ops_using_ranges): Use value_range_base.
* ipa-prop.c (ipa_compute_jump_functions_for_edge): Adjust
for extract_range_from_unary_expr API change.
* ipa-cp.c (ipa_vr_operation_and_type_effects): Likewise.

diff --git a/gcc/gimple-ssa-evrp-analyze.c b/gcc/gimple-ssa-evrp-analyze.c
index 1cd13dda7b6..bd11eea12b4 100644
--- a/gcc/gimple-ssa-evrp-analyze.c
+++ b/gcc/gimple-ssa-evrp-analyze.c

Re: [PATCH 21/25] GCN Back-end (part 1/2).

2018-11-12 Thread Andrew Stubbs

On 09/11/2018 19:11, Jeff Law wrote:

There's a ton of work related to reduction setup, updates and teardown.
  I don't guess there's any generic code we can/should be re-using.  Sigh.


I'm not sure what can be shared, or not, here. For OpenMP we don't have 
any special code, but OpenACC is much closer to the metal, and AMD GCN 
does things somewhat differently to NVPTX.



WRT your move patterns.  I'm a bit concerned about using distinct
patterns for so many different variants.  But they mostly seem confined
to vector variants.  Be aware you may need to squash them into a single
pattern over time to keep LRA happy.


As you might guess, the move patterns have been really difficult to get 
right. The added dependency on the EXEC register tends to put LRA into 
an infinite loop, and the fact that GCN vector moves are always 
scatter/gather (rather than a contiguous load/store from a base address) 
makes spills rather painful.


Thanks for your review, I'll have a V2 patch-set soonish.

Andrew


Re: [PATCH][DOCS] Fix documentation of __builtin_cpu_is and __builtin_cpu_supports for x86.

2018-11-12 Thread Uros Bizjak
> The patch is adding missing values for aforementioned built-ins.
>
> Ready for trunk?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2018-11-12  Martin Liska  
>
> * doc/extend.texi: Add missing values for __builtin_cpu_is and
> __builtin_cpu_supports for x86 target.

OK.

Thanks,
Uros.


Re: [PATCH 3/9][GCC][AArch64] Add autovectorization support for Complex instructions

2018-11-12 Thread Kyrill Tkachov

Hi Tamar,

On 11/11/18 10:26, Tamar Christina wrote:

Hi All,

This patch adds the expander support for autovectorization of complex number
operations such as complex addition with a rotation along the Argand plane.
This also adds support for complex FMA.

The instructions are described in the ArmARM [1] and are available from 
Armv8.3-a onwards.

Concretely, this generates

f90:
mov x3, 0
.p2align 3,,7
.L2:
ldr q0, [x0, x3]
ldr q1, [x1, x3]
fcadd   v0.2d, v0.2d, v1.2d, #90
str q0, [x2, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

now instead of

f90:
mov x4, x1
mov x1, x2
add x3, x4, 31
add x2, x0, 31
sub x3, x3, x1
sub x2, x2, x1
cmp x3, 62
mov x3, 62
ccmp    x2, x3, 0, hi
bls .L5
mov x2, x4
add x3, x0, 3200
.p2align 3,,7
.L3:
ld2 {v2.2d - v3.2d}, [x0], 32
ld2 {v4.2d - v5.2d}, [x2], 32
cmp x0, x3
fsub    v0.2d, v2.2d, v5.2d
fadd    v1.2d, v4.2d, v3.2d
st2 {v0.2d - v1.2d}, [x1], 32
bne .L3
ret
.L5:
add x6, x0, 8
add x5, x4, 8
add x2, x1, 8
mov x3, 0
.p2align 3,,7
.L2:
ldr d1, [x0, x3]
ldr d3, [x5, x3]
ldr d0, [x6, x3]
ldr d2, [x4, x3]
fsub    d1, d1, d3
fadd    d0, d0, d2
str d1, [x1, x3]
str d0, [x2, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

For complex additions with a 90-degree rotation along the Argand plane.

[1] 
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
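
As a concrete reference for what the #90 form computes, a scalar model in C
would be (illustrative only, not part of the patch; the function name is made
up):

/* Complex addition with the second operand rotated by 90 degrees on the
   Argand plane, i.e. res = a + i*b.  This matches the fsub/fadd pair in
   the scalar loop above: re = a_re - b_im, im = a_im + b_re.  */
static void
fcadd90_ref (double *res_re, double *res_im,
             double a_re, double a_im,
             double b_re, double b_im)
{
  *res_re = a_re - b_im;
  *res_im = a_im + b_re;
}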

Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
are still ongoing but previous patch showed no regressions.

The instructions have also been tested on aarch64-none-elf and arm-none-eabi on 
an Armv8.3-a model
and -march=Armv8.3-a+fp16 and all tests pass.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* config/aarch64/aarch64-simd.md (aarch64_fcadd,
fcadd3, aarch64_fcmla,
fcmla4): New.
* config/aarch64/aarch64.h (TARGET_COMPLEX): New.
* config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
(FCADD, FCMLA): New.
(rot, rotsplit1, rotsplit2): New.
* config/arm/types.md (neon_fcadd, neon_fcmla): New.

--



diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
c4be3101fdec930707918106cd7c53cf7584553e..12a91183a98ea23015860c77a97955cb1b30bfbb
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -419,6 +419,63 @@
 }
 )
 
+;; The fcadd and fcmla patterns are made UNSPEC for the explicitly due to the
+;; fact that their usage need to guarantee that the source vectors are
+;; contiguous.  It would be wrong to describe the operation without being able
+;; to describe the permute that is also required, but even if that is done
+;; the permute would have been created as a LOAD_LANES which means the values
+;; in the registers are in the wrong order.
+(define_insn "aarch64_fcadd"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+   (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+  (match_operand:VHSDF 2 "register_operand" "w")]
+  FCADD))]
+  "TARGET_COMPLEX"
+  "fcadd\t%0., %1., %2., #"
+  [(set_attr "type" "neon_fcadd")]
+)
+
+(define_expand "fcadd3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+   (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+  (match_operand:VHSDF 2 "register_operand")]
+  FCADD))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_aarch64_fcadd (operands[0], operands[1],
+  operands[2]));
+  DONE;
+})
+
+(define_insn "aarch64_fcmla"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+   (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
+   (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand" 
"w")
+  (match_operand:VHSDF 3 "register_operand" 
"w")]
+  FCMLA)))]
+  "TARGET_COMPLEX"
+  "fcmla\t%0., %2., %3., #"
+  [(set_attr "type" "neon_fcmla")]
+)
+
+;; The complex mla operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "fcmla4"
+  [(set (match_operand:VHSDF 0 "register_operand")
+   (plus:VHSDF (match_operand:VHSDF 1 "register_operand")

Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-12 Thread Renlin Li

Hi Peter,

Thanks for the patch! It makes much more sense to me to split those functions, 
and use them separately.

I tried to build a native arm-linuxeabihf toolchain with the patch. But I got 
the following ICE:

/home/renlin/try-new/./gcc/xgcc -B/home/renlin/try-new/./gcc/ -B/usr/local/arm-none-linux-gnueabihf/bin/ -B/usr/local/arm-none-linux-gnueabihf/lib/ 
-isystem /usr/local/arm-none-linux-gnueabihf/include -isystem /usr/local/arm-none-linux-gnueabihf/sys-include   -fno-checking -O2 -g -O0 -O2  -O2 -g 
-O0 -DIN_GCC-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition 
-isystem ./include   -fPIC -fno-inline -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fPIC -fno-inline -I. -I. -I../.././gcc 
-I../../../gcc/libgcc -I../../../gcc/libgcc/. -I../../../gcc/libgcc/../gcc -I../../../gcc/libgcc/../include  -DHAVE_CC_TLS  -o _negvdi2_s.o -MT 
_negvdi2_s.o -MD -MP -MF _negvdi2_s.dep -DSHARED -DL_negvdi2 -c ../../../gcc/libgcc/libgcc2.c

0x807eb3 lra(_IO_FILE*)
../../gcc/gcc/lra.c:2497
0x7c2755 do_reload
../../gcc/gcc/ira.c:5469
0x7c2c11 execute
../../gcc/gcc/ira.c:5653
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
make[3]: *** [Makefile:916: _gcov_merge_icall_topn.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[3]: *** [Makefile:916: _gcov_merge_single.o] Error 1
during RTL pass: reload
../../../gcc/libgcc/libgcov-driver.c: In function 
‘gcov_sort_icall_topn_counter’:
../../../gcc/libgcc/libgcov-driver.c:436:1: internal compiler error: in 
remove_some_program_points_and_update_live_ranges, at lra-lives.c:1172
436 | }
| ^
0x829189 remove_some_program_points_and_update_live_ranges
../../gcc/gcc/lra-lives.c:1172
0x829683 compress_live_ranges
../../gcc/gcc/lra-lives.c:1301
0x829d45 lra_create_live_ranges_1
../../gcc/gcc/lra-lives.c:1454
0x829d7d lra_create_live_ranges(bool, bool)
../../gcc/gcc/lra-lives.c:1466
0x807eb3 lra(_IO_FILE*)
../../gcc/gcc/lra.c:2497
0x7c2755 do_reload
../../gcc/gcc/ira.c:5469
0x7c2c11 execute
../../gcc/gcc/ira.c:5653
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


Regards,
Renlin

On 11/12/2018 04:34 AM, Peter Bergner wrote:

Renlin, Jeff and Vlad: requests and questions for you below...

PR87899 shows another latent LRA bug exposed by my r264897 commit.
In the bugzilla report, we have the following rtl in LRA:

   (insn 1 (set (reg:SI 1 r1) (reg/f:SI 2040)))
...
   (insn 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1)
                                           (plus:SI (reg:SI 1 r1)
                                                    (const_int 12))))
                (reg:SI 1048))
           (expr_list:REG_INC (reg:SI 1 r1)))
...
   

My earlier patch now sees the reg copy in insn "1" and correctly skips
adding a conflict between r1 and r2040 due to the copy.  However, insn "2"
updates r1 and r2040 is live across that update and so we should create
a conflict between them, but we currently do not and that leads to us
assigning r1 to one of r2040's reload pseudos which gets clobbered by
the r1 update in insn "2".

The reason a conflict was never added between r1 and r2040 is that LRA
skips INOUT operands when computing conflicts and so misses the definition
of r1 in insn "2" and so never adds conflicts for it.  The reason the code
skips the INOUT operands is that LRA doesn't want to create new program
points for INOUT operands, since unnecessary program points can slow down
remove_some_program_points_and_update_live_ranges.  This was all fine
before when we had conservative conflict info, but now we cannot ignore
INOUT operands.

The heart of the problem is that the {make,mark}_*_{live,dead} routines
update the liveness, conflict and program point information for operands.
My solution to the problem was to pull out the updating of the program point
info from {make,mark}_*_{live,dead} and have them only update liveness and
conflict information.  I then created a separate function that is used for
updating an operand's program points.  This allowed me to modify the insn
operand scanning to handle all operand types (IN, OUT and INOUT) and always
call the {make,mark}_*_{live,dead} functions for all operand types, while
only calling the new program point update function for IN and OUT operands.

This change then allowed me to remove the hacky handling of conflicts for
reg copies and instead use the more common method of removing the src reg
of a copy from the live set before handling the copy's definition, thereby
skipping the unwanted conflict.  Bonus! :-)
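
As a standalone illustration of that common scheme (not the actual LRA code;
the helpers and data structures below are invented for the example), the
backward-scan handling of a copy looks roughly like:

#include <stdbool.h>

/* Record conflicts for a definition of DEF against everything currently
   live.  For a copy "DEST = SRC", the source is removed from the live set
   first, so DEST and SRC get no conflict and may share a hard register.  */
static void
record_conflicts_for_def (int def, const bool *live, bool conflict[][16],
                          int nregs)
{
  for (int r = 0; r < nregs; r++)
    if (live[r] && r != def)
      conflict[def][r] = conflict[r][def] = true;
}

static void
process_copy (int dest, int src, bool *live, bool conflict[][16], int nregs)
{
  live[src] = false;                      /* skip the dest/src conflict */
  record_conflicts_for_def (dest, live, conflict, nregs);
  live[dest] = false;                     /* dest is defined here */
  live[src] = true;                       /* src is used here, live above */
}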

This passes bootstrap and regtesting on powerpc64le-linux with no regressions.

Re

Re: [PATCH 3/9][GCC][AArch64] Add autovectorization support for Complex instructions

2018-11-12 Thread Tamar Christina
Hi Kyrill,

> Hi Tamar,
> 
> On 11/11/18 10:26, Tamar Christina wrote:
> > Hi All,
> >
> > This patch adds the expander support for supporting autovectorization of 
> > complex number operations
> > such as Complex addition with a rotation along the Argand plane.  This also 
> > adds support for complex
> > FMA.
> >
> > The instructions are described in the ArmARM [1] and are available from 
> > Armv8.3-a onwards.
> >
> > Concretely, this generates
> >
> > f90:
> > mov x3, 0
> > .p2align 3,,7
> > .L2:
> > ldr q0, [x0, x3]
> > ldr q1, [x1, x3]
> > fcadd   v0.2d, v0.2d, v1.2d, #90
> > str q0, [x2, x3]
> > add x3, x3, 16
> > cmp x3, 3200
> > bne .L2
> > ret
> >
> > now instead of
> >
> > f90:
> > mov x4, x1
> > mov x1, x2
> > add x3, x4, 31
> > add x2, x0, 31
> > sub x3, x3, x1
> > sub x2, x2, x1
> > cmp x3, 62
> > mov x3, 62
> > ccmp    x2, x3, 0, hi
> > bls .L5
> > mov x2, x4
> > add x3, x0, 3200
> > .p2align 3,,7
> > .L3:
> > ld2 {v2.2d - v3.2d}, [x0], 32
> > ld2 {v4.2d - v5.2d}, [x2], 32
> > cmp x0, x3
> > fsub    v0.2d, v2.2d, v5.2d
> > fadd    v1.2d, v4.2d, v3.2d
> > st2 {v0.2d - v1.2d}, [x1], 32
> > bne .L3
> > ret
> > .L5:
> > add x6, x0, 8
> > add x5, x4, 8
> > add x2, x1, 8
> > mov x3, 0
> > .p2align 3,,7
> > .L2:
> > ldr d1, [x0, x3]
> > ldr d3, [x5, x3]
> > ldr d0, [x6, x3]
> > ldr d2, [x4, x3]
> > fsub    d1, d1, d3
> > fadd    d0, d0, d2
> > str d1, [x1, x3]
> > str d0, [x2, x3]
> > add x3, x3, 16
> > cmp x3, 3200
> > bne .L2
> > ret
> >
> > For complex additions with a 90* rotation along the Argand plane.
> >
> > [1] 
> > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> >
> > Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
> > x86_64-pc-linux-gnu
> > are still on going but previous patch showed no regressions.
> >
> > The instructions have also been tested on aarch64-none-elf and 
> > arm-none-eabi on a Armv8.3-a model
> > and -march=Armv8.3-a+fp16 and all tests pass.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 2018-11-11  Tamar Christina  
> >
> > * config/aarch64/aarch64-simd.md (aarch64_fcadd,
> > fcadd3, aarch64_fcmla,
> > fcmla4): New.
> > * config/aarch64/aarch64.h (TARGET_COMPLEX): New.
> > * config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
> > UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): 
> > New.
> > (FCADD, FCMLA): New.
> > (rot, rotsplit1, rotsplit2): New.
> > * config/arm/types.md (neon_fcadd, neon_fcmla): New.
> >
> > -- 
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> c4be3101fdec930707918106cd7c53cf7584553e..12a91183a98ea23015860c77a97955cb1b30bfbb
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -419,6 +419,63 @@
>   }
>   )
>   
> +;; The fcadd and fcmla patterns are made UNSPEC for the explicitly due to the
> +;; fact that their usage need to guarantee that the source vectors are
> +;; contiguous.  It would be wrong to describe the operation without being 
> able
> +;; to describe the permute that is also required, but even if that is done
> +;; the permute would have been created as a LOAD_LANES which means the values
> +;; in the registers are in the wrong order.
> +(define_insn "aarch64_fcadd"
> +  [(set (match_operand:VHSDF 0 "register_operand" "=w")
> + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
> +(match_operand:VHSDF 2 "register_operand" "w")]
> +FCADD))]
> +  "TARGET_COMPLEX"
> +  "fcadd\t%0., %1., %2., #"
> +  [(set_attr "type" "neon_fcadd")]
> +)
> +
> +(define_expand "fcadd3"
> +  [(set (match_operand:VHSDF 0 "register_operand")
> + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> +(match_operand:VHSDF 2 "register_operand")]
> +FCADD))]
> +  "TARGET_COMPLEX"
> +{
> +  emit_insn (gen_aarch64_fcadd (operands[0], operands[1],
> +operands[2]));
> +  DONE;
> +})
> +
> +(define_insn "aarch64_fcmla"
> +  [(set (match_operand:VHSDF 0 "register_operand" "=w")
> + (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
> + (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand" 
> "w")
> +(match_operand:VHS

Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Andrew Stubbs

On 09/11/2018 19:39, Jeff Law wrote:

+
+/* Generate epilogue.  Called from gen_epilogue during pro_and_epilogue pass.
+
+   See gcn_expand_prologue for stack details.  */
+
+void
+gcn_expand_epilogue (void)

You probably need a barrier in here to ensure that the scheduler doesn't
move an aliased memory reference into the local stack beyond the stack
adjustment.

You're less likely to run into it because you eliminate frame pointers
fairly aggressively, but it's still the right thing to do.


Sorry, I'm not sure I understand what the problem is? How can this 
happen? Surely the scheduler wouldn't change the logic of the code?



+
+/* Implement TARGET_LEGITIMATE_COMBINED_INSN.
+
+   Return false if the instruction is not appropriate as a combination of two
+   or more instructions.  */
+
+bool
+gcn_legitimate_combined_insn (rtx_insn *insn)
+{
+  rtx pat = PATTERN (insn);
+
+  /* The combine pass tends to strip (use (exec)) patterns from insns.  This
+ means it basically switches everything to use the *_scalar form of the
+ instructions, which is not helpful.  So, this function disallows such
+ combinations.  Unfortunately, this also disallows combinations of genuine
+ scalar-only patterns, but those only come from explicit expand code.
+
+ Possible solutions:
+ - Invent TARGET_LEGITIMIZE_COMBINED_INSN.
+ - Remove all (use (EXEC)) and rely on md_reorg with "exec" attribute.
+   */

This seems a bit hokey.  Why specifically is combine removing the USE?


I don't understand combine fully enough to explain it now, although at 
the time I wrote this, and in a GCC 7 code base, I had followed the code 
through and observed what it was doing.


Basically, if you have two patterns that do the same operation, but one 
has a "parallel" with an additional "use", then combine will tend to 
prefer the one without the "use". That doesn't stop the code working, 
but it makes a premature (accidental) decision about instruction 
selection that we'd prefer to leave to the register allocator.


I don't recall if it did this to lone instructions, but it would 
certainly do so when combining two (or more) instructions, and IIRC 
there are typically plenty of simple moves around that can be easily 
combined.



+  /* "Manually Inserted Wait States (NOPs)."
+
+ GCN hardware detects most kinds of register dependencies, but there
+ are some exceptions documented in the ISA manual.  This pass
+ detects the missed cases, and inserts the documented number of NOPs
+ required for correct execution.  */

How unpleasant :(  But if it's what you need to do, so be it.  I'll
assume the compiler is the right place to do this -- though some ports
handle this kind of stuff in the assembler or linker.


We're using an LLVM assembler and linker, so we have tried to use them 
as is, rather than making parallel changes that would prevent GCC 
working with the last numbered release of LLVM (see the workaround for 
assembler bugs in the BImode move instruction).


Expecting the assembler to fix this up would also throw off the 
compiler's offset calculations, and the near/far branch instructions 
have different register requirements it's better for the compiler to 
know about.


The MIPS backend also inserts NOPs in a similar way.

In future, I'd like to have the scheduler insert real instructions into 
these slots, but that's very much on the to-do list.



+/* Disable the "current_vector_size" feature intended for
+   AVX<->SSE switching.  */

Guessing you just copied the comment, you probably want to update it to
not refer to AVX/SSE.


Nope, that means exactly what it says. See the (unresolved) discussion 
around "[PATCH 13/25] Create TARGET_DISABLE_CURRENT_VECTOR_SIZE".


I'll probably move that into a separate patch to commit after the main 
port. It'll suffer poor vectorization in some examples in the mean-time, 
but that patch is not going to be straight-forward.



You probably need to define the safe-speculation stuff
(TARGET_SPECULATION_SAFE_VALUE).


Oh, OK. :-(

I have no idea whether the architecture has those issues or not.


+; "addptr" is the same as "add" except that it must not write to VCC or SCC
+; as a side-effect.  Unfortunately GCN3 does not have a suitable instruction
+; for this, so we use a split to save and restore the condition code.
+; This pattern must use "Sg" instead of "SD" to prevent the compiler
+; assigning VCC as the destination.
+; FIXME: Provide GCN5 implementation

I worry about the save/restore aspects of this.  Haven't we discussed
this somewhere?!?


I think this came up in the SPECIAL_REGNO_P patch discussion. We 
eventually found that the underlying problem was the way the 
save/restore reused pseudoregs.


The "addptr" pattern has been rewritten in my draft V2 patchset. It 
still uses a fixed scratch register, but no longer does save/restore.



Generally I don't see major concerns.  There are some minor things to
fix.  As far as the correctness of the code yo

Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 11:54:37AM +, Sam Tebbs wrote:
> On 11/08/2018 08:34 PM, Segher Boessenkool wrote:
> 
> > On Thu, Nov 08, 2018 at 03:44:44PM +, Sam Tebbs wrote:
> >> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
> >> looking at the issue as well and don't want to do any overlapping work.
> > I don't know.  Well, there are no incorrect code issues I know of at all
> > now; but you mean that it is taking an instruction more than you would
> > like to see, I suppose?
> 
> Yes, I am referring to the extra instruction being generated. In my
> opinion it is incorrect code generation since the intention is that it
> shouldn't be generated, and shouldn't be, based on the relevant code and
> patterns implemented.

That is not what incorrect code means; it is a "missed optimisation",
instead.

Anyway, does it now do what you want?

(PR87763 btw)


Segher


Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Alan Modra
On Mon, Nov 12, 2018 at 10:19:04PM +1030, Alan Modra wrote:
> I'd like to remove -many from the options passed by default to the
> assembler, on the grounds that a gcc bug in instruction selection (eg.
> emitting a power9 insn for -mcpu=power8) is better found at assembly
> time than run time.
> 
> This might annoy people for a while fixing user asm that we didn't
> diagnose previously, but I believe this is the right direction to go.
> Of course, -Wa,-many is available for anyone who just wants their
> dodgy old code to work.
> 
> Bootstrapped etc. powerpc64le-linux.  OK?

I forgot to mention something important.  This exposes a bug with our
target_clones support, in that we don't emit .machine directives when
changing cpu.  eg. gcc.target/powerpc/clone2.c fails with
"unrecognized opcode: `modsd'".

__attribute__((__target__("cpu=..."))) also doesn't emit a .machine
directive before the affected function code.
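
For instance, something along these lines (illustrative only, in the spirit
of the clone2.c test) shows the problem: the "cpu=power9" clone uses modsd,
but no .machine directive is emitted for it, so the assembler rejects it once
-many is no longer passed:

__attribute__ ((target_clones ("cpu=power9", "default")))
long
mod_func (long a, long b)
{
  return a % b;
}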

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Come up with htab_hash_string_vptr and use string-specific if possible.

2018-11-12 Thread Michael Matz
Hi,

On Mon, 12 Nov 2018, Martin Liška wrote:

> > There's no fundamental reason why we can't poison identifiers in other 
> > headers.  Indeed we do in vec.h.  So move the whole thing including 
> > poisoning to hash-table.h?
> 
> That's not feasible as gcc/gcc/genhooks.c files use the function and
> we don't want to include hash-table.h in the generator files.

gencfn-macros.c:#include "hash-table.h"
genmatch.c:#include "hash-table.h"
gentarget-def.c:#include "hash-table.h"

So there's precedent.  The other solution would be to ignore genhooks.c 
(i.e. let it continue using the non-typesafe variant), I'm not very 
worried about wrong uses creeping in there.  It had like one material 
change in the last seven years.

I think I prefer the latter (ignoring the problem).

> So it's question whether it worth doing that?

Jumping through hoops for generator files seems useless.  But the general 
idea for your type-checking hashers for the compiler proper does seem 
useful.


Ciao,
Michael.

[PATCH] Fix PR87985

2018-11-12 Thread Richard Biener


The following fixes split_constant_offset unbound un-CSEing of
expressions when following SSA def stmts.  Simply limiting it to
single-uses isn't good for consumers so the following instead
limits analysis by implementing a cache.  Note this may still
end up un-CSEing stuff but I didn't want to try inserting
SAVE_EXPRs in split_constant_offset result...  (maybe I should
simply try though...).  Another option would be to give up
when we see several uses of an "interesting" expression, thus
make the hash-map a visited thing instead (but the result would
be somewhat odd I guess).

Anyway, the following preserves existing behavior while fixing
the compile-time issue for the testcase (which doesn't end up
generating anything interesting).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

From cfe2c14173b8d2fa6e998e9895dce0cdf9b3e00e Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Mon, 12 Nov 2018 14:45:27 +0100
Subject: [PATCH] fix-pr87985

PR middle-end/87985
* tree-data-ref.c (split_constant_offset): Add wrapper
allocating a cache hash-map.
(split_constant_offset_1): Cache results of expanding
expressions from SSA def stmts.

* gcc.dg/pr87985.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/pr87985.c b/gcc/testsuite/gcc.dg/pr87985.c
new file mode 100644
index 000..c0d07ff918f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87985.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-slp-vectorize" } */
+
+char *bar (void);
+__INTPTR_TYPE__ baz (void);
+
+void
+foo (__INTPTR_TYPE__ *q)
+{
+  char *p = bar ();
+  __INTPTR_TYPE__ a = baz ();
+  __INTPTR_TYPE__ b = baz ();
+  int i = 0;
+#define X q[i++] = a; q[i++] = b; a = a + b; b = b + a;
+#define Y X X X X X X X X X X
+#define Z Y Y Y Y Y Y Y Y Y Y
+  Z Z Z Z Z Z Z Z Z Z
+  p[a] = 1;
+  p[b] = 2;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 6019c6168bf..0617c97eec4 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -95,10 +95,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-affine.h"
 #include "params.h"
 #include "builtins.h"
-#include "stringpool.h"
-#include "tree-vrp.h"
-#include "tree-ssanames.h"
 #include "tree-eh.h"
+#include "ssa.h"
 
 static struct datadep_stats
 {
@@ -584,6 +582,10 @@ debug_ddrs (vec ddrs)
   dump_ddrs (stderr, ddrs);
 }
 
+static void
+split_constant_offset (tree exp, tree *var, tree *off,
+  hash_map<tree, std::pair<tree, tree> > &cache);
+
 /* Helper function for split_constant_offset.  Expresses OP0 CODE OP1
(the type of the result is TYPE) as VAR + OFF, where OFF is a nonzero
constant of type ssizetype, and returns true.  If we cannot do this
@@ -592,7 +594,8 @@ debug_ddrs (vec ddrs)
 
 static bool
 split_constant_offset_1 (tree type, tree op0, enum tree_code code, tree op1,
-tree *var, tree *off)
+tree *var, tree *off,
+hash_map<tree, std::pair<tree, tree> > &cache)
 {
   tree var0, var1;
   tree off0, off1;
@@ -613,8 +616,10 @@ split_constant_offset_1 (tree type, tree op0, enum 
tree_code code, tree op1,
   /* FALLTHROUGH */
 case PLUS_EXPR:
 case MINUS_EXPR:
-  split_constant_offset (op0, &var0, &off0);
-  split_constant_offset (op1, &var1, &off1);
+  split_constant_offset (op0, &var0, &off0, cache);
+  split_constant_offset (op1, &var1, &off1, cache);
+  if (integer_zerop (off0) && integer_zerop (off1))
+   return false;
   *var = fold_build2 (code, type, var0, var1);
   *off = size_binop (ocode, off0, off1);
   return true;
@@ -623,7 +628,9 @@ split_constant_offset_1 (tree type, tree op0, enum 
tree_code code, tree op1,
   if (TREE_CODE (op1) != INTEGER_CST)
return false;
 
-  split_constant_offset (op0, &var0, &off0);
+  split_constant_offset (op0, &var0, &off0, cache);
+  if (integer_zerop (off0))
+   return false;
   *var = fold_build2 (MULT_EXPR, type, var0, op1);
   *off = size_binop (MULT_EXPR, off0, fold_convert (ssizetype, op1));
   return true;
@@ -647,7 +654,7 @@ split_constant_offset_1 (tree type, tree op0, enum 
tree_code code, tree op1,
 
if (poffset)
  {
-   split_constant_offset (poffset, &poffset, &off1);
+   split_constant_offset (poffset, &poffset, &off1, cache);
off0 = size_binop (PLUS_EXPR, off0, off1);
if (POINTER_TYPE_P (TREE_TYPE (base)))
  base = fold_build_pointer_plus (base, poffset);
@@ -691,11 +698,40 @@ split_constant_offset_1 (tree type, tree op0, enum 
tree_code code, tree op1,
if (gimple_code (def_stmt) != GIMPLE_ASSIGN)
  return false;
 
-   var0 = gimple_assign_rhs1 (def_stmt);
subcode = gimple_assign_rhs_code (def_stmt);
+
+   /* We are using a cache to avoid un-CSEing large amounts of code.  */
+   bool use_cache = false;
+   if (!has_single_use (op0)
+   && (subcode == POINTER_PLUS_EXPR
+

Re: [PATCH] Change set_value_range_to_[non]null to not preserve equivs

2018-11-12 Thread Jeff Law
On 11/12/18 4:11 AM, Richard Biener wrote:
> 
> This is a semantic change but AFAICS it shouldn't result in any 
> pessimization.  The behavior of the API is non-obvious.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> 
> Richard.
> 
> 2018-11-12  Richard Biener  
> 
>   * tree-vrp.c (set_value_range_to_nonnull): Clear equiv.
>   (set_value_range_to_null): Likewise.
>   * vr-values.c (vr_values::extract_range_from_comparison):
>   Clear equiv for constant singleton ranges.
No concerns from my side.  When I did my work last year I was trying to
preserve existing semantics, so I didn't really look at places to drop
uses of the equivalence bitmaps.

Jeff


Re: [PATCH][GCC] Make DR_TARGET_ALIGNMENT compile time variable

2018-11-12 Thread Richard Biener
On Fri, Nov 9, 2018 at 5:08 PM Andre Vieira (lists)
 wrote:
>
> On 05/11/18 12:41, Richard Biener wrote:
> > On Mon, Nov 5, 2018 at 1:07 PM Andre Vieira (lists)
> >  wrote:
> >>
> >>
> >> Hi,
> >>
> Hi,
>
> Thank you for the quick response! See inline responses below.
>
> >> This patch enables targets to describe DR_TARGET_ALIGNMENT as a
> >> compile-time variable.  It does so by turning the variable into a
> >> 'poly_uint64'.  This should not affect the current code-generation for
> >> any target.
> >>
> >> We hope to use this in the near future for SVE using the
> >> current_vector_size as the preferred target alignment for vectors.  In
> >> fact I have a patch to do just this, but I am still trying to figure out
> >> whether and when it is beneficial to peel for alignment with a runtime
> >> misalignment.
> >
> > In fact in most cases I have seen the issue is that it's not visible whether
> > peeling will be able to align _all_ references and doing peeling only to
> > align some is hardly beneficial.  To improve things the vectorizer would
> > have to version the loop for the case where peeling can reach alignment
> > for a group of DRs and then vectorize one copy with peeling for alignment
> > and one copy with unaligned accesses.
>
>
> So I have seen code being peeled for alignment even when it only knows
> how to align one of a group (only checked 2 or 3) and I think this may
> still be beneficial in some cases.  I am more worried about cases where
> the number of iterations isn't enough to justify the initial peeling
> cost or when the loop isn't memory bound, i.e. very arithmetic heavy
> loops.  This is a bigger vectorization problem though, that would
> require some kind of cost-model.
>
> >
> >>  The patch I am working on will change the behavior of
> >> auto-vectorization for SVE when building vector-length agnostic code for
> >> targets that benefit from aligned vector loads/stores.  The patch will
> >> result in  the generation of a runtime computation of misalignment and
> >> the construction of a corresponding mask for the first iteration of the
> >> loop.
> >>
> >> I have decided to not offer support for prolog/epilog peeling when the
> >> target alignment is not compile-time constant, as this didn't seem
> >> useful, this is why 'vect_do_peeling' returns early if
> >> DR_TARGET_ALIGNMENT is not constant.
> >>
> >> I bootstrapped and tested this on aarch64 and x86 basically
> >> bootstrapping one target that uses this hook and one that doesn't.
> >>
> >> Is this OK for trunk?
> >
> > The patch looks good but I wonder wheter it is really necessary at this
> > point.
>
> The goal of this patch is really to enable future work; on its own it
> does nothing.  I am working on a small target-specific patch to enable
> this for SVE, but I need to do a bit more analysis and benchmarking to
> be able to determine whether its beneficial which I will not be able to
> finish before end of stage 1. That is why I split them up and sent this
> one upstream to see if I could get the middle-end change in.

OK, fine with me then.

Thanks,
Richard.

> >
> > Thanks,
> > Richard.
> >
> >> Cheers,
> >> Andre
> >>
> >> 2018-11-05  Andre Vieira  
> >>
> >> * config/aarch64/aarch64.c 
> >> (aarch64_vectorize_preferred_vector_alignment):
> >> Change return type to poly_uint64.
> >> (aarch64_simd_vector_alignment_reachable): Adapt to preferred 
> >> vector
> >> alignment being a poly int.
> >> * doc/tm.texi (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): 
> >> Change return
> >> type to poly_uint64.
> >> * target.def (default_preferred_vector_alignment): Likewise.
> >> * targhooks.c (default_preferred_vector_alignment): Likewise.
> >> * targhooks.h (default_preferred_vector_alignment): Likewise.
> >> * tree-vect-data-refs.c
> >> (vect_calculate_target_alignment): Likewise.
> >> (vect_compute_data_ref_alignment): Adapt to vector alignment
> >> being a poly int.
> >> (vect_update_misalignment_for_peel): Likewise.
> >> (vect_enhance_data_refs_alignment): Likewise.
> >> (vect_find_same_alignment_drs): Likewise.
> >> (vect_duplicate_ssa_name_ptr_info): Likewise.
> >> (vect_setup_realignment): Likewise.
> >> (vect_can_force_dr_alignment_p): Change alignment parameter type to
> >> poly_uint64.
> >> * tree-vect-loop-manip.c (get_misalign_in_elems): Learn to 
> >> construct a mask
> >> with a compile time variable vector alignment.
> >> (vect_gen_prolog_loop_niters): Adapt to vector alignment being a 
> >> poly int.
> >> (vect_do_peeling): Exit early if vector alignment is not constant.
> >> * tree-vect-stmts.c (ensure_base_align): Adapt to vector alignment 
> >> being a
> >> poly int.
> >> (vectorizable_store): Likewise.
> >> (vectorizable_load): Likweise.
> >> * tree-vectorizer.h (struct dr_vec_info): Make target

Re: [PATCH] Simplify floating point comparisons

2018-11-12 Thread Richard Biener
On Fri, Nov 9, 2018 at 6:05 PM Wilco Dijkstra  wrote:
>
> Richard Biener wrote:
> >Marc Glisse wrote:
> >> Let's try with C = DBL_MIN and x = ±DBL_MAX. I don't believe it involves
> >> signed zeros or infinities, just an underflow. First, the result depends on
> >> the rounding mode. And in the default round-to-nearest, both divisions give
> >> 0, and thus compare the same with 0, but we replace that with a sign test 
> >> on
> >> x, where they clearly give opposite answers.
> >>
> >> What would be the proper flag to test to check if we care about underflow?
> >
> > We have none specific so this makes it flag_unsafe_math_optimizations.
>
> Right I have added the unsafe math check again like in the previous version:
>
>
> The patch implements some of the optimizations discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>
> Simplify (C / x >= 0.0) into x >= 0.0 with -funsafe-math-optimizations
> (since C / x can underflow to zero if x is huge, it's not safe otherwise).
> If C is negative the comparison is reversed.
>
>
> Simplify (x * C1) > C2 into x > (C2 / C1) with -funsafe-math-optimizations.
> If C1 is negative the comparison is reversed.
>
> OK for commit?

OK.

Thanks,
Richard.

> ChangeLog
> 2018-11-09  Wilco Dijkstra  
> Jackson Woodruff  
>
> gcc/
> PR 71026/tree-optimization
> * match.pd: Simplify floating point comparisons.
>
> gcc/testsuite/
> PR 71026/tree-optimization
> * gcc.dg/div-cmp-1.c: New test.
> * gcc.dg/div-cmp-2.c: New test.
>
> --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 94fbab841f5e36bd33fda849a686fd80886ee1ff..f6c76510f95be2485e5bacd07edab336705cbd25
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -405,6 +405,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (rdiv @0 (negate @1))
>   (rdiv (negate @0) @1))
>
> +(if (flag_unsafe_math_optimizations)
> + /* Simplify (C / x op 0.0) to x op 0.0 for C != 0, C != Inf/Nan.
> +Since C / x may underflow to zero, do this only for unsafe math.  */
> + (for op (lt le gt ge)
> +  neg_op (gt ge lt le)
> +  (simplify
> +   (op (rdiv REAL_CST@0 @1) real_zerop@2)
> +   (if (!HONOR_SIGNED_ZEROS (@1) && !HONOR_INFINITIES (@1))
> +(switch
> + (if (real_less (&dconst0, TREE_REAL_CST_PTR (@0)))
> +  (op @1 @2))
> + /* For C < 0, use the inverted operator.  */
> + (if (real_less (TREE_REAL_CST_PTR (@0), &dconst0))
> +  (neg_op @1 @2)))
> +
>  /* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
>  (for div (trunc_div ceil_div floor_div round_div exact_div)
>   (simplify
> @@ -4049,6 +4064,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (rdiv @2 @1))
> (rdiv (op @0 @2) @1)))
>
> + (for cmp (lt le gt ge)
> +  neg_cmp (gt ge lt le)
> +  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
> +  (simplify
> +   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
> +   (with
> +{ tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
> +(if (tem
> +&& !(REAL_VALUE_ISINF (TREE_REAL_CST (tem))
> + || (real_zerop (tem) && !real_zerop (@1
> + (switch
> +  (if (real_less (&dconst0, TREE_REAL_CST_PTR (@1)))
> +   (cmp @0 { tem; }))
> +  (if (real_less (TREE_REAL_CST_PTR (@1), &dconst0))
> +   (neg_cmp @0 { tem; })))
> +
>   /* Simplify sqrt(x) * sqrt(y) -> sqrt(x*y).  */
>   (for root (SQRT CBRT)
>(simplify
> diff --git a/gcc/testsuite/gcc.dg/div-cmp-1.c 
> b/gcc/testsuite/gcc.dg/div-cmp-1.c
> new file mode 100644
> index 
> ..cd1a5cd3d6fee5a10e9859ca99b344fa3fdb7f5f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/div-cmp-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-optimized-raw" 
> } */
> +
> +int
> +cmp_mul_1 (float x)
> +{
> +  return x * 3 <= 100;
> +}
> +
> +int
> +cmp_mul_2 (float x)
> +{
> +  return x * -5 > 100;
> +}
> +
> +int
> +div_cmp_1 (float x, float y)
> +{
> +  return x / 3 <= y;
> +}
> +
> +int
> +div_cmp_2 (float x, float y)
> +{
> +  return x / 3 <= 1;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "mult_expr" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "rdiv_expr" "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/div-cmp-2.c 
> b/gcc/testsuite/gcc.dg/div-cmp-2.c
> new file mode 100644
> index 
> ..f4ac42a196a804747d0b578e0aa2131671c8d3cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/div-cmp-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funsafe-math-optimizations -ffinite-math-only 
> -fdump-tree-optimized-raw" } */
> +
> +int
> +cmp_1 (float x)
> +{
> +  return 5 / x >= 0;
> +}
> +
> +int
> +cmp_2 (float x)
> +{
> +  return 1 / x <= 0;
> +}
> +
> +int
> +cmp_3 (float x)
> +{
> +  return -2 / x >= 0;
> +}
> +
> +int
> +cmp_4 (float x)
> +{
> +  return -5 / x <= 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "rdiv_expr" "optimized" } } */
>

[PATCH] Replace sync builtins with atomic builtins

2018-11-12 Thread Janne Blomqvist
The old __sync builtins have been deprecated for a long time now in
favor of the __atomic builtins following the C++11/C11 memory model.
This patch converts libgfortran to use the modern __atomic builtins.

At the same time I weakened the consistency to relaxed for
incrementing and decrementing the counter, and acquire-release when
decrementing to check whether the counter is 0 and the unit can be
freed.  This is similar to e.g. std::shared_ptr in C++.  Jakub, as the
original author of the algorithm, do you concur?
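
For reference, the pattern being adopted is the usual atomic reference-count
idiom; a minimal sketch (with a made-up struct, not the libgfortran code)
looks like:

/* Increments only need atomicity, so they can be relaxed.  The decrement
   that may release the object uses acquire-release so that all prior
   accesses to the object happen-before its destruction.  */
typedef struct refcounted
{
  int waiting;
  /* ... payload ... */
} refcounted;

static inline void
ref_acquire (refcounted *u)
{
  (void) __atomic_fetch_add (&u->waiting, 1, __ATOMIC_RELAXED);
}

static inline void
ref_release (refcounted *u)
{
  if (__atomic_add_fetch (&u->waiting, -1, __ATOMIC_ACQ_REL) == 0)
    {
      /* Last user; it is now safe to tear down *u.  */
    }
}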

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

libgfortran/ChangeLog:

2018-11-12  Janne Blomqvist  

* acinclude.m4 (LIBGFOR_CHECK_ATOMIC_FETCH_ADD): Rename and test
presence of atomic builtins instead of sync builtins.
* configure.ac (LIBGFOR_CHECK_ATOMIC_FETCH_ADD): Call new test.
* io/io.h (inc_waiting_locked): Use __atomic_fetch_add.
(predec_waiting_locked): Use __atomic_add_fetch.
(dec_waiting_unlocked): Use __atomic_fetch_add.
* config.h.in: Regenerated.
* configure: Regenerated.
---
 libgfortran/acinclude.m4 | 20 ++--
 libgfortran/configure.ac |  4 ++--
 libgfortran/io/io.h  | 24 ++--
 3 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/libgfortran/acinclude.m4 b/libgfortran/acinclude.m4
index dd5429ac0d2..5b0c094e716 100644
--- a/libgfortran/acinclude.m4
+++ b/libgfortran/acinclude.m4
@@ -59,17 +59,17 @@ extern void bar(void) __attribute__((alias("foo")));]],
   [Define to 1 if the target supports __attribute__((alias(...))).])
   fi])
 
-dnl Check whether the target supports __sync_fetch_and_add.
-AC_DEFUN([LIBGFOR_CHECK_SYNC_FETCH_AND_ADD], [
-  AC_CACHE_CHECK([whether the target supports __sync_fetch_and_add],
-libgfor_cv_have_sync_fetch_and_add, [
+dnl Check whether the target supports __atomic_fetch_add.
+AC_DEFUN([LIBGFOR_CHECK_ATOMIC_FETCH_ADD], [
+  AC_CACHE_CHECK([whether the target supports __atomic_fetch_add],
+libgfor_cv_have_atomic_fetch_add, [
   AC_LINK_IFELSE([AC_LANG_PROGRAM([[int foovar = 0;]], [[
-if (foovar <= 0) return __sync_fetch_and_add (&foovar, 1);
-if (foovar > 10) return __sync_add_and_fetch (&foovar, -1);]])],
- libgfor_cv_have_sync_fetch_and_add=yes, 
libgfor_cv_have_sync_fetch_and_add=no)])
-  if test $libgfor_cv_have_sync_fetch_and_add = yes; then
-AC_DEFINE(HAVE_SYNC_FETCH_AND_ADD, 1,
- [Define to 1 if the target supports __sync_fetch_and_add])
+if (foovar <= 0) return __atomic_fetch_add (&foovar, 1, __ATOMIC_ACQ_REL);
+if (foovar > 10) return __atomic_add_fetch (&foovar, -1, 
__ATOMIC_ACQ_REL);]])],
+ libgfor_cv_have_atomic_fetch_add=yes, 
libgfor_cv_have_atomic_fetch_add=no)])
+  if test $libgfor_cv_have_atomic_fetch_add = yes; then
+AC_DEFINE(HAVE_ATOMIC_FETCH_ADD, 1,
+ [Define to 1 if the target supports __atomic_fetch_add])
   fi])
 
 dnl Check for pragma weak.
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 76007d38f6f..30ff8734760 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -608,8 +608,8 @@ fi
 LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY
 LIBGFOR_CHECK_ATTRIBUTE_ALIAS
 
-# Check out sync builtins support.
-LIBGFOR_CHECK_SYNC_FETCH_AND_ADD
+# Check out atomic builtins support.
+LIBGFOR_CHECK_ATOMIC_FETCH_ADD
 
 # Check out #pragma weak.
 LIBGFOR_GTHREAD_WEAK
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 902eb412848..282c1455763 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -961,8 +961,8 @@ internal_proto(free_ionml);
 static inline void
 inc_waiting_locked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  (void) __sync_fetch_and_add (&u->waiting, 1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  (void) __atomic_fetch_add (&u->waiting, 1, __ATOMIC_RELAXED);
 #else
   u->waiting++;
 #endif
@@ -971,8 +971,20 @@ inc_waiting_locked (gfc_unit *u)
 static inline int
 predec_waiting_locked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  return __sync_add_and_fetch (&u->waiting, -1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  /* Note that the pattern
+
+ if (predec_waiting_locked (u) == 0)
+ // destroy u
+
+ could be further optimized by making this be an __ATOMIC_RELEASE,
+ and then inserting a
+
+ __atomic_thread_fence (__ATOMIC_ACQUIRE);
+
+ inside the branch before destroying.  But for now, lets keep it
+ simple.  */
+  return __atomic_add_fetch (&u->waiting, -1, __ATOMIC_ACQ_REL);
 #else
   return --u->waiting;
 #endif
@@ -981,8 +993,8 @@ predec_waiting_locked (gfc_unit *u)
 static inline void
 dec_waiting_unlocked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  (void) __sync_fetch_and_add (&u->waiting, -1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
 #else
   __gthread_mutex_lock (&unit_lock);
   u->waiting--;
-- 
2.17.1



Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-12 Thread Richard Biener
On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
 wrote:
>
> On 09/11/18 12:18, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> >  wrote:
> >>
> >> Hi all,
> >>
> >> In this testcase the codegen for VLA SVE is worse than it could be due to 
> >> unrolling:
> >>
> >> fully_peel_me:
> >>  mov x1, 5
> >>  ptrue   p1.d, all
> >>  whilelo p0.d, xzr, x1
> >>  ld1d    z0.d, p0/z, [x0]
> >>  fadd    z0.d, z0.d, z0.d
> >>  st1d    z0.d, p0, [x0]
> >>  cntd    x2
> >>  addvl   x3, x0, #1
> >>  whilelo p0.d, x2, x1
> >>  beq .L1
> >>  ld1d    z0.d, p0/z, [x0, #1, mul vl]
> >>  fadd    z0.d, z0.d, z0.d
> >>  st1d    z0.d, p0, [x3]
> >>  cntw    x2
> >>  incb    x0, all, mul #2
> >>  whilelo p0.d, x2, x1
> >>  beq .L1
> >>  ld1d    z0.d, p0/z, [x0]
> >>  fadd    z0.d, z0.d, z0.d
> >>  st1d    z0.d, p0, [x0]
> >> .L1:
> >>  ret
> >>
> >> In this case, due to the vector-length-agnostic nature of SVE the compiler 
> >> doesn't know the loop iteration count.
> >> For such loops we don't want to unroll if we don't end up eliminating 
> >> branches as this just bloats code size
> >> and hurts icache performance.
> >>
> >> This patch introduces a new unroll-known-loop-iterations-only param that 
> >> disables cunroll when the loop iteration
> >> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for 
> >> SVE VLA code, but it does help some
> >> Advanced SIMD cases as well where loops with an unknown iteration count 
> >> are not unrolled when it doesn't eliminate
> >> the branches.
> >>
> >> So for the above testcase we generate now:
> >> fully_peel_me:
> >>  mov x2, 5
> >>  mov x3, x2
> >>  mov x1, 0
> >>  whilelo p0.d, xzr, x2
> >>  ptrue   p1.d, all
> >> .L2:
> >>  ld1dz0.d, p0/z, [x0, x1, lsl 3]
> >>  faddz0.d, z0.d, z0.d
> >>  st1dz0.d, p0, [x0, x1, lsl 3]
> >>  incdx1
> >>  whilelo p0.d, x1, x3
> >>  bne .L2
> >>  ret
> >>
> >> Not perfect still, but it's preferable to the original code.
> >> The new param is enabled by default on aarch64 but disabled for other 
> >> targets, leaving their behaviour unchanged
> >> (until other target people experiment with it and set it, if appropriate).
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
> >> performance.
> >>
> >> Ok for trunk?
> >
> > Hum.  Why introduce a new --param and not simply key on
> > flag_peel_loops instead?  That is
> > enabled by default at -O3 and with FDO but you of course can control
> > that in your targets
> > post-option-processing hook.
>
> You mean like this?
> It's certainly a simpler patch, but I was just a bit hesitant of making this 
> change for all targets :)
> But I suppose it's a reasonable change.

No, that change is backward.  What I said is that peeling is already
conditional on
flag_peel_loops and that is enabled by -O3.  So you want to disable
flag_peel_loops for
SVE instead in the target.

> >
> > It might also make sense to have more fine-grained control for this
> > and allow a target
> > to say whether it wants to peel a specific loop or not when the
> > middle-end thinks that
> > would be profitable.
>
> Can be worth looking at as a follow-up. Do you envisage the target analysing
> the gimple statements of the loop to figure out its cost?

Kind-of.  Sth like

  bool targetm.peel_loop (struct loop *);

I have no idea whether you can easily detect an SVE vectorized loop though.
Maybe there's always a special IV or so (the mask?)

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov  
>
> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
> loop when number of iterations is not known and flag_peel_loops is in
> effect.
>
> 2018-11-09  Kyrylo Tkachov  
>
> * gcc.target/aarch64/sve/unroll-1.c: New test.
>


Re: [PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-12 Thread Richard Biener
On Fri, Nov 9, 2018 at 6:23 PM Sudakshina Das  wrote:
>
> Hi
>
> I am posting this patch on behalf of Carey (cc'ed). I also have some
> review comments that I will make as a reply to this later.
>
>
> This implements a new AArch64 specific back-end pass that helps optimize
> branch-dense code, which can be a bottleneck for performance on some Arm
> cores. This is achieved by padding out the branch-dense sections of the
> instruction stream with nops.

Wouldn't this be more suitable for implementing inside the assembler?

> This has proven to show up to a ~2.61% improvement on the Cortex-A72
> (SPEC CPU 2006: sjeng).
>
> The implementation includes the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block. This is to allow padding after unconditional
> branches. This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of nops.
>
> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.
>
> ## Command Line Options
>
> Three new target-specific options are provided:
> - mbranch-dilution
> - mbranch-dilution-granularity={num}
> - mbranch-dilution-max-branches={num}
>
> A number of cores known to be able to benefit from this pass have been
> given default tuning values for their granularity and max-branches.
> Each affected core has a very specific granule size and associated
> max-branch limit. This is a microarchitecture specific optimization.
> Typical usage should be -mbranch-dilution with a specified -mcpu. Cores
> with a granularity tuned to 0 will be ignored. Options are provided for
> experimentation.
>
> ## Algorithm and Heuristic
>
> The pass takes a very simple 'sliding window' approach to the problem.
> We crawl through each instruction (starting at the first branch) and
> keep track of the number of branches within the current "granule" (or
> window). When this exceeds the max-branch value, the pass will dilute
> the current granule, inserting nops to push out some of the branches.
> The heuristic will favour unconditional branches (for performance
> reasons), or branches that are between two other branches (in order to
> decrease the likelihood of another dilution call being needed).
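>
> As a standalone illustration of the sliding-window idea (toy model only,
> not the pass itself; the instruction representation and names below are
> invented, and it assumes max_branches >= 1):
>
> #include <stddef.h>
>
> enum insn_kind { INSN_OTHER, INSN_BRANCH, INSN_NOP };
>
> /* Copy INSNS into OUT, inserting nops so that each block ("granule") of
>    GRANULE output instructions contains at most MAX_BRANCHES branches.
>    Returns the number of instructions written.  */
> static size_t
> dilute (const enum insn_kind *insns, size_t n,
>         enum insn_kind *out, size_t out_cap,
>         size_t granule, size_t max_branches)
> {
>   size_t branches = 0, in_granule = 0, out_n = 0;
>   for (size_t i = 0; i < n && out_n < out_cap; i++)
>     {
>       if (in_granule == granule)
>         branches = in_granule = 0;            /* next granule */
>       if (insns[i] == INSN_BRANCH && ++branches > max_branches)
>         {
>           /* Too many branches in this granule: pad with nops so this
>              branch starts the next granule.  */
>           while (in_granule++ < granule && out_n < out_cap)
>             out[out_n++] = INSN_NOP;
>           branches = 1;
>           in_granule = 0;
>         }
>       out[out_n++] = insns[i];
>       in_granule++;
>     }
>   return out_n;
> }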
>
> Each branch type required a different method for nop insertion due to
> RTL/basic_block restrictions:
>
> - Returning calls do not end a basic block so can be handled by emitting
> a generic nop.
> - Unconditional branches must be the end of a basic block, and nops
> cannot be outside of a basic block.
>Thus the need for FILLER_INSN, which allows placement outside of a
> basic block - and translates to a nop.
> - For most conditional branches we've taken a simple approach and only
> handle the fallthru edge for simplicity,
>which we do by inserting a "nop block" of nops on the fallthru edge,
> mapping that back to the original destination block.
> - asm gotos and pcsets are going to be tricky to analyse from a dilution
> perspective so are ignored at present.
>
>
> ## Changelog
>
> gcc/testsuite/ChangeLog:
>
> 2018-11-09  Carey Williams  
>
> * gcc.target/aarch64/branch-dilution-off.c: New test.
> * gcc.target/aarch64/branch-dilution-on.c: New test.
>
>
> gcc/ChangeLog:
>
> 2018-11-09  Carey Williams  
>
> * cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
> * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
> basic blocks.
> * config.gcc (extra_objs): Add aarch64-branch-dilution.o.
> * config/aarch64/aarch64-branch-dilution.c: New file.
> * config/aarch64/aarch64-passes.def (branch-dilution): Register
> pass.
> * config/aarch64/aarch64-protos.h (struct tune_params): Declare
> tuning parameters bdilution_gsize and bdilution_maxb.
> (make_pass_branch_dilution): New declaration.
> * config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
> cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
> cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
> thunderx_tunings,tsv110_tunings,xgene1_tunings,
> qdf24xx_tunings,saphira_tunings,thunderx2t99_tunings):
> Provide default tunings for bdilution_gsize and bdilution_maxb.
> * config/aarch64/aarch64.md (filler_insn): Define new insn.
> * config/aarch64/aarch64.opt (mbranch-dilution,
> mbranch-dilution-granularity,
> mbranch-dilution-max-branches): Define new branch dilution
> options.
> * config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
> for aarch64-branch-dilution.c.
> * coretypes.h (rtx_filler_insn): New rtx class.
> * doc/invoke.texi (mbranch-dilution,
> mbranch-dilution-granularity,
> mbranch-dilution-max-branches): Document branch dilution
> options.
> * emit-rtl.c (em

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-11-12 Thread Richard Biener
On Sat, Nov 10, 2018 at 6:36 AM Segher Boessenkool
 wrote:
>
> On Fri, Nov 09, 2018 at 01:03:55PM -0700, Jeff Law wrote:
> > >> And signed zeroes.  Yeah.  I think it would have to be
> > >> flag_unsafe_math_optimizations + some more.
> > >
> > > Indeed.
> > So we need to give Giuliano some clear guidance on guarding.  This is
> > out of my area of expertise, so looking to y'all to help here.
>
> IMO, it needs flag_unsafe_optimizations, as above; and it needs to be
> investigated which (if any) options like flag_signed_zeros it needs in
> addition to that.  It needs an option like that whenever the new expression
> can give a zero with a different sign than the original expression, etc.
> Although it could be said that flag_unsafe_optimizations supersedes all
> of that.  It isn't clear.

It indeed isn't clear whether at least some of the other flags make no
sense with -funsafe-math-optimizations.  Still at least for
documentation purposes
please use !flag_signed_zeros && flag_unsafe_math_optimizations && ...

flag_unsafe_math_optimizations is generally used when there's extra rounding
involved.  Some specific kind of transforms have individual flags and do not
require flag_unsafe_math_optimizations (re-association and contraction
for example).

I'm not sure I would require flag_unsafe_math_optimizations for a 2ulp
error though.

Richard.

>
> Segher


Re: [PATCH] Fix ICE with -fopt-info-inline (PR ipa/87955)

2018-11-12 Thread Richard Biener
On Sun, Nov 11, 2018 at 2:33 AM David Malcolm  wrote:
>
> PR ipa/87955 reports a problem I introduced in r265920, where I converted
> the guard in report_inline_failed_reason from using:
>   if (dump_file)
> to using
>   if (dump_enabled_p ()).
> without updating the calls to cl_target_option_print_diff and
> cl_optimization_print_diff, which assume that dump_file is non-NULL.
>
> The functions are auto-generated.  Rather than porting them to the dump
> API, this patch applies the workaround of adding the missing checks on
> dump_file before calling them.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

OK.

Richard.

> gcc/ChangeLog:
> PR ipa/87955
> * ipa-inline.c (report_inline_failed_reason): Guard calls to
> cl_target_option_print_diff and cl_optimization_print_diff with
> if (dump_file).
>
> gcc/testsuite/ChangeLog:
> PR ipa/87955
> * gcc.target/i386/pr87955.c: New test.
> ---
>  gcc/ipa-inline.c| 14 --
>  gcc/testsuite/gcc.target/i386/pr87955.c | 10 ++
>  2 files changed, 18 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr87955.c
>
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index e04ede7..173808a 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -244,13 +244,15 @@ report_inline_failed_reason (struct cgraph_edge *e)
>e->callee->ultimate_alias_target 
> ()->lto_file_data->file_name);
> }
>if (e->inline_failed == CIF_TARGET_OPTION_MISMATCH)
> -   cl_target_option_print_diff
> -(dump_file, 2, target_opts_for_fn (e->caller->decl),
> -  target_opts_for_fn (e->callee->ultimate_alias_target ()->decl));
> +   if (dump_file)
> + cl_target_option_print_diff
> +   (dump_file, 2, target_opts_for_fn (e->caller->decl),
> +target_opts_for_fn (e->callee->ultimate_alias_target ()->decl));
>if (e->inline_failed == CIF_OPTIMIZATION_MISMATCH)
> -   cl_optimization_print_diff
> - (dump_file, 2, opts_for_fn (e->caller->decl),
> -  opts_for_fn (e->callee->ultimate_alias_target ()->decl));
> +   if (dump_file)
> + cl_optimization_print_diff
> +   (dump_file, 2, opts_for_fn (e->caller->decl),
> +opts_for_fn (e->callee->ultimate_alias_target ()->decl));
>  }
>  }
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr87955.c 
> b/gcc/testsuite/gcc.target/i386/pr87955.c
> new file mode 100644
> index 000..ed87da6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr87955.c
> @@ -0,0 +1,10 @@
> +/* { dg-options "-O2 -fopt-info-inline-missed" } */
> +
> +float a;
> +
> +__attribute__((__target__("fpmath=387")))
> +int b() {
> +  return a;
> +}
> +
> +int c() { return b(); } /* { dg-missed "not inlinable: c/\[0-9\]* -> 
> b/\[0-9\]*, target specific option mismatch" } */
> --
> 1.8.5.3
>


Re: RFA: vectorizer patches 1/2 : WIDEN_MULT_PLUS support

2018-11-12 Thread Richard Biener
On Sun, Nov 11, 2018 at 8:21 AM Joern Wolfgang Rennecke
 wrote:
>
> Our target (eSi-RISC) doesn't have DOT_PROD_EXPR or WIDEN_SUM_EXPR
> operations in
> the standard vector modes; however, it has a vectorized WIDEN_MULT_PLUS_EXPR
> implementation with a double-vector output, which works just as well,
> with a little
> help from the compiler - as implemented in these patches.

I guess I already asked this question when WIDEN_MULT_PLUS_EXPR was
introduced - but isn't that fully contained within a DOT_PROD_EXPR?

Some comments on the patch.

+  tree vecotype
+= build_vector_type (otype, GET_MODE_NUNITS (TYPE_MODE (vecitype)));

TYPE_VECTOR_SUBPARTS (vecitype)

You want to pass in the half/full types and use get_vectype_for_scalar_type
which also makes sure the target supports the vector type.

I think you want to extend and re-use supportable_widening_operation
here anyways.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 266008)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -10638,7 +10638,11 @@ vect_get_vector_types_for_stmt (stmt_vec
   scalar_type);

   if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
-   GET_MODE_SIZE (TYPE_MODE (nunits_vectype
+   GET_MODE_SIZE (TYPE_MODE (nunits_vectype)))
+  /* Reductions that use a widening reduction would show
+a mismatch but that's already been checked to be OK.  */
+  && STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)
+
 return opt_result::failure_at (stmt,
   "not vectorized: different sized vector "
   "types in statement, %T and %T\n",

that change doesn't look good.

> Bootstrapped and regtested on i686-pc-linux-gnu.


Re: RFA: vectorizer patches 2/2: reduction splitting

2018-11-12 Thread Richard Biener
On Sun, Nov 11, 2018 at 9:16 AM Joern Wolfgang Rennecke
 wrote:
>
> It's nice to use the processor's vector arithmetic to good effect, but
> it's all for naught when
> there are too many moves from/to general registers cluttering up the
> loop.  With a
> double-vector reduction variable, the standard final reduction code got
> so awkward that
> the register allocator decided that the reduction variable must live in
> general purpose
> registers, not only after the loop, but across the loop path.
> Splitting the reduction to force the first step to be done as a vector
> operation
> seemed the obvious solution. The hook was called, but the vectorizer still
> generated the vanilla final reduction code.  It turns out that the
> reduction splitting
> was calculated, but the result not used, and the calculation started anew.
>
> The attached patch fixes this.

That looks quite fragile to me or warrants further cleanups.  Can you
push up the new_phis.length assert further and elide the loop over
the PHIs?  It looks like at the very beginning we are reducing the
PHIs to a single PHI and new_phi_result is the one to look at
(and the vector is updated, but given we replace the PHI with an
assign using new_phi_result instead of the vector would be better).

Richard.

> bootstrapped and regression tested on x86_64-pc-linux-gnu .


Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Michael Matz
Hi,

On Mon, 12 Nov 2018, Alan Modra wrote:

> I'd like to remove -many from the options passed by default to the 
> assembler, on the grounds that a gcc bug in instruction selection (eg. 
> emitting a power9 insn for -mcpu=power8) is better found at assembly 
> time than run time.
> 
> This might annoy people for a while fixing user asm that we didn't 
> diagnose previously, but I believe this is the right direction to go. Of 
> course, -Wa,-many is available for anyone who just wants their dodgy old 
> code to work.

Wouldn't this also break compiling code that contains power9 instructions 
but guarded by runtime tests to only be executed on power9 machines?  That 
seems a valid usecase, and it'd be bad if the assembler fails to compile 
such.  (You can't use -mcpu=power9 as work around as the other 
unguarded code is not supposed to be using power9 insns).
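
For illustration, a hedged sketch of that use case: a power9-only instruction
(darn) behind a runtime guard in a file otherwise built for power8.  The guard
and the "arch_3_00" feature name are assumptions for illustration only; the
point is that, with -many gone, the darn mnemonic below would now be rejected
at assembly time unless it is marked up somehow.

  /* Sketch only: runtime-guarded power9 code in a -mcpu=power8 unit.  */
  long
  maybe_darn (void)
  {
    long r = 0;
    if (__builtin_cpu_supports ("arch_3_00"))  /* assumed feature name */
      __asm__ ("darn %0,1" : "=r" (r));        /* power9-only instruction */
    return r;
  }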


Ciao,
Michael.

> 
> Bootstrapped etc. powerpc64le-linux.  OK?
> 
>   * config/rs6000/rs6000.h (ASM_CPU_SPEC): Remove -many.
>   * config/rs6000/aix61.h (ASM_CPU_SPEC): Likewise.
>   * config/rs6000/aix71.h (ASM_CPU_SPEC): Likewise.
>   * testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c: Don't use
>   power mnemonics.
> 
> diff --git a/gcc/config/rs6000/aix61.h b/gcc/config/rs6000/aix61.h
> index 353e5d6cfeb..a7a8246bfe3 100644
> --- a/gcc/config/rs6000/aix61.h
> +++ b/gcc/config/rs6000/aix61.h
> @@ -91,8 +91,7 @@ do {
> \
>  %{mcpu=630: -m620} \
>  %{mcpu=970: -m970} \
>  %{mcpu=G5: -m970} \
> -%{mvsx: %{!mcpu*: -mpwr6}} \
> --many"
> +%{mvsx: %{!mcpu*: -mpwr6}}"
>  
>  #undef   ASM_DEFAULT_SPEC
>  #define ASM_DEFAULT_SPEC "-mpwr4"
> diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
> index 2398ed64baa..d2ca8dc275d 100644
> --- a/gcc/config/rs6000/aix71.h
> +++ b/gcc/config/rs6000/aix71.h
> @@ -89,8 +89,7 @@ do {
> \
>   maltivec: -m970; \
>   maix64|mpowerpc64: -mppc64; \
>   : %(asm_default)}; \
> -  :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
> --many"
> +  :%eMissing -mcpu option in ASM_SPEC_CPU?\n}"
>  
>  #undef   ASM_DEFAULT_SPEC
>  #define ASM_DEFAULT_SPEC "-mpwr4"
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index d75137cf8f5..9d78173a680 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -137,8 +137,7 @@
>   mvsx: -mpower7; \
>   mpowerpc64: -mppc64;: %(asm_default)}; \
>:%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
> -%{mvsx: -mvsx -maltivec; maltivec: -maltivec} \
> --many"
> +%{mvsx: -mvsx -maltivec; maltivec: -maltivec}"
>  
>  #define CPP_DEFAULT_SPEC ""
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c 
> b/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
> index 14908dba690..eea7f6ffc2e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
> @@ -45,14 +45,14 @@ __asm__ ("\t.globl\t" #NAME "_asm\n\t"
> \
>#NAME "_asm:\n\t"  \
>"lis 11,gparms@ha\n\t" \
>"la 11,gparms@l(11)\n\t"   \
> -  "st 3,0(11)\n\t"   \
> -  "st 4,4(11)\n\t"   \
> -  "st 5,8(11)\n\t"   \
> -  "st 6,12(11)\n\t"  \
> -  "st 7,16(11)\n\t"  \
> -  "st 8,20(11)\n\t"  \
> -  "st 9,24(11)\n\t"  \
> -  "st 10,28(11)\n\t" \
> +  "stw 3,0(11)\n\t"  \
> +  "stw 4,4(11)\n\t"  \
> +  "stw 5,8(11)\n\t"  \
> +  "stw 6,12(11)\n\t" \
> +  "stw 7,16(11)\n\t" \
> +  "stw 8,20(11)\n\t" \
> +  "stw 9,24(11)\n\t" \
> +  "stw 10,28(11)\n\t"\
>"stfd 1,32(11)\n\t"\
>"stfd 2,40(11)\n\t"\
>"stfd 3,48(11)\n\t"\
> 
> 


Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it

2018-11-12 Thread Richard Biener
On Mon, Nov 12, 2018 at 6:21 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> Thanks for the review.
> On Thu, 8 Nov 2018 at 00:03, Richard Biener  
> wrote:
> >
> > On Fri, Nov 2, 2018 at 10:02 AM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > Hi Richard,
> > > Thanks for the review.
> > > On Tue, 30 Oct 2018 at 01:25, Richard Biener  
> > > wrote:
> > > >
> > > > On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> > > >  wrote:
> > > > >
> > > > > Hi Richard and Jeff,
> > > > >
> > > > > Thanks for your comments.
> > > > >
> > > > > On Fri, 26 Oct 2018 at 19:40, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law  wrote:
> > > > > > >
> > > > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > > > > regression for Skylake.
> > > > > > > > We also have PR86677 where kernel build is failing because the 
> > > > > > > > kernel
> > > > > > > > does not use the libgcc (when backend is not defining popcount
> > > > > > > > pattern).  While I agree that the kernel should implement its 
> > > > > > > > own
> > > > > > > > functionality when it is not using the libgcc, I am afraid that 
> > > > > > > > the
> > > > > > > > implementation can have the same performance issues reported for
> > > > > > > > Skylake in PR87528.
> > > > > > > >
> > > > > > > > Therefore, I would like to propose that we disable popcount 
> > > > > > > > detection
> > > > > > > > when we don't have a pattern for that. The attached patch 
> > > > > > > > (based on
> > > > > > > > previous discussions) does this.
> > > > > > > >
> > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no 
> > > > > > > > new
> > > > > > > > regressions. We need to disable the popcount* testcases. I will 
> > > > > > > > have
> > > > > > > > to define a effective_target_with_popcount in
> > > > > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > > > > Thanks,
> > > > > > > > Kugan
> > > > > > > >
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > 2018-10-25  Kugan Vivekanandarajah  
> > > > > > > >
> > > > > > > > * tree-scalar-evolution.c (expression_expensive_p): Make 
> > > > > > > > BUILTIN POPCOUNT
> > > > > > > > as expensive when backend does not define it.
> > > > > > > >
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > > 2018-10-25  Kugan Vivekanandarajah  
> > > > > > > >
> > > > > > > > * gcc.target/aarch64/popcount4.c: New test.
> > > > > > > >
> > > > > > > FWIW, I've been disabling by checking direct_optab_handler 
> > > > > > > elsewhere
> > > > > > > (number_of_iterations_popcount) in my tester.  It may in fact be 
> > > > > > > an old
> > > > > > > patch from you.
> > > > > > >
> > > > > > > Richi argued that it's the kernel team's responsibility to 
> > > > > > > provide a
> > > > > > > popcount since they don't link with libgcc.  And I'm generally in
> > > > > > > agreement with that position, though it does tend to generate some
> > > > > > > friction with the kernel developers.  We also run the real risk 
> > > > > > > of GCC 9
> > > > > > > not being able to build the kernel which, IMHO, would be a 
> > > > > > > disaster from
> > > > > > > a PR standpoint.
> > > > > > >
> > > > > > > I'd like to hear from others here.  I fully realize we're beyond 
> > > > > > > the
> > > > > > > realm of what is strictly technically correct here from a review 
> > > > > > > standpoint.
> > > > > >
> > > > > > As said final value replacement to a library call is probably not 
> > > > > > wanted
> > > > > > for optimization purpose, so adjusting expression_expensive_p is OK 
> > > > > > with
> > > > > > me.  It might not fully solve the (non-)issue in case another 
> > > > > > optimization pass
> > > > > > chooses to materialize niter computation result.
> > > > > >
> > > > > > Few comments on the patch:
> > > > > >
> > > > > > +  tree fndecl = get_callee_fndecl (expr);
> > > > > > +
> > > > > > +  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == 
> > > > > > BUILT_IN_NORMAL)
> > > > > > +   {
> > > > > > + combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE 
> > > > > > (fndecl));
> > > > > >
> > > > > >   combined_fn cfn = gimple_call_combined_fn (expr);
> > > > > >   switch (cfn)
> > > > > > {
> > > > >
> > > > > Did you mean:
> > > > > combined_fn cfn = get_call_combined_fn (expr);
> > > >
> > > > Yes.
> > > >
> > > > > > ...
> > > > > >
> > > > > > cfn will be CFN_LAST for a non-builtin/internal call.  I know 
> > > > > > Richard is mostly
> > > > > > offline but eventually he knows whether there is a better way to 
> > > > > > query
> > > > > >
> > > > > > +   CASE_CFN_POPCOUNT:
> > > > > > + /* Check if opcode for popcount is available.  */
> > > > > > + if (optab_handler (popcount_optab,

[PATCH] Fortran include line fixes and -fdec-include support

2018-11-12 Thread Jakub Jelinek
Hi!

In fortran97.pdf I read:
"Except in a character context, blanks are insignificant and may be used freely 
throughout the program."
and while we handle that in most cases, we don't allow spaces in INCLUDE
lines in fixed form, while e.g. ifort does.

Another thing, which I haven't touched in the PR except covering it with a
testcase is that we allow an INCLUDE line in fixed form to start even in columns
1 to 6, while ifort rejects that.  Is, say,
 include 'omp_lib.h'
valid in fixed form?  i in column 6 normally means a continuation line,
though not sure if anything can in a valid program contain nclude
followed by character literal.  Shall we reject that, or at least warn that
it won't be portable?

The last thing, the biggest part of the patch, is that for legacy DEC
compatibility the DEC manuals document INCLUDE as a statement, not a line,
so the
"An INCLUDE line is not a Fortran statement."
and
"An INCLUDE line shall appear on a single source line where a statement may 
appear; it shall be
the only nonblank text on this line other than an optional trailing comment. 
Thus, a statement
label is not allowed."
bullets don't apply, but instead there is:
"The INCLUDE statement takes one of the following forms:"
"An INCLUDE statement can appear anywhere within a scoping unit. The statement
can span more than one source line, but no other statement can appear on the 
same
line. The source line cannot be labeled."

This means there can be (as can be seen in the following testcases)
continuations in both forms, and in fixed form there can be 0 in column 6.

In order not to duplicate all the handling of continuations, comment
skipping etc., the patch just adjusts the include_line routine so that it
signals if the current line is a possible start of a valid INCLUDE statement
when in -fdec-include mode, and if so, whenever it reads a further line it
retries to parse it using
gfc_next_char/gfc_next_char_literal/gfc_gobble_whitespace APIs as an INCLUDE
stmt.  If it is found not to be a valid INCLUDE statement line or set of
lines, it returns 0; if it is valid, it returns 1 together with calling
load_file like include_line does and clears all the lines containing the INCLUDE
statement.  If the reading stops because we don't have enough lines, -1 is
returned and the caller tries again with more lines.
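
A rough sketch of that caller-side control flow (not the actual patch; the
argument lists are simplified and read_next_source_line is a made-up helper,
the real code lives in load_file in scanner.c):

  /* Sketch only: handling the 0 / 1 / -1 result of include_line.  */
  static bool
  handle_possible_include (char *line)
  {
    int rc = include_line (line);            /* now returns 0, 1 or -1 */
    while (rc == -1)                         /* might be an INCLUDE stmt...  */
      {
        if (!read_next_source_line (&line))  /* hypothetical helper */
          return false;                      /* input ended: not an INCLUDE */
        rc = include_stmt (line);            /* ...retry with one more line */
      }
    return rc == 1;   /* true: INCLUDE handled, its lines already cleared */
  }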

Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest?

In addition to the above mentioned question about include in columns 1-6 in
fixed form, another thing is that we support
  print *, 'abc''def'
  print *, "hij""klm"
which prints abc'def and hij"klm.  Shall we support that for INCLUDE lines
and INCLUDE statements too?

2018-11-12  Jakub Jelinek  
Mark Eggleston  

* lang.opt (fdec-include): New option.
* options.c (set_dec_flags): Set also flag_dec_include.
* scanner.c (include_line): Change return type from bool to int.
In fixed form allow spaces in between include keyword letters.
For -fdec-include, allow in fixed form 0 in column 6.  With
-fdec-include return -1 if the parsed line is not full include
statement and it could be successfully completed on continuation
lines.
(include_stmt): New function.
(load_file): Adjust include_line caller.  If it returns -1, keep
trying include_stmt until it stops returning -1 whenever adding
further line of input.

* gfortran.dg/include_10.f: New test.
* gfortran.dg/include_10.inc: New file.
* gfortran.dg/include_11.f: New test.
* gfortran.dg/include_12.f: New test.
* gfortran.dg/include_13.f90: New test.
* gfortran.dg/gomp/include_1.f: New test.
* gfortran.dg/gomp/include_1.inc: New file.
* gfortran.dg/gomp/include_2.f90: New test.

--- gcc/fortran/lang.opt.jj 2018-07-18 22:57:15.227785894 +0200
+++ gcc/fortran/lang.opt2018-11-12 09:35:03.185259773 +0100
@@ -440,6 +440,10 @@ fdec
 Fortran Var(flag_dec)
 Enable all DEC language extensions.
 
+fdec-include
+Fortran Var(flag_dec_include)
+Enable legacy parsing of INCLUDE as statement.
+
 fdec-intrinsic-ints
 Fortran Var(flag_dec_intrinsic_ints)
 Enable kind-specific variants of integer intrinsic functions.
--- gcc/fortran/options.c.jj2018-11-06 18:27:13.828831733 +0100
+++ gcc/fortran/options.c   2018-11-12 09:35:39.515655453 +0100
@@ -68,6 +68,7 @@ set_dec_flags (int value)
   flag_dec_intrinsic_ints |= value;
   flag_dec_static |= value;
   flag_dec_math |= value;
+  flag_dec_include |= value;
 }
 
 
--- gcc/fortran/scanner.c.jj2018-05-08 13:56:41.691932534 +0200
+++ gcc/fortran/scanner.c   2018-11-12 15:21:51.249391936 +0100
@@ -2135,14 +2135,18 @@ static bool load_file (const char *, con
 /* include_line()-- Checks a line buffer to see if it is an include
line.  If so, we call load_file() recursively to load the included
file.  We never return a syntax error because a statement like
-   "include = 5" is perfec

Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Andreas Schwab
On Nov 12 2018, Michael Matz  wrote:

> Wouldn't this also break compiling code that contains power9 instructions 
> but guarded by runtime tests to only be executed on power9 machines?  That 
> seems a valid usecase, and it'd be bad if the assembler fails to compile 
> such.  (You can't use -mcpu=power9 as work around as the other 
> unguarded code is not supposed to be using power9 insns).

You'll need to put .machine directives around them.
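
For illustration, a minimal sketch of what that looks like in inline asm (the
surrounding function is made up; the .machine push/pop directives are the
relevant part):

  /* Sketch only: let a power9 insn assemble in a -mcpu=power8 build.  */
  static inline long
  guarded_darn (void)
  {
    long r;
    __asm__ (".machine push\n\t"
             ".machine power9\n\t"
             "darn %0,1\n\t"
             ".machine pop"
             : "=r" (r));
    return r;
  }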

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM

2018-11-12 Thread Sudakshina Das
Hi Kyrill

On 09/11/18 18:21, Kyrill Tkachov wrote:
> Hi Sudi,
> 
> On 09/11/18 15:33, Sudakshina Das wrote:
>> Hi
>>
>> This patch adds -march=armv8.5-a to the Arm backend.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>  
>>
>> Armv8.5-A also adds two new security features:
>> - Speculation Barrier instruction
>> - Execution and Data Prediction Restriction Instructions
>> These are made optional to all older Armv8-A versions. Thus we are
>> adding two new options "+sb" and "+predres" to all older Armv8-A. These
>> are passed on to the assembler and have no code generation effects and
>> have already gone in the trunk of binutils.
>>
>> Bootstrapped and regression tested with arm-none-linux-gnueabihf.
>>
>> Is this ok for trunk?
>> Sudi
>>
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>> * config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
>> (ARMv8_5a): New fgroup.
>> (armv8.5-a): New arch.
>> (armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
>> options sb and predres.
>> * config/arm/arm-tables.opt: Regenerate.
>> * config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
>> * config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
>> * config/arm/t-multilib (v8_5_a_simd_variants): New variable.
>> Add matching rules for -march=armv8.5-a and extensions.
>> * doc/invoke.texi (ARM options): Document -march=armv8.5-a.
>> Add sb and predres to all armv8-a except armv8.5-a.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>> * gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
>> combination tests.
> 
> Hi
> 
> This patch adds -march=armv8.5-a to the Arm backend.
> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>  
> 
> Armv8.5-A also adds two new security features:
> - Speculation Barrier instruction
> - Execution and Data Prediction Restriction Instructions
> These are made optional to all older Armv8-A versions. Thus we are
> adding two new options "+sb" and "+predres" to all older Armv8-A. These
> are passed on to the assembler and have no code generation effects and
> have already gone in the trunk of binutils.
> 
> Bootstrapped and regression tested with arm-none-linux-gnueabihf.
> 
> Is this ok for trunk?
> Sudi
> 
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das
> 
>  * config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
>  (ARMv8_5a): New fgroup.
>  (armv8.5-a): New arch.
>  (armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
>  options sb and predres.
>  * config/arm/arm-tables.opt: Regenerate.
>  * config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
>  * config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
>  * config/arm/t-multilib (v8_5_a_simd_variants): New variable.
>  Add matching rules for -march=armv8.5-a and extensions.
>  * doc/invoke.texi (ARM options): Document -march=armv8.5-a.
>  Add sb and predres to all armv8-a except armv8.5-a.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das
> 
>  * gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
>  combination tests.
> 
> 
> 
> This is ok modulo a typo fix below.
> 
> Thanks,
> Kyrill
> 

Thanks. Fixed and committed as r266031.

Sudi

> 
> 
> index 
> 25788ad09851daf41038b1578307bf23b7f34a94..eba038f9d20bc54bef7bdb7fa1c0e7028d954ed7
>  
> 100644
> --- a/gcc/config/arm/t-multilib
> +++ b/gcc/config/arm/t-multilib
> @@ -70,7 +70,8 @@ v8_a_simd_variants    := $(call all_feat_combs, simd 
> crypto)
>   v8_1_a_simd_variants    := $(call all_feat_combs, simd crypto)
>   v8_2_a_simd_variants    := $(call all_feat_combs, simd fp16 fp16fml 
> crypto dotprod)
>   v8_4_a_simd_variants    := $(call all_feat_combs, simd fp16 crypto)
> -v8_r_nosimd_variants    := +crc
> +v8_5_a_simd_variants    := $(call all_feat_combs, simd fp16 crypto)
> +v8_r_nosimd_variants    := +cr5
> 
> 
> Typo, should be +crc
> 
> 
> 



Re: [PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-12 Thread Kyrill Tkachov

Hi Richard,

On 12/11/18 14:13, Richard Biener wrote:

On Fri, Nov 9, 2018 at 6:23 PM Sudakshina Das  wrote:
>
> Hi
>
> I am posting this patch on behalf of Carey (cc'ed). I also have some
> review comments that I will make as a reply to this later.
>
>
> This implements a new AArch64 specific back-end pass that helps optimize
> branch-dense code, which can be a bottleneck for performance on some Arm
> cores. This is achieved by padding out the branch-dense sections of the
> instruction stream with nops.

Wouldn't this be more suitable for implementing inside the assembler?



The number of NOPs to insert to get the performance benefits varies from core 
to core,
I don't think we want to add such CPU-specific optimisation logic to the 
assembler.

Thanks,
Kyrill


> This has proven to show up to a 2.61%~ improvement on the Cortex A-72
> (SPEC CPU 2006: sjeng).
>
> The implementation includes the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block. This is to allow padding after unconditional
> branches. This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of nops.
>
> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.
>
> ## Command Line Options
>
> Three new target-specific options are provided:
> - mbranch-dilution
> - mbranch-dilution-granularity={num}
> - mbranch-dilution-max-branches={num}
>
> A number of cores known to be able to benefit from this pass have been
> given default tuning values for their granularity and max-branches.
> Each affected core has a very specific granule size and associated
> max-branch limit. This is a microarchitecture specific optimization.
> Typical usage should be -mdilute-branches with a specified -mcpu. Cores
> with a granularity tuned to 0 will be ignored. Options are provided for
> experimentation.
>
> ## Algorithm and Heuristic
>
> The pass takes a very simple 'sliding window' approach to the problem.
> We crawl through each instruction (starting at the first branch) and
> keep track of the number of branches within the current "granule" (or
> window). When this exceeds the max-branch value, the pass will dilute
> the current granule, inserting nops to push out some of the branches.
> The heuristic will favour unconditional branches (for performance
> reasons), or branches that are between two other branches (in order to
> decrease the likelihood of another dilution call being needed).
>
> Each branch type required a different method for nop insertion due to
> RTL/basic_block restrictions:
>
> - Returning calls do not end a basic block so can be handled by emitting
> a generic nop.
> - Unconditional branches must be the end of a basic block, and nops
> cannot be outside of a basic block.
>Thus the need for FILLER_INSN, which allows placement outside of a
> basic block - and translates to a nop.
> - For most conditional branches we've taken a simple approach and only
> handle the fallthru edge for simplicity,
>which we do by inserting a "nop block" of nops on the fallthru edge,
> mapping that back to the original destination block.
> - asm gotos and pcsets are going to be tricky to analyse from a dilution
> perspective so are ignored at present.
>
>
> ## Changelog
>
> gcc/testsuite/ChangeLog:
>
> 2018-11-09  Carey Williams 
>
> * gcc.target/aarch64/branch-dilution-off.c: New test.
> * gcc.target/aarch64/branch-dilution-on.c: New test.
>
>
> gcc/ChangeLog:
>
> 2018-11-09  Carey Williams 
>
> * cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
> * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
> basic blocks.
> * config.gcc (extra_objs): Add aarch64-branch-dilution.o.
> * config/aarch64/aarch64-branch-dilution.c: New file.
> * config/aarch64/aarch64-passes.def (branch-dilution): Register
> pass.
> * config/aarch64/aarch64-protos.h (struct tune_params): Declare
> tuning parameters bdilution_gsize and bdilution_maxb.
> (make_pass_branch_dilution): New declaration.
> * config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
> cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
> cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
> thunderx_tunings,tsv110_tunings,xgene1_tunings,
> qdf24xx_tunings,saphira_tunings,thunderx2t99_tunings):
> Provide default tunings for bdilution_gsize and bdilution_maxb.
> * config/aarch64/aarch64.md (filler_insn): Define new insn.
> * config/aarch64/aarch64.opt (mbranch-dilution,
> mbranch-dilution-granularity,
> mbranch-dilution-max-branches): Define new branch dilution
> options.
> * config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
> for aarch64-branch-dilution.c.
> * coretypes.h (rtx_fille

[PATCH] PR libstdc++/87963 fix build for 64-bit mingw

2018-11-12 Thread Jonathan Wakely

PR libstdc++/87963
* src/c++17/memory_resource.cc (chunk::_M_bytes): Change type from
unsigned to uint32_t.
(chunk): Fix static assertion for 64-bit targets that aren't LP64.
(bigblock::all_ones): Fix undefined shift.

Tested x86_64-linux, committed to trunk.

commit d4c238672c04397626391ae9a89ebfe76d70eb55
Author: Jonathan Wakely 
Date:   Mon Nov 12 15:16:31 2018 +

PR libstdc++/87963 fix build for 64-bit mingw

PR libstdc++/87963
* src/c++17/memory_resource.cc (chunk::_M_bytes): Change type from
unsigned to uint32_t.
(chunk): Fix static assertion for 64-bit targets that aren't LP64.
(bigblock::all_ones): Fix undefined shift.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index 781bdada381..3595e255889 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -421,7 +421,7 @@ namespace pmr
 // The chunk has space for n blocks, followed by a bitset of size n
 // that begins at address words.
 // This object does not own p or words, the caller will free it.
-chunk(void* p, size_t bytes, void* words, size_t n)
+chunk(void* p, uint32_t bytes, void* words, size_t n)
 : bitset(words, n),
   _M_bytes(bytes),
   _M_p(static_cast<std::byte*>(p))
@@ -442,7 +442,7 @@ namespace pmr
 }
 
 // Allocated size of chunk:
-unsigned _M_bytes = 0;
+uint32_t _M_bytes = 0;
 // Start of allocated chunk:
 std::byte* _M_p = nullptr;
 
@@ -508,12 +508,9 @@ namespace pmr
 { return std::less{}(p, c._M_p); }
   };
 
-#ifdef __LP64__
-  // TODO pad up to 4*sizeof(void*) to avoid splitting across cache lines?
-  static_assert(sizeof(chunk) == (3 * sizeof(void*)), "");
-#else
-  static_assert(sizeof(chunk) == (4 * sizeof(void*)), "");
-#endif
+  // For 64-bit this is 3*sizeof(void*) and for 32-bit it's 4*sizeof(void*).
+  // TODO pad 64-bit to 4*sizeof(void*) to avoid splitting across cache lines?
+  static_assert(sizeof(chunk) == 2 * sizeof(uint32_t) + 2 * sizeof(void*));
 
   // An oversized allocation that doesn't fit in a pool.
   struct big_block
@@ -523,7 +520,7 @@ namespace pmr
 static constexpr unsigned _S_sizebits
   = numeric_limits<size_t>::digits - _S_alignbits;
 // The maximum value that can be stored in _S_size
-static constexpr size_t all_ones = (1ul << _S_sizebits) - 1u;
+static constexpr size_t all_ones = (1ull << _S_sizebits) - 1u;
 // The minimum size of a big block
 static constexpr size_t min = 1u << _S_alignbits;
 


C++ PATCH to implement C++20 P0634R3, Down with typename!

2018-11-12 Thread Marek Polacek
This patch implements C++20 P0634R3, Down with typename!

which makes 'typename' optional in several contexts specified in [temp.res].

The gist of the patch is in cp_parser_simple_type_specifier, where, if the
context makes typename optional and the id is qualified, we pretend we've
seen the typename keyword.
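
For illustration, two of the contexts in which the keyword becomes optional
(a sketch, not taken from the patch's testcases):

  // C++20: no 'typename' needed before the dependent qualified names below.
  template<typename T> using element = T::value_type;       // alias-declaration
  template<typename T> auto front (T &c) -> T::reference;   // trailing-return-type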

There's quite a lot of churn because we need to be careful where we want
to make typename optional, and e.g. a flag in cp_parser would be too global.

I'm not sure about some of the bits in typename5.C, not quite sure if the
code is valid, but I didn't have time to investigate deeply and it seems
pretty obscure anyway.  There are preexisting cases when g++ and clang++
disagree.

The resolve_typename_type hunk was to make typename9.C work with -fconcepts.

Bootstrapped/regtested on x86_64-linux.

2018-11-12  Marek Polacek  

Implement P0634R3, Down with typename!
* parser.c (CP_PARSER_FLAGS_TYPENAME_OPTIONAL): New enumerator.
(cp_parser_type_name): Remove declaration.
(cp_parser_postfix_expression): Pass TYPENAME_OPTIONAL_P to
cp_parser_type_id.
(cp_parser_new_type_id): Pass TYPENAME_OPTIONAL_P to
cp_parser_type_specifier_seq.
(cp_parser_lambda_declarator_opt): Pass TYPENAME_OPTIONAL_P
to cp_parser_parameter_declaration_clause.
(cp_parser_condition): Adjust call to cp_parser_declarator.
(cp_parser_simple_declaration): Adjust call to
cp_parser_init_declarator.
(cp_parser_conversion_type_id): Adjust call to
cp_parser_type_specifier_seq.
(cp_parser_default_type_template_argument): Pass TYPENAME_OPTIONAL_P
to cp_parser_type_id.
(cp_parser_template_parameter): Pass TYPENAME_OPTIONAL_P to
cp_parser_parameter_declaration.
(cp_parser_explicit_instantiation): Adjust call to cp_parser_declarator.
(cp_parser_simple_type_specifier): Adjust call to cp_parser_type_name.
(cp_parser_type_name): Remove unused function.
(cp_parser_enum_specifier): Adjust call to cp_parser_type_specifier_seq.
(cp_parser_alias_declaration): Pass TYPENAME_OPTIONAL_P to
cp_parser_type_id.
(cp_parser_init_declarator): New parameter.
(cp_parser_declarator): New parameter.  Use it.
(cp_parser_direct_declarator): Likewise.
(cp_parser_type_id_1): Likewise.
(cp_parser_type_id): Likewise.
(cp_parser_template_type_arg): Adjust call to cp_parser_type_id_1.
(cp_parser_trailing_type_id): Pass TYPENAME_OPTIONAL_P to
cp_parser_type_id_1.
(cp_parser_type_specifier_seq): New parameter.  Set flags to
CP_PARSER_FLAGS_TYPENAME_OPTIONAL.
(cp_parser_parameter_declaration_clause): New parameter.  Use it.
(cp_parser_parameter_declaration_list): Likewise.
(cp_parser_parameter_declaration): Likewise.
(cp_parser_member_declaration): Set flags to
CP_PARSER_FLAGS_TYPENAME_OPTIONAL.
(cp_parser_exception_declaration): Adjust calls to
cp_parser_type_specifier_seq and cp_parser_declarator.
(cp_parser_requirement_parameter_list): Adjust call to
cp_parser_parameter_declaration_clause.
(cp_parser_constructor_declarator_p): Resolve the TYPENAME_TYPE.
(cp_parser_single_declaration): Set flags to
CP_PARSER_FLAGS_TYPENAME_OPTIONAL.  Pass TYPENAME_OPTIONAL_P to
cp_parser_init_declarator.
(cp_parser_cache_defarg): Adjust call to cp_parser_declarator.
(cp_parser_objc_method_tail_params_opt): Adjust call to
cp_parser_parameter_declaration.
(cp_parser_objc_class_ivars): Adjust call to cp_parser_declarator.
(cp_parser_objc_try_catch_finally_statement): Adjust call to
cp_parser_parameter_declaration.
(cp_parser_objc_struct_declaration): Adjust call to
cp_parser_declarator.
(cp_parser_omp_for_loop_init): Adjust calls to
cp_parser_type_specifier_seq and cp_parser_declarator.

* g++.dg/cpp0x/alias-decl-43.C: Adjust dg-error.
* g++.dg/cpp0x/decltype67.C: Only expect error in c++17_down.
* g++.dg/cpp1z/typename1.C: New test.
* g++.dg/cpp2a/typename1.C: New test.
* g++.dg/cpp2a/typename10.C: New test.
* g++.dg/cpp2a/typename11.C: New test.
* g++.dg/cpp2a/typename2.C: New test.
* g++.dg/cpp2a/typename3.C: New test.
* g++.dg/cpp2a/typename4.C: New test.
* g++.dg/cpp2a/typename5.C: New test.
* g++.dg/cpp2a/typename6.C: New test.
* g++.dg/cpp2a/typename7.C: New test.
* g++.dg/cpp2a/typename8.C: New test.
* g++.dg/cpp2a/typename9.C: New test.
* g++.dg/diagnostic/missing-typename.C: Only run the test in
c++17_down.
* g++.dg/other/crash-9.C: Add template disambiguator.
* g++.dg/other/nontype-1.C: Only expect error in c++17_down.
*

Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-12 Thread Peter Bergner
On 11/12/18 6:25 AM, Renlin Li wrote:
> I tried to build a native arm-linuxeabihf toolchain with the patch.
> But I got the following ICE:

Why can't things ever be easy? :-)  I think we're getting closer though.

Anyway, can you please recompile the failing file but using -save-temps
and send me the resulting preprocessed source file?  Also, can you give
me the gcc configure options you used to build your GCC?  That should
give me enough info to debug this one.  Thanks.

Peter



Re: [PATCH] Instrument only selected files (PR gcov-profile/87442).

2018-11-12 Thread Jeff Law
On 11/12/18 12:56 AM, Martin Liška wrote:
> On 11/9/18 11:00 PM, Jeff Law wrote:
>> On 11/8/18 6:42 AM, Martin Liška wrote:
>>> Hi.
>>>
>>> The patch is about possibility to filter which files are instrumented. The 
>>> usage
>>> is explained in the PR.
>>>
>>> Patch can bootstrap and survives regression tests on x86_64-linux-gnu.
>>>
>>> Ready for trunk?
>>> Thanks,
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2018-11-08  Martin Liska  
>>>
>>> PR gcov-profile/87442
>>> * common.opt: Add -fprofile-filter-files and -fprofile-exclude-files
>>> options.
>>> * doc/invoke.texi: Document them.
>>> * tree-profile.c (parse_profile_filter): New.
>>> (parse_profile_file_filtering): Likewise.
>>> (release_profile_file_filtering): Likewise.
>>> (include_source_file_for_profile): Likewise.
>>> (tree_profiling): Filter source files based on the
>>> newly added options.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-11-08  Martin Liska  
>>>
>>> PR gcov-profile/87442
>>> * gcc.dg/profile-filtering-1.c: New test.
>>> * gcc.dg/profile-filtering-2.c: New test.
>> Extra credit if we could also do this on a function level.  I've
>> certainly talked to developers that want finer grained control over what
>> gets instrumented and what doesn't.  This is probably enough to help
>> them, but I'm sure they'll want more :-)
>>
>>
>> OK.
>> jeff
>>
> 
> Hi.
> 
> May I consider this Jeff as approval of the patch?
Yes.  Sorry I wasn't explicit about that.
jeff


Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 03:52:29PM +0100, Andreas Schwab wrote:
> On Nov 12 2018, Michael Matz  wrote:
> 
> > Wouldn't this also break compiling code that contains power9 instructions 
> > but guarded by runtime tests to only be executed on power9 machines?  That 
> > seems a valid usecase, and it'd be bad if the assembler fails to compile 
> > such.  (You can't use -mcpu=power9 as work around as the other 
> > unguarded code is not supposed to be using power9 insns).
> 
> You'll need to put .machine directives around them.

My worry with that is there may be too much legacy code that does not
do this :-(


Segher


Re: [PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-12 Thread Richard Earnshaw (lists)
On 12/11/2018 15:13, Kyrill Tkachov wrote:
> Hi Richard,
> 
> On 12/11/18 14:13, Richard Biener wrote:
>> On Fri, Nov 9, 2018 at 6:23 PM Sudakshina Das  wrote:
>> >
>> > Hi
>> >
>> > I am posting this patch on behalf of Carey (cc'ed). I also have some
>> > review comments that I will make as a reply to this later.
>> >
>> >
>> > This implements a new AArch64 specific back-end pass that helps
>> optimize
>> > branch-dense code, which can be a bottleneck for performance on some
>> Arm
>> > cores. This is achieved by padding out the branch-dense sections of the
>> > instruction stream with nops.
>>
>> Wouldn't this be more suitable for implementing inside the assembler?
>>
> 
> The number of NOPs to insert to get the performance benefits varies from
> core to core,
> I don't think we want to add such CPU-specific optimisation logic to the
> assembler.

Additionally, the compiler has to keep track of branch ranges.  It can't
do this properly if the assembler is emitting more instructions than the
compiler thinks it is.

R.

> 
> Thanks,
> Kyrill
> 
>> > This has proven to show up to a 2.61%~ improvement on the Cortex A-72
>> > (SPEC CPU 2006: sjeng).
>> >
>> > The implementation includes the addition of a new RTX instruction class
>> > FILLER_INSN, which has been white listed to allow placement of NOPs
>> > outside of a basic block. This is to allow padding after unconditional
>> > branches. This is favorable so that any performance gained from
>> > diluting branches is not paid straight back via excessive eating of
>> nops.
>> >
>> > It was deemed that a new RTX class was less invasive than modifying
>> > behavior in regards to standard UNSPEC nops.
>> >
>> > ## Command Line Options
>> >
>> > Three new target-specific options are provided:
>> > - mbranch-dilution
>> > - mbranch-dilution-granularity={num}
>> > - mbranch-dilution-max-branches={num}
>> >
>> > A number of cores known to be able to benefit from this pass have been
>> > given default tuning values for their granularity and max-branches.
>> > Each affected core has a very specific granule size and associated
>> > max-branch limit. This is a microarchitecture specific optimization.
>> > Typical usage should be -mdilute-branches with a specified -mcpu.
>> Cores
>> > with a granularity tuned to 0 will be ignored. Options are provided for
>> > experimentation.
>> >
>> > ## Algorithm and Heuristic
>> >
>> > The pass takes a very simple 'sliding window' approach to the problem.
>> > We crawl through each instruction (starting at the first branch) and
>> > keep track of the number of branches within the current "granule" (or
>> > window). When this exceeds the max-branch value, the pass will dilute
>> > the current granule, inserting nops to push out some of the branches.
>> > The heuristic will favour unconditional branches (for performance
>> > reasons), or branches that are between two other branches (in order to
>> > decrease the likelihood of another dilution call being needed).
>> >
>> > Each branch type required a different method for nop insertion due to
>> > RTL/basic_block restrictions:
>> >
>> > - Returning calls do not end a basic block so can be handled by
>> emitting
>> > a generic nop.
>> > - Unconditional branches must be the end of a basic block, and nops
>> > cannot be outside of a basic block.
>> >    Thus the need for FILLER_INSN, which allows placement outside of a
>> > basic block - and translates to a nop.
>> > - For most conditional branches we've taken a simple approach and only
>> > handle the fallthru edge for simplicity,
>> >    which we do by inserting a "nop block" of nops on the fallthru edge,
>> > mapping that back to the original destination block.
>> > - asm gotos and pcsets are going to be tricky to analyse from a
>> dilution
>> > perspective so are ignored at present.
>> >
>> >
>> > ## Changelog
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > 2018-11-09  Carey Williams 
>> >
>> > * gcc.target/aarch64/branch-dilution-off.c: New test.
>> > * gcc.target/aarch64/branch-dilution-on.c: New test.
>> >
>> >
>> > gcc/ChangeLog:
>> >
>> > 2018-11-09  Carey Williams 
>> >
>> > * cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
>> > * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN
>> outside
>> > basic blocks.
>> > * config.gcc (extra_objs): Add aarch64-branch-dilution.o.
>> > * config/aarch64/aarch64-branch-dilution.c: New file.
>> > * config/aarch64/aarch64-passes.def (branch-dilution): Register
>> > pass.
>> > * config/aarch64/aarch64-protos.h (struct tune_params): Declare
>> > tuning parameters bdilution_gsize and bdilution_maxb.
>> > (make_pass_branch_dilution): New declaration.
>> > * config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
>> > cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
>> > cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
>> > thunderx_tunings,tsv110_tunings,xgen

Re: [GCC][AArch64] [middle-end][docs] Document the xorsign optab

2018-11-12 Thread Sandra Loosemore

On 11/12/18 5:10 AM, Tamar Christina wrote:

Hi Sandra,


Ok for trunk?

+@cindex @code{xorsign@var{m}3} instruction pattern
+@item @samp{xorsign@var{m}3}
+Target supports an efficient expansion of x * copysign (1.0, y)
+as xorsign (x, y).  Store a value with the magnitude of operand 1
+and the sign of operand 2 into operand 0.  All operands have mode
+@var{m}, which is a scalar or vector floating-point mode.
+
+This pattern is not allowed to @code{FAIL}.
+


Hmmm, needs markup, plus it's a little confusing.  How about describing
it as

Equivalent to @samp{op0 = op1 * copysign (1.0, op2)}: store a value with
the magnitude of operand 1 and the sign of operand 2 into operand 0.
All operands have mode @var{m}, which is a scalar or vector
floating-point mode.

This pattern is not allowed to @code{FAIL}.


That works for me, updated patch attached.

OK for trunk?


Yes, this is fine.  :-)

-Sandra
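
For reference, the scalar semantics being documented here boil down to
something like the following (a sketch of the equivalence only, not the
optab expander itself):

  /* xorsign(m)3: magnitude of x, sign of y.  */
  double
  xorsign_ref (double x, double y)
  {
    return x * __builtin_copysign (1.0, y);
  }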



Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Michael Matz
Hi,

On Mon, 12 Nov 2018, Segher Boessenkool wrote:

> > > Wouldn't this also break compiling code that contains power9 
> > > instructions but guarded by runtime tests to only be executed on 
> > > power9 machines?  That seems a valid usecase, and it'd be bad if the 
> > > assembler fails to compile such.  (You can't use -mcpu=power9 as 
> > > work around as the other unguarded code is not supposed to be using 
> > > power9 insns).
> > 
> > You'll need to put .machine directives around them.
> 
> My worry with that is there may be too much legacy code that does not do 
> this :-(

We'll see once we put gcc9 through a distro build.  My worry really only 
was that the change would result in compile breakage without a sensible 
solution.  (I'll just give all packages whose build failures prevent gcc9 
from being the new system compiler to Alan for fixing ;-) ).


Ciao,
Michael.


Re: [PATCH][DOCS] Fix documentation of __builtin_cpu_is and __builtin_cpu_supports for x86.

2018-11-12 Thread Sandra Loosemore

On 11/12/18 4:46 AM, Martin Liška wrote:

Hi.

The patch is adding missing values for aforementioned built-ins.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2018-11-12  Martin Liska  

* doc/extend.texi: Add missing values for __builtin_cpu_is and
__builtin_cpu_supports for x86 target.
---
  gcc/doc/extend.texi | 100 +++-
  1 file changed, 98 insertions(+), 2 deletions(-)



Looks fine to me, although I can't vouch for technical correctness.

-Sandra
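
For context, typical runtime-dispatch usage of the two built-ins looks like
this (a sketch; the dispatched functions are made up, and "intel"/"avx2" are
just two of the documented argument values):

  extern void use_avx2_path (void);     /* hypothetical */
  extern void use_generic_path (void);  /* hypothetical */

  void
  pick_impl (void)
  {
    if (__builtin_cpu_is ("intel") && __builtin_cpu_supports ("avx2"))
      use_avx2_path ();
    else
      use_generic_path ();
  }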




Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 12:53:26PM +, Andrew Stubbs wrote:
> >>+/* Implement TARGET_LEGITIMATE_COMBINED_INSN.
> >>+
> >>+   Return false if the instruction is not appropriate as a combination 
> >>of two
> >>+   or more instructions.  */
> >>+
> >>+bool
> >>+gcn_legitimate_combined_insn (rtx_insn *insn)
> >>+{
> >>+  rtx pat = PATTERN (insn);
> >>+
> >>+  /* The combine pass tends to strip (use (exec)) patterns from insns.  
> >>This
> >>+ means it basically switches everything to use the *_scalar form of 
> >>the
> >>+ instructions, which is not helpful.  So, this function disallows 
> >>such
> >>+ combinations.  Unfortunately, this also disallows combinations of 
> >>genuine
> >>+ scalar-only patterns, but those only come from explicit expand code.
> >>+
> >>+ Possible solutions:
> >>+ - Invent TARGET_LEGITIMIZE_COMBINED_INSN.
> >>+ - Remove all (use (EXEC)) and rely on md_reorg with "exec" 
> >>attribute.
> >>+   */
> >This seems a bit hokey.  Why specifically is combine removing the USE?
> 
> I don't understand combine fully enough to explain it now, although at 
> the time I wrote this, and in a GCC 7 code base, I had followed the code 
> through and observed what it was doing.
> 
> Basically, if you have two patterns that do the same operation, but one 
> has a "parallel" with an additional "use", then combine will tend to 
> prefer the one without the "use". That doesn't stop the code working, 
> but it makes a premature (accidental) decision about instruction 
> selection that we'd prefer to leave to the register allocator.
> 
> I don't recall if it did this to lone instructions, but it would 
> certainly do so when combining two (or more) instructions, and IIRC 
> there are typically plenty of simple moves around that can be easily 
> combined.

If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
Or actually use the register, i.e. as input to an actually needed
instruction.

If combine is changing an X and a USE to just that X if it can, combine
is doing a great job!

(combine cannot "combine" one instruction, fwiw; this sometime could be
useful (so just run simplification on every single instruction, see if
that makes a simpler valid instruction; and indeed a common case where it
can help is if the insn is a parallel and one of the arms of that isn't
needed).


Segher


Re: [doc PATCH] Fix weakref description.

2018-11-12 Thread Michael Ploujnikov
On 2018-11-02 1:59 p.m., Michael Ploujnikov wrote:
> I came across this typo and also added a similar ld invocation for
> illustration purposes as mentioned by Jakub on irc.
> 

After talking to Jakub about it, I went with different terminology.


- Michael
From f14d7315e0dc9c4b6aff6137fd90e4d2595ef9f5 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov 
Date: Mon, 12 Nov 2018 12:42:37 -0500
Subject: [PATCH] Fix weakref description.

gcc/ChangeLog:

2018-11-12  Michael Ploujnikov  

	* doc/extend.texi: Fix typo in the weakref description.
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git gcc/doc/extend.texi gcc/doc/extend.texi
index e2b9ee11a54..fc507afe600 100644
--- gcc/doc/extend.texi
+++ gcc/doc/extend.texi
@@ -3619,7 +3619,7 @@ symbol, not necessarily in the same translation unit.
 The effect is equivalent to moving all references to the alias to a
 separate translation unit, renaming the alias to the aliased symbol,
 declaring it as weak, compiling the two separate translation units and
-performing a reloadable link on them.
+performing a link with relocatable output (ie: @code{ld -r}) on them.
 
 At present, a declaration to which @code{weakref} is attached can
 only be @code{static}.
-- 
2.19.1



signature.asc
Description: OpenPGP digital signature


Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Peter Bergner
On 11/12/18 5:49 AM, Alan Modra wrote:
> I'd like to remove -many from the options passed by default to the
> assembler, on the grounds that a gcc bug in instruction selection (eg.
> emitting a power9 insn for -mcpu=power8) is better found at assembly
> time than run time.
> 
> This might annoy people for a while fixing user asm that we didn't
> diagnose previously, but I believe this is the right direction to go.
> Of course, -Wa,-many is available for anyone who just wants their
> dodgy old code to work.

+1

Peter



Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Andrew Stubbs

On 12/11/2018 17:20, Segher Boessenkool wrote:

If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
Or actually use the register, i.e. as input to an actually needed
instruction.


They're not useless. If we want to do scalar operations in vector 
registers (and we often do, on this target), then we need to write a "1" 
into the EXEC (vector mask) register.


Unless we want to rewrite all scalar operations in terms of vec_merge 
then there's no way to "actually use the register".


There are additional patterns that do scalar operations in scalar 
registers, and therefore do not depend on EXEC, but there are not a 
complete set of instructions for these, so usually we don't use those 
until reload_completed (via splits). I did think of simply disabling 
them until reload_completed, but there are cases where we do want them, 
so that didn't work.


Of course, it's possible that we took a wrong turn early on and ended up 
with a sub-optimal arrangement, but it is where we are.



If combine is changing an X and a USE to just that X if it can, combine
is doing a great job!


Not if the "simpler" instruction is somehow more expensive. And, in our 
case, it isn't the instruction itself that is more expensive, but the 
extra instructions that may (or may not) need to be inserted around it 
later.


I might investigate putting the USE inside an UNSPEC_VOLATILE. That 
would have the advantage of letting combine run again. This feels like a 
future project I'd rather not have block the port submission though.


If there are two instructions that both have an UNSPEC_VOLATILE, will 
combine coalesce them into one in the combined pattern?


Thanks

Andrew


Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-12 Thread Kyrill Tkachov



On 12/11/18 14:10, Richard Biener wrote:

On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
 wrote:

On 09/11/18 12:18, Richard Biener wrote:

On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
 wrote:

Hi all,

In this testcase the codegen for VLA SVE is worse than it could be due to 
unrolling:

fully_peel_me:
  mov x1, 5
  ptrue   p1.d, all
  whilelo p0.d, xzr, x1
  ld1d    z0.d, p0/z, [x0]
  fadd    z0.d, z0.d, z0.d
  st1d    z0.d, p0, [x0]
  cntd    x2
  addvl   x3, x0, #1
  whilelo p0.d, x2, x1
  beq     .L1
  ld1d    z0.d, p0/z, [x0, #1, mul vl]
  fadd    z0.d, z0.d, z0.d
  st1d    z0.d, p0, [x3]
  cntw    x2
  incb    x0, all, mul #2
  whilelo p0.d, x2, x1
  beq     .L1
  ld1d    z0.d, p0/z, [x0]
  fadd    z0.d, z0.d, z0.d
  st1d    z0.d, p0, [x0]
.L1:
  ret

In this case, due to the vector-length-agnostic nature of SVE the compiler 
doesn't know the loop iteration count.
For such loops we don't want to unroll if we don't end up eliminating branches 
as this just bloats code size
and hurts icache performance.

This patch introduces a new unroll-known-loop-iterations-only param that 
disables cunroll when the loop iteration
count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA 
code, but it does help some
Advanced SIMD cases as well where loops with an unknown iteration count are not 
unrolled when it doesn't eliminate
the branches.

So for the above testcase we generate now:
fully_peel_me:
  mov x2, 5
  mov x3, x2
  mov x1, 0
  whilelo p0.d, xzr, x2
  ptrue   p1.d, all
.L2:
  ld1d    z0.d, p0/z, [x0, x1, lsl 3]
  fadd    z0.d, z0.d, z0.d
  st1d    z0.d, p0, [x0, x1, lsl 3]
  incd    x1
  whilelo p0.d, x1, x3
  bne .L2
  ret

Not perfect still, but it's preferable to the original code.
The new param is enabled by default on aarch64 but disabled for other targets, 
leaving their behaviour unchanged
(until other target people experiment with it and set it, if appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu.
Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
performance.

Ok for trunk?

Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your targets
post-option-processing hook.

You mean like this?
It's certainly a simpler patch, but I was just a bit hesitant of making this 
change for all targets :)
But I suppose it's a reasonable change.

No, that change is backward.  What I said is that peeling is already
conditional on
flag_peel_loops and that is enabled by -O3.  So you want to disable
flag_peel_loops for
SVE instead in the target.


Sorry, I got confused by the similarly named functions.
I'm talking about try_unroll_loop_completely when run as part of 
canonicalize_induction_variables i.e. the "ivcanon" pass
(sorry about blaming cunroll here). This doesn't get called through the 
try_unroll_loops_completely path.

try_unroll_loop_completely doesn't get disabled with -fno-peel-loops or 
-fno-unroll-loops.
Maybe disabling peeling inside try_unroll_loop_completely itself when 
!flag_peel_loops is viable?

Thanks,
Kyrill


It might also make sense to have more fine-grained control for this
and allow a target
to say whether it wants to peel a specific loop or not when the
middle-end thinks that
would be profitable.

Can be worth looking at as a follow-up. Do you envisage the target analysing
the gimple statements of the loop to figure out its cost?

Kind-of.  Sth like

   bool targetm.peel_loop (struct loop *);

I have no idea whether you can easily detect a SVE vectorized loop though.
Maybe there's always a special IV or so (the mask?)

Richard.


Thanks,
Kyrill


2018-11-09  Kyrylo Tkachov  

 * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
 loop when number of iterations is not known and flag_peel_loops is in
 effect.

2018-11-09  Kyrylo Tkachov  

 * gcc.target/aarch64/sve/unroll-1.c: New test.





Re: [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI

2018-11-12 Thread Sudakshina Das
Hi Sam

On 02/11/18 17:31, Sam Tebbs wrote:
> Hi all,
> 
> The -mbranch-protection option combines the functionality of
> -msign-return-address and the BTI features new in Armv8.5 to better reflect
> their relationship. This new option therefore supersedes and deprecates the
> existing -msign-return-address option.
> 
> -mbranch-protection=[none|standard|<types>] - Turns on different types of
> branch protection available where:
> 
>   * "none": Turn of all types of branch protection
>   * "standard" : Turns on all the types of protection to their respective
> standard levels.
>   * <types> can be "+" separated protection types:
> 
>   * "bti" : Branch Target Identification Mechanism.
>   * "pac-ret{+leaf+b-key}": Return Address Signing. The default return
> address signing is enabled by signing functions that save the return
> address to memory (non-leaf functions will practically always do this)
> using the a-key. The optional tuning arguments allow the user to
> extend the scope of return address signing to include leaf functions
> and to change the key to b-key. The tuning arguments must follow the
> protection type "pac-ret".
> 
> Thus -mbranch-protection=standard -> -mbranch-protection=bti+pac-ret.
> 
> Its mapping to -msign-return-address is as follows:
> 
>   * -mbranch-protection=none -> -msign-return-address=none
>   * -mbranch-protection=standard -> -msign-return-address=leaf
>   * -mbranch-protection=pac-ret -> -msign-return-address=non-leaf
>   * -mbranch-protection=pac-ret+leaf -> -msign-return-address=all
> 
> This patch implements the option's skeleton and the "none", "standard" and
> "pac-ret" types (along with its "leaf" subtype).
> 
> The previous patch in this series is here:
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00103.html
> 
> Bootstrapped successfully and tested on aarch64-none-elf with no regressions.
> 
> OK for trunk?
> 

Thanks for doing this. I am not a maintainer so you will need a
maintainer's approval. The only nit I would add is that it would
be good to have more test coverage, especially for the new parsing
functions that have been added and the new errors that are emitted.

Example checking a few valid and invalid combinations of the options
like:
-mbranch-protection=pac-ret -mbranch-protection=none //disables
everything
-mbranch-protection=leaf  //errors out
-mbranch-protection=none+pac-ret //errors out
... etc

Also instead of removing all the old deprecated options, you can keep
one (or a copy of one) to check for the deprecated warning.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
e290128f535f3e6b515bff5a81fae0aa0d1c8baf..07cfe69dc3dd9161a2dd93089ccf52ef251208d2
 
100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15221,13 +15222,18 @@ accessed using a single instruction and 
emitted after each function.  This
  limits the maximum size of functions to 1MB.  This is enabled by 
default for
  @option{-mcmodel=tiny}.

-@item -msign-return-address=@var{scope}
-@opindex msign-return-address
-Select the function scope on which return address signing will be applied.
-Permissible values are @samp{none}, which disables return address signing,
-@samp{non-leaf}, which enables pointer signing for functions which are not leaf
-functions, and @samp{all}, which enables pointer signing for all functions.  The
-default value is @samp{none}.
+@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]
+@opindex mbranch-protection
+Select the branch protection features to use.
+@samp{none} is the default and turns off all types of branch protection.
+@samp{standard} turns on all types of branch protection features.  If a feature
+has additional tuning options, then @samp{standard} sets it to its standard
+level.
+@samp{pac-ret[+@var{leaf}]} turns on return address signing to its standard
+level: signing functions that save the return address to memory (non-leaf
+functions will practically always do this) using the a-key.  The optional
+argument @samp{leaf} can be used to extend the signing to include leaf
+functions.

I am not sure if deleting the previous documentation of
-msign-return-address is the way to go. Maybe add a note to its
description saying it has been deprecated and referring the reader
to -mbranch-protection.

Thanks
Sudi

> gcc/ChangeLog:
> 
> 2018-11-02  Sam Tebbs
> 
>   * config/aarch64/aarch64.c (BRANCH_PROTEC_STR_MAX,
>   aarch64_parse_branch_protection,
>   struct aarch64_branch_protec_type,
>   aarch64_handle_no_branch_protection,
>   aarch64_handle_standard_branch_protection,
>   aarch64_validate_mbranch_protection,
>   aarch64_handle_pac_ret_protection,
>   aarch64_handle_attr_branch_protection,
>   accepted_branch_protection_string,
>   aarch64_pac_ret_subtypes,
>   aarch64_branch_protec_types,
>   aarch64_handle_pac_ret_leaf): Define.
>   (aarch64_override_options_after_change_1): Add ch

Re: [PATCH] detect attribute mismatches in alias declarations (PR 81824)

2018-11-12 Thread Matthew Malcomson
Hello Martin,

The new testcase Wattribute-alias.c fails on targets without ifunc 
support (e.g. aarch64-none-elf cross-build).

It seems that just adding a directive `{ dg-require-ifunc "" }` to the 
test file changes the test to unsupported instead of having a fail.
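
For reference, a minimal sketch (not the actual Wattribute-alias.c, and
the declarations below are made up) of where such a directive would sit:

/* { dg-do compile } */
/* { dg-require-ifunc "" } */

typedef int fn_t (void);

static int impl (void) { return 0; }            /* one implementation */
static fn_t *pick_impl (void) { return impl; }  /* the ifunc resolver */

int dispatch (void) __attribute__ ((ifunc ("pick_impl")));

With the directive in place the whole file is reported as UNSUPPORTED,
rather than FAIL, on targets without ifunc support.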

I don't know much about this patch so I don't know if the non-ifunc 
checks would still be useful on such targets.

Would the simple change be OK? or would it be best to split the test 
file into multiple parts to still run the other checks?

Regards,
Matthew


On 09/11/18 17:33, Martin Sebor wrote:
>>> +/* Handle the "copy" attribute by copying the set of attributes
>>> +   from the symbol referenced by ARGS to the declaration of *NODE.  */
>>> +
>>> +static tree
>>> +handle_copy_attribute (tree *node, tree name, tree args,
>>> +   int flags, bool *no_add_attrs)
>>> +{
>>> +  /* Break cycles in circular references.  */
>>> +  static hash_set attr_copy_visited;
>> Does this really need to be static?
>
> The variable was intended to break cycles in recursive calls to
> the function for self-referential applications of attribute copy
> but since the attribute itself is not applied (anymore) such cycles
> can no longer form.  I have removed the variable and simplified
> the handlers (there are tests to verify this works correctly).
>
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index cfe6a8e..8ffb0cd 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 5c95f67..c027acd 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>> [ ... ]
>>
>>> +
>>> +In C++, the warning is issued when an explicitcspecialization of a 
>>> primary
>> "explicitcspecialization" ? :-)
>>
>
> Fixed.
>
>>
>> Looks pretty good.  There's the explicit specialization nit and the
>> static vs auto question for attr_copy_visited.  Otherwise it's OK.
>
> Thanks.  I've retested a revision with the changes discussed here
> and committed it as r265980.
>
> Martin



Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 05:52:25PM +, Andrew Stubbs wrote:
> On 12/11/2018 17:20, Segher Boessenkool wrote:
> >If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
> >Or actually use the register, i.e. as input to an actually needed
> >instruction.
> 
> They're not useless.

> >If combine is changing an X and a USE to just that X if it can, combine
> >is doing a great job!

Actually, it is incorrect to delete a USE.

Please open a PR.  Thanks.


Segher


Re: [PATCH] detect attribute mismatches in alias declarations (PR 81824)

2018-11-12 Thread Martin Sebor

On 11/12/2018 11:29 AM, Matthew Malcomson wrote:

Hello Martin,

The new testcase Wattribute-alias.c fails on targets without ifunc
support (e.g. aarch64-none-elf cross-build).

It seems that just adding a directive `{ dg-require-ifunc "" }` to the
test file changes the test to unsupported instead of having a fail.

I don't know much about this patch so I don't know if the non-ifunc
checks would still be useful on such targets.

Would the simple change be OK? or would it be best to split the test
file into multiple parts to still run the other checks?


I just committed the former change earlier today but splitting
the test would have probably been a better way to go.  Thanks
for reporting it just the same!  If you would prefer to split
the test that would be fine with me.

Martin


Regards,
Matthew


On 09/11/18 17:33, Martin Sebor wrote:

+/* Handle the "copy" attribute by copying the set of attributes
+   from the symbol referenced by ARGS to the declaration of *NODE.  */
+
+static tree
+handle_copy_attribute (tree *node, tree name, tree args,
+   int flags, bool *no_add_attrs)
+{
+  /* Break cycles in circular references.  */
+  static hash_set attr_copy_visited;

Does this really need to be static?


The variable was intended to break cycles in recursive calls to
the function for self-referential applications of attribute copy
but since the attribute itself is not applied (anymore) such cycles
can no longer form.  I have removed the variable and simplified
the handlers (there are tests to verify this works correctly).


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cfe6a8e..8ffb0cd 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5c95f67..c027acd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi

[ ... ]


+
+In C++, the warning is issued when an explicitcspecialization of a
primary

"explicitcspecialization" ? :-)



Fixed.



Looks pretty good.  There's the explicit specialization nit and the
static vs auto question for attr_copy_visited.  Otherwise it's OK.


Thanks.  I've retested a revision with the changes discussed here
and committed it as r265980.

Martin






Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Jeff Law
On 11/12/18 10:52 AM, Andrew Stubbs wrote:
> On 12/11/2018 17:20, Segher Boessenkool wrote:
>> If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
>> Or actually use the register, i.e. as input to an actually needed
>> instruction.
> 
> They're not useless. If we want to do scalar operations in vector
> registers (and we often do, on this target), then we need to write a "1"
> into the EXEC (vector mask) register.
Presumably you're setting up active lanes or some such.  This may
ultimately be better modeled by ignoring the problem until much later in
the pipeline.

Shortly before assembly output you run a little LCM-like pass to find
optimal points to insert the assignment to the vector register.  It's a
lot like the mode switching stuff or zeroing the upper halves of the AVX
registers to avoid partial register stalls.  The local properties are
different, but these all feel like the same class of problem.

> 
> Unless we want to rewrite all scalar operations in terms of vec_merge
> then there's no way to "actually use the register".
I think you need to correctly model it.  If you lie to the compiler
about what's going on, you're going to run into problems.
> 
> I might investigate putting the USE inside an UNSPEC_VOLATILE. That
> would have the advantage of letting combine run again. This feels like a
> future project I'd rather not have block the port submission though.
The gcn_legitimate_combined_insn code isn't really acceptable though.
You need a cleaner solution here.

> 
> If there are two instructions that both have an UNSPEC_VOLATILE, will
> combine coalesce them into one in the combined pattern?
I think you can put a different constant on each.

jeff


RE: PING [PATCH] RX new builtin function

2018-11-12 Thread Sebastian Perta
PING

> -Original Message-
> From: Sebastian Perta 
> Sent: 24 October 2018 18:19
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Nick Clifton' 
> Subject: [PATCH] RX new builtin function
> 
> Hi,
> 
> The following patch adds a new builtin function for rx (__builtin_rx_bset)
> to make it possible for the user to use BSET whenever necessary.
> Please note this builtin function is dedicated only to the 32-bit variant
> of BSET (when the destination is a register).
> For the 8-bit variant (when the destination is a memory location) another
> builtin function is necessary.
> 
> The patch also contains a test case, which I added in
> testsuite/gcc.target/rx.
> 
> The patch also modifies extend.texi as necessary.
> 
> Regression test is OK, tested with the following command:
> make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim
> 
> Please find below the changelog entries and patch.
> 
> Best Regards,
> Sebastian
> 
> --- ChangeLog
> 2018-10-23  Sebastian Perta  
> 
>   * config/rx/rx.c (RX_BUILTIN_BSET): New enum.
>   * config/rx/rx.c (rx_init_builtins): Added new builtin for BSET.
>   * config/rx/rx.c (rx_expand_builtin_bit_manip): New function.
>   * config/rx/rx.c (rx_expand_builtin): Added new case for BSET.
>   * doc/extend.texi (RX Built-in Functions): Added declaration for
>   __builtin_rx_bset.
> 
> testsuite/ChangeLog
> 2018-10-23  Sebastian Perta  
> 
>   * gcc.target/rx/testbset.c: New test.
> 
> 
> 
> 
> 
> Index: config/rx/rx.c
> ==
> =
> --- config/rx/rx.c(revision 265425)
> +++ config/rx/rx.c(working copy)
> @@ -2374,6 +2374,7 @@
>RX_BUILTIN_ROUND,
>RX_BUILTIN_SETPSW,
>RX_BUILTIN_WAIT,
> +  RX_BUILTIN_BSET,
>RX_BUILTIN_max
>  };
> 
> @@ -2440,6 +2441,7 @@
>ADD_RX_BUILTIN1 (ROUND,   "round",   intSI, float);
>ADD_RX_BUILTIN1 (REVW,"revw",intSI, intSI);
>ADD_RX_BUILTIN0 (WAIT,"wait",void);
> +  ADD_RX_BUILTIN2 (BSET,"bset",intSI, intSI, intSI);
>  }
> 
>  /* Return the RX builtin for CODE.  */
> @@ -2576,6 +2578,26 @@
>return target;
>  }
> 
> +static rtx
> +rx_expand_builtin_bit_manip(tree exp, rtx target, rtx (* gen_func)(rtx, rtx, rtx))
> +{
> +  rtx arg1 = expand_normal (CALL_EXPR_ARG (exp, 0));
> +  rtx arg2 = expand_normal (CALL_EXPR_ARG (exp, 1));
> +
> +  if (! REG_P (arg1))
> +arg1 = force_reg (SImode, arg1);
> +
> +  if (! REG_P (arg2))
> +arg2 = force_reg (SImode, arg2);
> +
> +  if (target == NULL_RTX || ! REG_P (target))
> + target = gen_reg_rtx (SImode);
> +
> +  emit_insn(gen_func(target, arg2, arg1));
> +
> +  return target;
> +}
> +
>  static int
>  valid_psw_flag (rtx op, const char *which)
>  {
> @@ -2653,6 +2675,7 @@
>  case RX_BUILTIN_REVW:return rx_expand_int_builtin_1_arg
>   (op, target, gen_revw, false);
>  case RX_BUILTIN_WAIT:emit_insn (gen_wait ()); return NULL_RTX;
> + case RX_BUILTIN_BSET:   return rx_expand_builtin_bit_manip (exp, target, gen_bitset);
> 
>  default:
>internal_error ("bad builtin code");
> Index: doc/extend.texi
> ==
> =
> --- doc/extend.texi   (revision 265425)
> +++ doc/extend.texi   (working copy)
> @@ -19635,6 +19635,10 @@
>  Generates the @code{wait} machine instruction.
>  @end deftypefn
> 
> +@deftypefn {Built-in Function}  int __builtin_rx_bset (int, int)
> +Generates the @code{bset} machine instruction.
> +@end deftypefn
> +
>  @node S/390 System z Built-in Functions
>  @subsection S/390 System z Built-in Functions
>  @deftypefn {Built-in Function} int __builtin_tbegin (void*)
> Index: testsuite/ChangeLog
> Index: testsuite/gcc.target/rx/testbset.c
> ==
> =
> --- testsuite/gcc.target/rx/testbset.c(nonexistent)
> +++ testsuite/gcc.target/rx/testbset.c(working copy)
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +
> +#include <stdlib.h>
> +
> +int f1(int a, int b) __attribute((noinline));
> +int f1(int a, int b)
> +{
> + return __builtin_rx_bset (a, b);
> +}
> +
> +int f2(int a) __attribute((noinline));
> +int f2(int a)
> +{
> + return __builtin_rx_bset (a, 1);
> +}
> +
> +int x, y;
> +
> +int f3() __attribute((noinline));
> +int f3()
> +{
> + return __builtin_rx_bset (x, 4);
> +}
> +
> +int f4() __attribute((noinline));
> +int f4()
> +{
> + return __builtin_rx_bset (x, y);
> +}
> +
> +void f5() __attribute((noinline));
> +void f5()
> +{
> + x = __builtin_rx_bset (x, 6);
> +}
> +
> +int main()
> +{
> + if(f1(0xF, 8) != 0x10F)
> + abort();
> + if(f2(0xC) != 0xE)
> + abort();
> + x = 0xF;
> + if(f3() != 0x1F)
> + abort();
> + y = 5;
> + if(f4() != 0x2F)
> + abort();
> + f5();
> + if(x != 0x4F)
> + abort();
> + exit(0);
> +}



[doc, committed] clarify rtl docs about mode of high and lo_sum

2018-11-12 Thread Sandra Loosemore
I've checked in this patch for PR 21110.  As noted in the issue, RTL 
high and lo_sum expressions don't have to be Pmode and are not 
restricted to address operands.


-Sandra
2018-11-12  Sandra Loosemore  

	PR middle-end/21110

	gcc/
	* doc/rtl.texi (Constants): Clarify that mode of "high" doesn't
	have to be Pmode.
	(Arithmetic): Likewise for "lo_sum".
Index: gcc/doc/rtl.texi
===
--- gcc/doc/rtl.texi	(revision 266034)
+++ gcc/doc/rtl.texi	(working copy)
@@ -1883,14 +1883,14 @@ of relocation operator.  @var{m} should
 
 @findex high
 @item (high:@var{m} @var{exp})
-Represents the high-order bits of @var{exp}, usually a
-@code{symbol_ref}.  The number of bits is machine-dependent and is
+Represents the high-order bits of @var{exp}.  
+The number of bits is machine-dependent and is
 normally the number of bits specified in an instruction that initializes
 the high order bits of a register.  It is used with @code{lo_sum} to
 represent the typical two-instruction sequence used in RISC machines to
-reference a global memory location.
-
-@var{m} should be @code{Pmode}.
+reference large immediate values and/or link-time constants such
+as global memory addresses.  In the latter case, @var{m} is @code{Pmode}
+and @var{exp} is usually a constant expression involving @code{symbol_ref}.
 @end table
 
 @findex CONST0_RTX
@@ -2429,15 +2429,15 @@ saturates at the maximum signed value re
 
 This expression represents the sum of @var{x} and the low-order bits
 of @var{y}.  It is used with @code{high} (@pxref{Constants}) to
-represent the typical two-instruction sequence used in RISC machines
-to reference a global memory location.
+represent the typical two-instruction sequence used in RISC machines to
+reference large immediate values and/or link-time constants such
+as global memory addresses.  In the latter case, @var{m} is @code{Pmode}
+and @var{y} is usually a constant expression involving @code{symbol_ref}.
 
 The number of low order bits is machine-dependent but is
-normally the number of bits in a @code{Pmode} item minus the number of
+normally the number of bits in mode @var{m} minus the number of
 bits set by @code{high}.
 
-@var{m} should be @code{Pmode}.
-
 @findex minus
 @findex ss_minus
 @findex us_minus


[PATCH 3/4] [aarch64] Add xgene1 prefetch tunings.

2018-11-12 Thread Christoph Muellner
*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  

* config/aarch64/aarch64.c (xgene1_tunings): Add Xgene1 specific
prefetch tunings.
---
 gcc/config/aarch64/aarch64.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a6bc1fb..903f4e2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -662,6 +662,17 @@ static const cpu_prefetch_tune tsv110_prefetch_tune =
   -1/* default_opt_level  */
 };
 
+static const cpu_prefetch_tune xgene1_prefetch_tune =
+{
+  8,   /* num_slots  */
+  32,  /* l1_cache_size  */
+  64,  /* l1_cache_line_size  */
+  256, /* l2_cache_size  */
+  true, /* prefetch_dynamic_strides */
+  -1,   /* minimum_stride */
+  -1   /* default_opt_level  */
+};
+
 static const struct tune_params generic_tunings =
 {
   &cortexa57_extra_costs,
@@ -943,7 +954,7 @@ static const struct tune_params xgene1_tunings =
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
-  &generic_prefetch_tune
+  &xgene1_prefetch_tune
 };
 
 static const struct tune_params qdf24xx_tunings =
-- 
2.9.5



[PATCH 4/4] [aarch64] Update xgene1 tuning struct.

2018-11-12 Thread Christoph Muellner
*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  

* config/aarch64/aarch64.c (xgene1_tunings): Optimize Xgene1 tunings for
GCC 9.
---
 gcc/config/aarch64/aarch64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 903f4e2..f7f88a9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -944,14 +944,14 @@ static const struct tune_params xgene1_tunings =
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
   "16",/* function_align.  */
-  "8", /* jump_align.  */
+  "16",/* jump_align.  */
   "16",/* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
   2,   /* min_div_recip_mul_sf.  */
   2,   /* min_div_recip_mul_df.  */
-  0,   /* max_case_values.  */
+  17,  /* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
   &xgene1_prefetch_tune
-- 
2.9.5



[PATCH 2/4] [aarch64] Update xgene1_addrcost_table.

2018-11-12 Thread Christoph Muellner
*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  

* config/aarch64/aarch64.c (xgene1_addrcost_table): Correct the post 
modify
costs.
---
 gcc/config/aarch64/aarch64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 815f824..a6bc1fb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -254,7 +254,7 @@ static const struct cpu_addrcost_table 
xgene1_addrcost_table =
   1, /* ti  */
 },
   1, /* pre_modify  */
-  0, /* post_modify  */
+  1, /* post_modify  */
   0, /* register_offset  */
   1, /* register_sextend  */
   1, /* register_zextend  */
-- 
2.9.5



[PATCH 1/4] [aarch64/arm] Updating the cost table for xgene1.

2018-11-12 Thread Christoph Muellner
*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  

* config/arm/aarch-cost-tables.h (xgene1_extra_costs): Update the cost 
table
for Xgene1.
---
 gcc/config/arm/aarch-cost-tables.h | 88 +++---
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/gcc/config/arm/aarch-cost-tables.h 
b/gcc/config/arm/aarch-cost-tables.h
index 0bd93ba..2a28347 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -440,26 +440,26 @@ const struct cpu_cost_table xgene1_extra_costs =
   {
 0, /* arith.  */
 0, /* logical.  */
-0, /* shift.  */
+COSTS_N_INSNS (1), /* shift.  */
 COSTS_N_INSNS (1), /* shift_reg.  */
-COSTS_N_INSNS (1), /* arith_shift.  */
-COSTS_N_INSNS (1), /* arith_shift_reg.  */
-COSTS_N_INSNS (1), /* log_shift.  */
-COSTS_N_INSNS (1), /* log_shift_reg.  */
-COSTS_N_INSNS (1), /* extend.  */
-0, /* extend_arithm.  */
-COSTS_N_INSNS (1), /* bfi.  */
-COSTS_N_INSNS (1), /* bfx.  */
+COSTS_N_INSNS (2), /* arith_shift.  */
+COSTS_N_INSNS (2), /* arith_shift_reg.  */
+COSTS_N_INSNS (2), /* log_shift.  */
+COSTS_N_INSNS (2), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arithm.  */
+0, /* bfi.  */
+0, /* bfx.  */
 0, /* clz.  */
-COSTS_N_INSNS (1), /* rev.  */
+0, /* rev.  */
 0, /* non_exec.  */
 true   /* non_exec_costs_exec.  */
   },
   {
 /* MULT SImode */
 {
-  COSTS_N_INSNS (4),   /* simple.  */
-  COSTS_N_INSNS (4),   /* flag_setting.  */
+  COSTS_N_INSNS (3),   /* simple.  */
+  COSTS_N_INSNS (3),   /* flag_setting.  */
   COSTS_N_INSNS (4),   /* extend.  */
   COSTS_N_INSNS (4),   /* add.  */
   COSTS_N_INSNS (4),   /* extend_add.  */
@@ -467,8 +467,8 @@ const struct cpu_cost_table xgene1_extra_costs =
 },
 /* MULT DImode */
 {
-  COSTS_N_INSNS (5),   /* simple.  */
-  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (4),   /* simple.  */
+  COSTS_N_INSNS (4),   /* flag_setting (N/A).  */
   COSTS_N_INSNS (5),   /* extend.  */
   COSTS_N_INSNS (5),   /* add.  */
   COSTS_N_INSNS (5),   /* extend_add.  */
@@ -477,55 +477,55 @@ const struct cpu_cost_table xgene1_extra_costs =
   },
   /* LD/ST */
   {
-COSTS_N_INSNS (5), /* load.  */
-COSTS_N_INSNS (6), /* load_sign_extend.  */
-COSTS_N_INSNS (5), /* ldrd.  */
+COSTS_N_INSNS (4), /* load.  */
+COSTS_N_INSNS (5), /* load_sign_extend.  */
+COSTS_N_INSNS (4), /* ldrd.  */
 COSTS_N_INSNS (5), /* ldm_1st.  */
 1, /* ldm_regs_per_insn_1st.  */
 1, /* ldm_regs_per_insn_subsequent.  */
-COSTS_N_INSNS (10),/* loadf.  */
-COSTS_N_INSNS (10),/* loadd.  */
-COSTS_N_INSNS (5), /* load_unaligned.  */
+COSTS_N_INSNS (9), /* loadf.  */
+COSTS_N_INSNS (9), /* loadd.  */
+0, /* load_unaligned.  */
 0, /* store.  */
 0, /* strd.  */
 0, /* stm_1st.  */
 1, /* stm_regs_per_insn_1st.  */
 1, /* stm_regs_per_insn_subsequent.  */
-0, /* storef.  */
-0, /* stored.  */
+COSTS_N_INSNS (3), /* storef.  */
+COSTS_N_INSNS (3), /* stored.  */
 0, /* store_unaligned.  */
-COSTS_N_INSNS (1), /* loadv.  */
-COSTS_N_INSNS (1)  /* storev.  */
+COSTS_N_INSNS (9), /* loadv.  */
+COSTS_N_INSNS (3)  /* storev.  */
   },
   {
 /* FP SFmode */
 {
-  COSTS_N_INSNS (23),  /* div.  */
-  COSTS_N_INSNS (5),   /* mult.  */
-  COSTS_N_INSNS (5),   /* mult_addsub. */
-  COSTS_N_INSNS (5),   /* fma.  */
-  COSTS_N_INSNS (5),   /* addsub.  */
-  COSTS_N_INSNS (2),   /* fpconst. */
-  COSTS_N_INSNS (3),   /* neg.  */
-  COSTS_N_INSNS (2),   /* compare.  */
-  COSTS_N_INSNS (6),   /* widen.  */
-  COSTS_N_INSNS (6),   /* narrow.  */
+  COSTS_N_INSNS (22),  /* div.  */
+  COSTS_N_INSNS (4),   /* mult.  */
+  COSTS_N_INSNS (4),   /* mult_addsub. */
+  COSTS_N_INSNS (4),   /* fma.  */
+  COSTS_N_INSNS (4),   /* addsub.  */
+  COSTS_N_INSNS (1),   /* fpconst. */
+  COSTS_N_INSNS (4),   /* neg.  */
+  COSTS_N_INSNS (9),   /* compare.  */
+  COSTS_N_INSNS (4),   /* widen.  */
+  COSTS_N_INSNS (4),

Re: [PATCH 2/6] [RS6000] rs6000_output_indirect_call

2018-11-12 Thread Bill Schmidt
On 11/6/18 11:37 PM, Alan Modra wrote:
> Like the last patch for external calls, now handle most assembly code
> for indirect calls in one place.  The patch also merges some insns,
> correcting some !rs6000_speculate_indirect_jumps cases branching to
> LR, which don't require a speculation barrier.
>
>   * config/rs6000/rs6000-protos.h (rs6000_output_indirect_call): Declare.
>   * config/rs6000/rs6000.c (rs6000_output_indirect_call): New function.
>   * config/rs6000/rs6000.md (call_indirect_nonlocal_sysv): Use
>   rs6000_output_indirect_call.
>   (call_value_indirect_nonlocal_sysv, sibcall_nonlocal_sysv): Likewise.
>   (call_indirect_aix, call_value_indirect_aix,
>   call_indirect_elfv2, call_value_indirect_elfv2): Likewise, and
>   handle both speculation and non-speculation cases.
>   (call_indirect_aix_nospec, call_value_indirect_aix_nospec): Delete.
>   (call_indirect_elfv2_nospec, call_value_indirect_elfv2_nospec): Delete.
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h 
> b/gcc/config/rs6000/rs6000-protos.h
> index f1a421dde16..493cfe6ba2b 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -112,6 +112,7 @@ extern void rs6000_output_function_entry (FILE *, const 
> char *);
>  extern void print_operand (FILE *, rtx, int);
>  extern void print_operand_address (FILE *, rtx);
>  extern const char *rs6000_output_call (rtx *, unsigned int, bool, const char 
> *);
> +extern const char *rs6000_output_indirect_call (rtx *, unsigned int, bool);
>  extern enum rtx_code rs6000_reverse_condition (machine_mode,
>  enum rtx_code);
>  extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index b22cae55a0d..bf1551746d5 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -21411,6 +21411,69 @@ rs6000_output_call (rtx *operands, unsigned int fun, 
> bool sibcall,
>return str;
>  }
>
> +/* As above, for indirect calls.  */
> +
> +const char *
> +rs6000_output_indirect_call (rtx *operands, unsigned int fun, bool sibcall)
> +{
> +  /* -Wformat-overflow workaround, without which gcc thinks that %u
> +  might produce 10 digits.  FUN is 0 or 1 as of 2018-03.  */
> +  gcc_assert (fun <= 6);
> +
> +  static char str[144];
> +  const char *ptrload = TARGET_64BIT ? "d" : "wz";
> +
> +  bool speculate = (rs6000_speculate_indirect_jumps
> + || (REG_P (operands[fun])
> + && REGNO (operands[fun]) == LR_REGNO));

Wouldn't hurt to have a comment here, indicating that we only have to generate
inefficient, speculation-inhibiting code when speculation via the count cache
has been disabled by switch, and this doesn't apply to indirect calls via the
link register.  This changes behavior of the code from before, but appears to
be safe and more correct.

> +
> +  if (DEFAULT_ABI == ABI_AIX)
> +{
> +  if (speculate)
> + sprintf (str,
> +  "l%s 2,%%%u\n\t"
> +  "b%%T%ul\n\t"
> +  "l%s 2,%%%u(1)",
> +  ptrload, fun + 2, fun, ptrload, fun + 3);
> +  else
> + sprintf (str,
> +  "crset 2\n\t"
> +  "l%s 2,%%%u\n\t"
> +  "beq%%T%ul-\n\t"
> +  "l%s 2,%%%u(1)",
> +  ptrload, fun + 2, fun, ptrload, fun + 3);
> +}
> +  else if (DEFAULT_ABI == ABI_ELFv2)
> +{
> +  if (speculate)
> + sprintf (str,
> +  "b%%T%ul\n\t"
> +  "l%s 2,%%%u(1)",
> +  fun, ptrload, fun + 2);
> +  else
> + sprintf (str,
> +  "crset 2\n\t"
> +  "beq%%T%ul-\n\t"
> +  "l%s 2,%%%u(1)",
> +  fun, ptrload, fun + 2);
> +}
> +  else if (DEFAULT_ABI == ABI_V4)
> +{
> +  if (speculate)
> + sprintf (str,
> +  "b%%T%u%s",
> +  fun, "l" + sibcall);

It's not at all clear to me what {"l" + sibcall} is doing here.
Whatever it is, it's clever enough that it warrants a comment... :-)
Does adding "l" to false result in the null string?  Is that
standard?
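
For reference, a tiny standalone sketch (not part of the patch) of what
that expression evaluates to: adding an integer to a string literal just
offsets into it, so "l" + 0 is "l" and "l" + 1 points at the literal's
terminating NUL, i.e. the empty string.

#include <stdio.h>

int
main (void)
{
  for (int sibcall = 0; sibcall <= 1; sibcall++)
    /* Prints b%T%ul for sibcall == 0 and b%T%u for sibcall == 1.  */
    printf ("sibcall=%d -> \"b%%T%%u%s\"\n", sibcall, "l" + sibcall);
  return 0;
}

So yes, it is standard C, which is arguably all the more reason for the
comment being requested.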

> +  else
> + sprintf (str,
> +  "crset 2\n\t"
> +  "beq%%T%u%s-%s",
> +  fun, "l" + sibcall, sibcall ? "\n\tb $" : "");

And similar...

> +}
> +  else
> +gcc_unreachable ();
> +  return str;
> +}
> +
>  #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
>  /* Emit an assembler directive to set symbol visibility for DECL to
> VISIBILITY_TYPE.  */
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 52088fdfbdb..9d9e29d12eb 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -10540,11 +10540,7 @@ (define_insn "*call_indirect_nonlocal_sysv"
>else if (INTVAL (operands[2]) & CALL_V4_CLEAR_FP_ARGS)
>  output_asm_insn ("creqv 6,6,6", operands);
>
> -  if (rs6000_speculate_indirect_jumps

Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-12 Thread Peter Bergner
On 11/12/18 6:25 AM, Renlin Li wrote:
> I tried to build a native arm-linuxeabihf toolchain with the patch.
> But I got the following ICE:

Ok, the issue was a problem in handling the src reg from a register copy.
I thought I could just remove it from the dead_set, but forgot that the
updating of the program points looks at whether the pseudo is live or
not.  The change below on top of the previous patch fixes the ICE for me.
I now add the src reg back into pseudos_live before we process the insn's
input operands so it doesn't trigger a new program point being added.

Renlin and Jeff, can you apply this patch on top of the previous one
and see whether that is better?

Thanks.

Peter


--- gcc/lra-lives.c.orig2018-11-12 14:15:18.257657911 -0600
+++ gcc/lra-lives.c 2018-11-12 14:08:55.978795092 -0600
@@ -934,6 +934,18 @@
  || sparseset_contains_pseudos_p (start_dying))
next_program_point (curr_point, freq);
 
+  /* If we removed the source reg from a simple register copy from the
+live set above, then add it back now so we don't accidentally add
+it to the start_living set below.  */
+  if (ignore_reg != NULL_RTX)
+   {
+ int ignore_regno = REGNO (ignore_reg);
+ if (HARD_REGISTER_NUM_P (ignore_regno))
+   SET_HARD_REG_BIT (hard_regs_live, ignore_regno);
+ else
+   sparseset_set_bit (pseudos_live, ignore_regno);
+   }
+
   sparseset_clear (start_living);
 
   /* Mark each used value as live. */
@@ -959,11 +971,6 @@
 
   sparseset_and_compl (dead_set, start_living, start_dying);
 
-  /* If we removed the source reg from a simple register copy from the
-live set, then it will appear to be dead, but it really isn't.  */
-  if (ignore_reg != NULL_RTX)
-   sparseset_clear_bit (dead_set, REGNO (ignore_reg));
-
   sparseset_clear (start_dying);
 
   /* Mark early clobber outputs dead.  */



Re: PR fortran/87919 patch for -fno-dec-structure

2018-11-12 Thread Fritz Reese
On Thu, Nov 8, 2018 at 12:54 PM Jakub Jelinek  wrote:
>
> On Thu, Nov 08, 2018 at 12:09:33PM -0500, Fritz Reese wrote:
> > > What about the
> > >   /* Allow legacy code without warnings.  */
> > >   gfc_option.allow_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL
> > > | GFC_STD_GNU | GFC_STD_LEGACY;
> > >   gfc_option.warn_std &= ~(GFC_STD_LEGACY | GFC_STD_F95_DEL);
> > > that is done for value, shouldn't set_dec_flags remove those
> > > flags again?  Maybe not the allow_std ones, because those are set already 
> > > by
> > > default, perhaps just the warn_std flags?
> > >
> >
> > Sure. I wasn't convinced about this and how it might interplay with
> > -std= so I left it alone, but I suppose it makes sense to unsuppress
> > the warnings when disabling -fdec.
>
> Perhaps it might be better not to change the allow_std/warn_std flags
> during the option parsing, instead set or clear say flag_dec and
> only when option processing is being finalized (gfc_post_options)
> check if flag_dec is set and set those.  It would change behavior of
> -fdec -std=f2018 and similar though.  Not sure what users expect.
>

Actually, the gcc driver appears to place -std= ahead of the
language-specific options before f951 is even executed, regardless of
where it appears relative to the -fdec flags. I don't know if this is a
bug or if it is by design -- the feeling I get is that the driver
processes it first since it is recognized before the Fortran-specific
options. Therefore, greedily setting the standard options the first
time flag_dec appears means the standard information is lost, and
I believe your suggestion is correct: the standard flags must be set
only once in gfc_post_options.

In fact the new testcase dec_bitwise_ops_3.f90 is a good test of this:
it uses -fdec -fno-dec -std=legacy to avoid warnings for XOR. With the
version posted previously, the -std=legacy is overwritten by -fno-dec
and warnings still appear. Here's what I'd change from the previous
patch to support this:

diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index af89a5d2faf..b7f7360215c 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -66,16 +66,6 @@ set_default_std_flags (void)
 static void
 set_dec_flags (int value)
 {
-  /* Allow legacy code without warnings.
- Nb. We do not unset the allowed standards with value == 0 because
- they are set by default in set_default_std_flags.  */
-  if (value)
-gfc_option.allow_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL
-  | GFC_STD_GNU | GFC_STD_LEGACY;
-
-  SET_BITFLAG (gfc_option.warn_std, !value, GFC_STD_LEGACY);
-  SET_BITFLAG (gfc_option.warn_std, !value, GFC_STD_F95_DEL);
-
   /* Set (or unset) other DEC compatibility extensions.  */
   SET_BITFLAG (flag_dollar_ok, value, value);
   SET_BITFLAG (flag_cray_pointer, value, value);
@@ -85,6 +75,24 @@ set_dec_flags (int value)
   SET_BITFLAG (flag_dec_math, value, value);
 }

+/* Finalize DEC flags.  */
+
+static void
+post_dec_flags (int value)
+{
+  /* Don't warn for legacy code if -fdec is given; however, setting -fno-dec
+ does not force these warnings.  We make one final determination on this
+ at the end because -std= is always set first; thus, we can avoid
+ clobbering the user's desired standard settings in gfc_handle_option
+ e.g. when -fdec and -fno-dec are both given.  */
+  if (value)
+{
+  gfc_option.allow_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL
+   | GFC_STD_GNU | GFC_STD_LEGACY;
+  gfc_option.warn_std &= ~(GFC_STD_LEGACY | GFC_STD_F95_DEL);
+}
+}
+
 /* Enable (or disable) -finit-local-zero.  */

 static void
@@ -248,6 +256,9 @@ gfc_post_options (const char **pfilename)
   char *source_path;
   int i;

+  /* Finalize DEC flags.  */
+  post_dec_flags (flag_dec);
+
   /* Excess precision other than "fast" requires front-end
  support.  */
   if (flag_excess_precision_cmdline == EXCESS_PRECISION_STANDARD)
@@

> Directives are only processed in the current file, so it doesn't really
> matter what the included file has as directives.  One could even have the
> included one be with expected dg-error lines and then include it in
> the ones that don't expect any.

Good to know, thanks! In that case, I like your suggestion of reducing
the test cases to includes. See new the newly attached patch for
updated cases.

> Anyway, that is all from me, I still don't want to stomp on Fortran
> maintainer's review (use my global reviewer's rights for that) and
> thus I'm deferring the review to them.  When committing, please make sure
> to include Mark's email in the ChangeLog next to yours to credit him.

Thanks for your comments. I think nobody will feel stomped on since
maintainers are sparse and busy. I will certainly make note of Mark's
contributions when committing.

Attached is the latest version, which builds and regtests cleanly on
x86_64-redhat-linux. OK for trunk, 7-branch, and 8-branch?

Fritz

>From 1cae11a88b29fe521e0e6c6c7c1796a7adb34cad Mon Sep 17 00:00:00 200

Re: [PATCH] RFC: elide repeated source locations (PR other/84889)

2018-11-12 Thread Martin Sebor

On 11/11/2018 07:43 PM, David Malcolm wrote:

We often emit more than one diagnostic at the same source location.
For example, the C++ frontend can emit many diagnostics at
the same source location when suggesting overload candidates.

For example:

../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C: In function 'int 
test_3(s, t)':
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: error: no match for 
'operator&&' (operand types are 's' and 't')
   38 |   return param_s && param_t;
  |  ^~
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: note: candidate: 
'operator&&(bool, bool)' 
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: note:   no 
known conversion for argument 2 from 't' to 'bool'

This is overly verbose.  Note how the same location has been printed
three times, obscuring the pertinent messages.

This patch add a new "elide" value to -fdiagnostics-show-location=
and makes it the default (previously it was "once").  With elision
the above is printed as:

../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C: In function 'int 
test_3(s, t)':
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: error: no match for 
'operator&&' (operand types are 's' and 't')
   38 |   return param_s && param_t;
  |  ^~
  = note: candidate: 'operator&&(bool, bool)' 
  = note:   no known conversion for argument 2 from 't' to 'bool'

where the followup notes are printed with a '=' lined up with
the source code margin.

Thoughts?


I agree the long pathname in the notes is at first glance redundant
but I'm not sure about using '=' as a shorthand for it.  I have
written many scripts to parse GCC output to extract all diagnostics
(including notes) and publish those on a Web page somewhere, as I'm
sure many others have.  All those scripts would stop working with
this change and require changes to the build system to work again.
Making those changes can be a substantial undertaking in some
organizations.

Have you considered printing just the file name instead?  Or any
other alternatives?

Martin


[PATCH, fortran] PR fortran/85982 -- Fix ICE on invalid attributes inside DEC structures

2018-11-12 Thread Fritz Reese
All,

The simple patch below (and attached) fixes PR 85982. The issue is an
omission of the macro gfc_comp_struct() which would include DEC
structures in certain attribute checks that are performed for
derived-TYPE declarations in decl.c. In the case described in the PR
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85982) there is an ICE
because the presence of an invalid EXTERNAL attribute leaks through to
resolve_component, invalidating some invariants for objects which are
supposed to be EXTERNAL.

This is fairly obvious so I would commit to trunk and backport to
7-branch and 8-branch if nobody sees any issues this week or so.
(Nb. the test case is named dec_structure_28.f90 so as not to conflict
with the pending patch for PR fortran/87919 which adds
dec_structure_{24-27}.f90.)

--
Fritz

>From dc5a072017af29ca1e84b85b0e3a1e6af49a6928 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 12 Nov 2018 15:19:39 -0500

Fix ICE due to erroneously accepted component attributes in DEC structures.

gcc/fortran/
* decl.c (match_attr_spec): Lump COMP_STRUCTURE/COMP_MAP into attribute
checking used by TYPE.

gcc/testsuite/
* gfortran.dg/dec_structure_28.f90: New test.
---
 gcc/fortran/decl.c | 17 -
 gcc/testsuite/gfortran.dg/dec_structure_28.f90 | 35 ++
 2 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/dec_structure_28.f90
index 87c736fb2db..2b294fdf65f 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -5184,15 +5184,18 @@ match_attr_spec (void)
   if (d == DECL_STATIC && seen[DECL_SAVE])
continue;

-  if (gfc_current_state () == COMP_DERIVED
+  if (gfc_comp_struct (gfc_current_state ())
  && d != DECL_DIMENSION && d != DECL_CODIMENSION
  && d != DECL_POINTER   && d != DECL_PRIVATE
  && d != DECL_PUBLIC && d != DECL_CONTIGUOUS && d != DECL_NONE)
{
+ const char* const state_name = (gfc_current_state () == COMP_DERIVED
+ ? "TYPE" : "STRUCTURE");
  if (d == DECL_ALLOCATABLE)
{
  if (!gfc_notify_std (GFC_STD_F2003, "ALLOCATABLE "
-  "attribute at %C in a TYPE definition"))
+  "attribute at %C in a %s definition",
+  state_name))
{
  m = MATCH_ERROR;
  goto cleanup;
@@ -5201,7 +5204,8 @@ match_attr_spec (void)
  else if (d == DECL_KIND)
{
  if (!gfc_notify_std (GFC_STD_F2003, "KIND "
-  "attribute at %C in a TYPE definition"))
+  "attribute at %C in a %s definition",
+  state_name))
{
  m = MATCH_ERROR;
  goto cleanup;
@@ -5225,7 +5229,8 @@ match_attr_spec (void)
  else if (d == DECL_LEN)
{
  if (!gfc_notify_std (GFC_STD_F2003, "LEN "
-  "attribute at %C in a TYPE definition"))
+  "attribute at %C in a %s definition",
+  state_name))
{
  m = MATCH_ERROR;
  goto cleanup;
@@ -5248,8 +5253,8 @@ match_attr_spec (void)
}
  else
{
- gfc_error ("Attribute at %L is not allowed in a TYPE definition",
-&seen_at[d]);
+ gfc_error ("Attribute at %L is not allowed in a %s definition",
+&seen_at[d], state_name);
  m = MATCH_ERROR;
  goto cleanup;
}
diff --git a/gcc/testsuite/gfortran.dg/dec_structure_28.f90
b/gcc/testsuite/gfortran.dg/dec_structure_28.f90
new file mode 100644
index 000..bab08b2d5c3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec_structure_28.f90
@@ -0,0 +1,35 @@
+! { dg-do compile }
+! { dg-options "-fdec-structure -fdec-static" }
+!
+! PR fortran/85982
+!
+! Test a regression wherein some component attributes were erroneously accepted
+! within a DEC structure.
+!
+
+structure /s/
+  integer :: a
+  integer, intent(in) :: b ! { dg-error "is not allowed" }
+  integer, intent(out) :: c ! { dg-error "is not allowed" }
+  integer, intent(inout) :: d ! { dg-error "is not allowed" }
+  integer, dimension(1,1) :: e ! OK
+  integer, external, pointer :: f ! { dg-error "is not allowed" }
+  integer, intrinsic :: f ! { dg-error "is not allowed" }
+  integer, optional :: g ! { dg-error "is not allowed" }
+  integer, parameter :: h ! { dg-error "is not allowed" }
+  integer, protected :: i ! { dg-error "is not allowed" }
+  integer, private :: j ! { dg-error "is not allowed" }
+  integer, static :: k ! { dg-error "is not allowed" }
+  integer, automatic :: l ! { dg-error "is not allowed" }
+  integer, public :: m ! { dg-error "is not

Re: PR fortran/87919 patch for -fno-dec-structure

2018-11-12 Thread Jakub Jelinek
On Mon, Nov 12, 2018 at 03:28:47PM -0500, Fritz Reese wrote:
> Actually, the gcc frontend appears to move -std= before the
> language-specific options before f951 is even executed regardless of
> its location compared to the -fdec flags. I don't know if this is a

That is because:
#define F951_OPTIONS"%(cc1_options) %{J*} \
 %{!nostdinc:-fintrinsic-modules-path finclude%s}\
 %{!fsyntax-only:%(invoke_as)}"
and
static const char *cc1_options =
"%{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}}\
 %{!iplugindir*:%{fplugin*:%:find-plugindir()}}\
 %1 %{!Q:-quiet} %{!dumpbase:-dumpbase %B} %{d*} %{m*} %{aux-info*}\
 %{fcompare-debug-second:%:compare-debug-auxbase-opt(%b)} \
 %{!fcompare-debug-second:%{c|S:%{o*:-auxbase-strip %*}%{!o*:-auxbase 
%b}}}%{!c:%{!S:-auxbase %b}} \
 %{g*} %{O*} %{W*&pedantic*} %{w} %{std*&ansi&trigraphs}\
 %{v:-version} %{pg:-p} %{p} %{f*} %{undef}\

where %{std*&ansi&trigraphs} comes before %{f*}.
I guess let's not change that behavior.

> bug or if it is by design -- the feeling I get is that the gcc
> frontend processes it first since it is recognized before the flang
> specific options. Therefore, greedily setting the standard options the
> first time flag_dec appears means the standard information is lost and
> I believe your suggestion is correct: the standard flags must be set
> only once in gfc_post_options.
> 
> In fact the new testcase dec_bitwise_ops_3.f90 is a good test of this:
> it uses -fdec -fno-dec -std=legacy to avoid warnings for XOR. With the
> version posted previously, the -std=legacy is overwritten by -fno-dec
> and warnings still appear. Here's what I'd change from the previous
> patch to support this:

LGTM.

> > Anyway, that is all from me, I still don't want to stomp on Fortran
> > maintainer's review (use my global reviewer's rights for that) and
> > thus I'm deferring the review to them.  When committing, please make sure
> > to include Mark's email in the ChangeLog next to yours to credit him.
> 
> Thanks for your comments. I think nobody will feel stomped on since
> maintainers are sparse and busy. I will certainly make note of Mark's
> contributions when committing.

Ok, so I'll ack it for trunk now, but please give the other Fortran
maintainers one day to disagree before committing.
For the release branches, I'd wait two weeks or so before backporting it.

Thanks.

Jakub


Re: PR fortran/87919 patch for -fno-dec-structure

2018-11-12 Thread Fritz Reese
On Mon, Nov 12, 2018 at 3:42 PM Jakub Jelinek  wrote:
> Ok, so I'll ack it for trunk now, but please give the other Fortran
> maintainers one day to disagree before committing.
> For the release branches, I'd wait two weeks or so before backporting it.
>

Roger that. I'll happily give it some time. Thanks for looking it over.

Fritz


Re: [PATCH] RFC: C/C++: print help when a header can't be found

2018-11-12 Thread Martin Sebor

On 11/11/2018 04:33 PM, David Malcolm wrote:

When gcc can't find a header file, it's a hard error that stops the build,
typically requiring the user to mess around with compile flags, Makefiles,
dependencies, and so forth.

Often the exact search paths aren't obvious to the user.  Consider the
case where the include paths are injected via a tool such as pkg-config,
such as e.g.:

  gcc $(pkg-config --cflags glib-2.0) demo.c

This patch is an attempt at being more helpful for such cases.  Given that
the user can't proceed until the issue is resolved, I think it's reasonable
to default to telling the user as much as possible about what happened.
This patch list all of the search paths, and any close matches (e.g. for
misspellings).

Without the patch, the current behavior is:

misspelled-header-1.c:1:10: fatal error: test-header.hpp: No such file or 
directory
1 | #include "test-header.hpp"
  |  ^
compilation terminated.

With the patch, the user gets this output:

misspelled-header-1.c:1:10: fatal error: test-header.hpp: No such file or 
directory
1 | #include "test-header.hpp"
  |  ^
misspelled-header-1.c:1:10: note: paths searched:
misspelled-header-1.c:1:10: note:  path: ''
misspelled-header-1.c:1:10: note:   not found: 'test-header.hpp'
misspelled-header-1.c:1:10: note:   close match: 'test-header.h'
1 | #include "test-header.hpp"
  |  ^
  |  "test-header.h"
misspelled-header-1.c:1:10: note:  path: '/usr/include/glib-2.0' (via '-I')
misspelled-header-1.c:1:10: note:   not found: 
'/usr/include/glib-2.0/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/lib64/glib-2.0/include' (via 
'-I')
misspelled-header-1.c:1:10: note:   not found: 
'/usr/lib64/glib-2.0/include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: './include' (system directory)
misspelled-header-1.c:1:10: note:   not found: './include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: './include-fixed' (system directory)
misspelled-header-1.c:1:10: note:   not found: './include-fixed/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/local/include' (system directory)
misspelled-header-1.c:1:10: note:   not found: 
'/usr/local/include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/include' (system directory)
misspelled-header-1.c:1:10: note:   not found: '/usr/include/test-header.hpp'
compilation terminated.

showing the paths that were tried, and why (e.g. the -I paths injected by
the pkg-config invocation), and the .hpp vs .h issue (with a fix-it hint).

It's verbose, but as I said above, the user can't proceed until they
resolve it, so I think being verbose is appropriate here.

Thoughts?


I think printing the directories and especially the near matches
will be very helpful, especially for big projects with lots of -I
options.

The output could be made substantially shorter, less repetitive,
and so easier to read -- basically cut in half -- by avoiding
most of the duplication and collapsing two notes into one, e.g.
like so:

  fatal error: test-header.hpp: No such file or directory
  1 | #include "test-header.hpp"
|  ^
  note: paths searched:
  note: -I '.'
  note:   close match: 'test-header.h'
  1 | #include "test-header.hpp"
|  ^
|  "test-header.h"
  note: -I '/usr/include/glib-2.0'
  note: -I '/usr/lib64/glib-2.0/include'
  note: -isystem './include'
  note: -isystem './include-fixed'
  note: -isystem '/usr/local/include'
  note: -isystem '/usr/include'

or by printing the directories in sections:

  note: -I paths searched:
  note:   '.'
  note:   close match: 'test-header.h'
  1 | #include "test-header.hpp"
|  ^
|  "test-header.h"
  note:   '/usr/include/glib-2.0'
  note:   '/usr/lib64/glib-2.0/include'
  note: -isystem paths searched:
  note:   './include'
  note:   './include-fixed'
  note:   '/usr/local/include'
  note:   '/usr/include'

Martin


