Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Richard Biener
On Wed, 23 Sep 2020, Tobias Burnus wrote:

> Actually working patch attached.
> 
> As mentioned, just using TREE_PUBLIC in input_offload_tables
> works for functions but for variables this gets overridden.
> 
> The externally_visible is set to avoid running into the
> promote_symbol code (-> visibility hidden) later in the
> function.

Hmm, but offload_vars and offload_funcs do not need to be exported
since they get stored into tables with addresses pointing to them
(and that table is exported).  So I think we don't yet understand
what goes wrong.

Note that ultimately the desired visibility is determined by
the linker and communicated via the resolution file to the WPA
stage.  I'm not sure whether both host and offload code participate
in the same link and thus if the offload tables are properly
seen as being referenced (for a non-DSO, symbols are usually _not_
force-exported) - so, how is the offload table constructed?
I'm not sure we can properly tell the linker of the host object
that a certain symbol will be referenced from a dynamically loaded
DSO - visibility("default") doesn't work.

#include <dlfcn.h>
int global_sym __attribute__((visibility("default")));
int main()
{
  dlopen("test.so", RTLD_NOW);
  return 0;
}

with -flto we elide global_sym; if I compile with
-Wl,-export-dynamic it works fine (even w/o the visibility).
But the question is how to selectively mark a symbol at
the compile-stage so the linker will force-export it ...

The 'used' attribute seems to work but that feels like a band-aid ...
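
For reference, the band-aid variant would look like this on the example above
(a sketch only; whether 'used' is acceptable as more than a band-aid is
exactly the open question):

#include <dlfcn.h>

/* 'used' keeps the symbol through -flto even without -Wl,-export-dynamic,
   per the observation above.  */
int global_sym __attribute__((visibility("default"), used));

int main()
{
  dlopen("test.so", RTLD_NOW);
  return 0;
}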

Richard.

> On 9/23/20 5:47 PM, Tobias Burnus wrote:
> > ...
> > On 9/23/20 4:23 PM, Tobias Burnus wrote:
> >> On 9/23/20 3:10 PM, Richard Biener wrote:
> >>
> >>> On Wed, 23 Sep 2020, Richard Biener wrote:
>  LTRANS usually makes the symbols hidden, not local.
> >> Could also be – whatever the 'nm' output means.
>  So are you
>  sure this isn't a target bug (hidden symbols not implemented
>  but the host compiler obviously having checked that but assuming
>  the target behaves the same as the host) or a linker bug?
> >>
> >> Unlikely, I assume the Linux x86-64 linker is rather well tested.
> >> As written this is the host – just the offloading symbol table is
> >> device specific.
> >>
> >>> See lto/lto-partition.c:promote_symbol btw.
> >>
> >> Thanks for the pointer; it pointed me to node->externally_visible,
> 
> ...
> 
> Tobias
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander
> Walter
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer


Re: [RFC] Offloading and automatic linking of libraries

2020-09-24 Thread Tobias Burnus

On 9/24/20 8:40 AM, Tom de Vries wrote:


Maybe we could require building libatomic for nvptx.


If one does not explicitly disable it, it builds,* which
I think is a good default. Additionally, libatomic is
only very rarely needed on x86-64 + nvptx and only
somewhat regularly on PowerPC + nvptx. Hence, I am not
sure whether nvptx's libgomp build should unconditionally
require it.

Tobias

* configure(.ac): If no --(enable|disable)-libatomic was
used, libatomic/configure.tgt is checked.
For nvptx*-*-* it is supported, for *), which includes gcn,
it is UNSUPPORTED=1.

PS: Besides the indirect dependency of nvptx on
__atomic_compare_exchange_16, libatomic is unsurprisingly
also required for code explicitly using atomics. (I wonder
whether we should add some libgomp/testsuite/ testcase like
the following with an appropriate effective-target.)

__uint128_t v;
#pragma omp declare target (v)
int
main ()
{
  #pragma omp target
  {
__atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
__atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
__uint128_t exp = 2;
__atomic_compare_exchange_n (&v, &exp, 7, 0, __ATOMIC_RELEASE, 
__ATOMIC_ACQUIRE);
  }
}
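
For illustration, the complete test file could look roughly like the
following; the effective-target keyword in the dg-require line is made up
here and would first have to be implemented in the testsuite:

/* { dg-do run } */
/* { dg-require-effective-target offload_device_libatomic } */ /* hypothetical keyword */

__uint128_t v;
#pragma omp declare target (v)

int
main ()
{
  #pragma omp target
  {
    __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED); /* needs libatomic on nvptx */
  }
  return 0;
}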

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-09-24 Thread Antony Polukhin via Gcc-patches
Looks like the last patch was not applied. Do I have to change something in
it?


Re: [PATCH] Fix UBSAN errors in ipa-cp.

2020-09-24 Thread Martin Liška

On 9/23/20 5:02 PM, Martin Jambor wrote:

Hi,

On Wed, Sep 23 2020, Martin Liška wrote:

There's a patch that does that.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
 From ff5f78110684ed9aedde15d19e856b3acf649971 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 23 Sep 2020 15:10:43 +0200
Subject: [PATCH] Port IPA CP time and size to sreal

gcc/ChangeLog:

* ipa-cp.c (ipcp_lattice::print): Print sreal values
with to_double.
(incorporate_penalties): Change type to sreal.
(good_cloning_opportunity_p): Change args to sreal.
(get_max_overall_size): Work in sreal type.
(estimate_local_effects): Likewise.
(safe_add): Remove.
(value_topo_info::propagate_effects): Work in sreal type.
(ipcp_propagate_stage): Print sreal numbers.
(decide_about_value): Work in sreal type.
---
  gcc/ipa-cp.c | 128 ---
  1 file changed, 59 insertions(+), 69 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index b3e7d41ea10..9a79b5862f8 100644
@@ -3224,8 +3226,9 @@ incorporate_penalties (cgraph_node *node, ipa_node_params 
*info,
 potential new clone in FREQUENCIES.  */
  
  static bool

-good_cloning_opportunity_p (struct cgraph_node *node, int time_benefit,
-   int freq_sum, profile_count count_sum, int 
size_cost)
+good_cloning_opportunity_p (struct cgraph_node *node, sreal time_benefit,
+   sreal freq_sum, profile_count count_sum,


Is the change of type of freq_sum intentional?  Even with this patch,
all the callers will keep passing their frequency sums as ints and so
the implicit conversion seems a bit misleading.
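
To illustrate the concern with a stand-in type (this is not GCC code, just a
sketch of the silent int-to-sreal conversion at the call sites):

struct sreal_like             /* stand-in for gcc's sreal */
{
  sreal_like (long v) : val ((double) v) {}   /* implicit from an integer */
  double val;
};

static bool
good_cloning_opportunity_p (sreal_like time_benefit, sreal_like freq_sum)
{
  return time_benefit.val > 0 && freq_sum.val > 0;
}

int
main ()
{
  int freq_sum = 100;                      /* callers still compute an int */
  return !good_cloning_opportunity_p (5, freq_sum);  /* both convert silently */
}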

I guess a complete transition to sreals would also include a switch from
using cgraph_edge::frequency to sreal_frequency in
get_info_about_necessary_edges and users of struct caller_statistics.


Hello.

All right, I'm leaving that to you Martin ;) You are much more familiar
with the code base.

Martin



I planned to work on conversions to sreals right after resolving various
exchange2 issues and I can still do this bit afterwards, but the change
in the parameter type looks like something you wanted to do but eventually did not?

Thanks,

Martin






Re: [PATCH] tree-optimization/97151 - improve PTA for C++ operator delete

2020-09-24 Thread Richard Biener
On Wed, 23 Sep 2020, Jason Merrill wrote:

> On 9/23/20 2:42 PM, Richard Biener wrote:
> > On September 23, 2020 7:53:18 PM GMT+02:00, Jason Merrill 
> > wrote:
> >> On 9/23/20 4:14 AM, Richard Biener wrote:
> >>> C++ operator delete, when DECL_IS_REPLACEABLE_OPERATOR_DELETE_P,
> >>> does not cause the deleted object to be escaped.  It also has no
> >>> other interesting side-effects for PTA so skip it like we do
> >>> for BUILT_IN_FREE.
> >>
> >> Hmm, this is true of the default implementation, but since the function
> >>
> >> is replaceable, we don't know what a user definition might do with the
> >> pointer.
> > 
> > But can the object still be 'used' after delete? Can delete fail / throw?
> > 
> > What guarantee does the predicate give us?
> 
> The deallocation function is called as part of a delete expression in order to
> release the storage for an object, ending its lifetime (if it was not ended by
> a destructor), so no, the object can't be used afterward.

OK, but the delete operator can access the object contents if there
wasn't a destructor ...

> A deallocation function that throws has undefined behavior.

OK, so it seems the 'replaceable' operators are the global ones
(for user-defined/class-specific placement variants I see arbitrary
extra arguments that we'd possibly need to handle).

I'm happy to revert but I'd like to have a testcase that FAILs
with the patch ;)

Now, the following aborts:

struct X {
  static struct X saved;
  int *p;
  X() { __builtin_memcpy (this, &saved, sizeof (X)); }
};
void operator delete (void *p)
{
  __builtin_memcpy (&X::saved, p, sizeof (X));
}
int main()
{
  int y = 1;
  X *p = new X;
  p->p = &y;
  delete p;
  X *q = new X;
  *(q->p) = 2;
  if (y != 2)
__builtin_abort ();
}

and I could fix this by not making *p but what *p points to escape.
The testcase is of course maximally awkward, but hey ... ;)

Now this would all be moot if operator delete may not access
the object (or if the object contents are undefined at that point).

Oh, and the testcase segfaults when compiled with GCC 10 because
there we elide the new X / delete p pair ... which is invalid then?
Hmm, we emit

  MEM[(struct X *)_8] ={v} {CLOBBER};
  operator delete (_8, 8);

so the object contents are undefined _before_ calling delete
even when I do not have a DTOR?  That is, the above,
w/o -fno-lifetime-dse, makes the PTA patch OK for the testcase.

Richard.

> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >>>
> >>> Richard.
> >>>
> >>> 2020-09-23  Richard Biener  
> >>>
> >>>  PR tree-optimization/97151
> >>>  * tree-ssa-structalias.c (find_func_aliases_for_call):
> >>>  DECL_IS_REPLACEABLE_OPERATOR_DELETE_P has no effect on
> >>>  arguments.
> >>>
> >>>   * g++.dg/cpp1y/new1.C: Adjust for two more handled transforms.
> >>> ---
> >>>gcc/testsuite/g++.dg/cpp1y/new1.C | 4 ++--
> >>>gcc/tree-ssa-structalias.c| 2 ++
> >>>2 files changed, 4 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/gcc/testsuite/g++.dg/cpp1y/new1.C
> >> b/gcc/testsuite/g++.dg/cpp1y/new1.C
> >>> index aa5f647d535..fec0088cb40 100644
> >>> --- a/gcc/testsuite/g++.dg/cpp1y/new1.C
> >>> +++ b/gcc/testsuite/g++.dg/cpp1y/new1.C
> >>> @@ -69,5 +69,5 @@ test_unused() {
> >>>  delete p;
> >>>}
> >>>
> >>> -/* { dg-final { scan-tree-dump-times "Deleting : operator delete" 5
> >> "cddce1"} } */
> >>> -/* { dg-final { scan-tree-dump-times "Deleting : _\\d+ = operator
> >> new" 7 "cddce1"} } */
> >>> +/* { dg-final { scan-tree-dump-times "Deleting : operator delete" 6
> >> "cddce1"} } */
> >>> +/* { dg-final { scan-tree-dump-times "Deleting : _\\d+ = operator
> >> new" 8 "cddce1"} } */
> >>> diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
> >>> index 44fe52e0f65..f676bf91e95 100644
> >>> --- a/gcc/tree-ssa-structalias.c
> >>> +++ b/gcc/tree-ssa-structalias.c
> >>> @@ -4857,6 +4857,8 @@ find_func_aliases_for_call (struct function
> >> *fn, gcall *t)
> >>>  point for reachable memory of their arguments.  */
> >>>   else if (flags & (ECF_PURE|ECF_LOOPING_CONST_OR_PURE))
> >>>   handle_pure_call (t, &rhsc);
> >>> +  else if (fndecl && DECL_IS_REPLACEABLE_OPERATOR_DELETE_P
> >> (fndecl))
> >>> + ;
> >>>   else
> >>> handle_rhs_call (t, &rhsc);
> >>>  if (gimple_call_lhs (t))
> >>>
> > 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Tobias Burnus

On 9/24/20 9:03 AM, Richard Biener wrote:


Hmm, but offload_vars and offload_funcs do not need to be exported
since they get stored into tables with addresses pointing to them
(and that table is exported).


Granted but the x86-64 linker does not seem to be able to resolve
the symbol if the table is in a.ltrans0.ltrans.o and the variable
or function is in a.ltrans1.ltrans.o

That's both host/x86-64 code; the linker might not see that the
table is used by a dynamic library – but still it should resolve
the links, shouldn't it?

Possibly, the 'externally_visible = 1' in my code is also a
red herring; it also works by using:
   TREE_PUBLIC (decl) = 1;
   gcc_assert (!node->offloadable);
   node->offloadable = 1;
and below
  if (node->offloadable)
{
  node->offloadable = 0;
  validize_symbol_for_target (node);
  continue;
}
Namely: PUBLIC + avoid calling promote_symbol.


Note that ultimately the desired visibility is determined by
the linker and communicated via the resolution file to the WPA
stage.  I'm not sure whether both host and offload code participate
in the same link and thus if the offload tables are properly
seen as being referenced


This could be the problem. The device part is linked by the
host/x86-64 linker – but the device's ".o" files are just linked
and not processed by 'ld'. (In case of nvptx, they are host
compiled .o files which contain everything as strings with the
nvptx as text – to be passed to the JIT at startup.)

Note that *no* WPA/LTO is done on the device side – there only all
generated files are collected without any inter-file
optimizations. (Sufficient for the code generated by the program,
which is all in one file – but it still would be useful to
inline, e.g., libm functions.)


(for a non-DSO symbols are usually _not_
force-exported) - so, how is the offload table constructed?


First, the offload tables exist both on the host and on the
device(s). They have to be identical as otherwise the
association between variables and functions is lost.

The symbols are added to offload_vars + offload_funcs.

In lto-cgraph.c's output_offload_tables there is the last chance
to remove now unused nodes — as once the tables are streamed
for device usage, they cannot be changed. Hence, there one
has
   node->force_output = 1;
[Unrelated: this prevents later optimizations, which still
could be done; cf. PR95622]


The table itself is written in omp-offload.c's omp_finish_file.

For the host, the constructor is constructed in
add_decls_addresses_to_decl_constructor, which does:
  CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
  if (is_var)
CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
and then in omp_finish_file:
  tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
get_identifier (".offload_func_table"),
funcs_decl_type);
  DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
  SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
  DECL_INITIAL (funcs_decl) = ctor_f;
  set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
  varpool_node::finalize_decl (vars_decl);
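
To make that concrete, the generated host tables amount to roughly the
following C picture (a simplified sketch with made-up names; the compiler
really builds them as tree constructors placed in the named sections, and the
variable table stores the size as just another pointer-sized element):

/* Hypothetical stand-ins for an offloaded function and variable.  */
static void offloaded_fn (void) {}
static int offloaded_var;

/* Roughly what ".offload_func_table" holds: one address per function.  */
static void *offload_func_table[] = { (void *) &offloaded_fn };

/* Roughly what ".offload_var_table" holds: address plus size per variable.  */
static struct { void *addr; unsigned long size; } offload_var_table[] =
  { { &offloaded_var, sizeof (offloaded_var) } };

int main (void) { return offload_var_table[0].size != sizeof (int); }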

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Richard Biener
On Thu, 24 Sep 2020, Tobias Burnus wrote:

> On 9/24/20 9:03 AM, Richard Biener wrote:
> 
> > Hmm, but offload_vars and offload_funcs do not need to be exported
> > since they get stored into tables with addresses pointing to them
> > (and that table is exported).
> 
> Granted but the x86-64 linker does not seem to be able to resolve
> the symbol if the table is in a.ltrans0.ltrans.o and the variable
> or function is in a.ltrans1.ltrans.o
> 
> That's both host/x86-64 code; the linker might not see that the
> table is used by a dynamic library – but still it should resolve
> the links, shouldn't it?
> 
> Possibly, the 'externally_visible = 1' in my code is also a
> red herring; it also works by using:
>TREE_PUBLIC (decl) = 1;
>gcc_assert (!node->offloadable);
>node->offloadable = 1;
> and below
>   if (node->offloadable)
> {
>   node->offloadable = 0;
>   validize_symbol_for_target (node);
>   continue;
> }
> Namely: PUBLIC + avoid calling promote_symbol.
> 
> > Note that ultimately the desired visibility is determined by
> > the linker and communicated via the resolution file to the WPA
> > stage.  I'm not sure whether both host and offload code participate
> > in the same link and thus if the offload tables are properly
> > seen as being referenced
> 
> This could be the problem. The device part is linked by the
> host/x86-64 linker – but the device's ".o" files are just linked
> and not processed by 'ld'. (In case of nvptx, they are host
> compiled .o files which contain everything as strings with the
> nvptx as text – to be passed to the JIT at startup.)
> 
> Note that *no* WPA/LTO is done on the device side – there only all
> generated files are collected without any inter-file
> optimizations. (Sufficient for the code generated by the program,
> which is all in one file – but it still would be useful to
> inline, e.g., libm functions.)
> 
> > (for a non-DSO symbols are usually _not_
> > force-exported) - so, how is the offload table constructed?
> 
> First, the offload tables exist both on the host and on the
> device(s). They have to be identical as otherwise the
> association between variables and functions is lost.
> 
> The symbols are added to offload_vars + offload_funcs.
> 
> In lto-cgraph.c's output_offload_tables there is the last chance
> to remove now unused nodes — as once the tables are streamed
> for device usage, they cannot be changed. Hence, there one
> has
>node->force_output = 1;
> [Unrelated: this prevents later optimizations, which still
> could be done; cf. PR95622]
> 
> 
> The table itself is written in omp-offload.c's omp_finish_file.

But this is called at LTRANS time only, in particular we seem
to stream the offload_funcs/vars array, marking streamed nodes
as force_output but we do not make the offload table visible
to the partitioner.  But force_output should make the
nodes not renamed.  But then output_offload_tables is called at
the very end and we likely do not stream the altered
force_output state.

So - can you try, in prune_offload_funcs, in addition to
setting DECL_PRESERVE_P, mark the cgraph node ->force_output
so this happens early?  I guess the same is needed for
variables (there's no prune_offload_vars ...).
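
Something like the following, sketched from memory against the internal APIs
(a fragment only -- it assumes GCC's cgraph.h context, the helper names are
made up, and the real prune_offload_funcs iterates over offload_funcs
differently):

/* In prune_offload_funcs: besides DECL_PRESERVE_P, force output early,
   before partitioning and streaming happen.  */
static void
mark_offload_fn (tree fn_decl)
{
  DECL_PRESERVE_P (fn_decl) = 1;
  if (cgraph_node *node = cgraph_node::get (fn_decl))
    node->force_output = 1;
}

/* Likewise for variables (there is no prune_offload_vars yet).  */
static void
mark_offload_var (tree var_decl)
{
  if (varpool_node *node = varpool_node::get (var_decl))
    node->force_output = 1;
}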

> For the host, the constructor is constructed in
> add_decls_addresses_to_decl_constructor, which does:
>   CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
>   if (is_var)
> CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
> and then in omp_finish_file:
>   tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> get_identifier (".offload_func_table"),
> funcs_decl_type);
>   DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
>   SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
>   DECL_INITIAL (funcs_decl) = ctor_f;
>   set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
>   varpool_node::finalize_decl (vars_decl);
> 
> Tobias
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander
> Walter
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer


*PING* Re: [PATCH] Fortran : ICE in build_field PR95614

2020-09-24 Thread Mark Eggleston

I haven't yet committed this.

I am unfamiliar with Andre; I've checked MAINTAINERS and I find Andre in 
the "Write after approval" section.


Is Andre's approval sufficient? If so MAINTAINERS needs to be updated.

If not: OK to commit and backport?

regards,

Mark

On 14/09/2020 09:03, Andre Vehreschild wrote:

Hi Mark,

the patch looks reasonable and ok to me. Ok for trunk.

Thanks for the patch.

Regards,
Andre

On Fri, 4 Sep 2020 08:35:59 +0100
Mark Eggleston  wrote:


Please find attached a fix for PR95614.  The original patch was by Steve
Kargl.

The original patch resulted in name clashes between global identifiers
naming common blocks and local identifiers.  According to the 2018
standard 19.3.1 Classes of local identifiers, item 2, a local identifier
shall not be the same as a global identifier, however, there is an
exception where the global identifier is a common block name.

The change to the patch is:

if (gsym && gsym->type != GSYM_UNKNOWN && gsym->type != GSYM_COMMON)

instead of:

if (gsym && gsym->type != GSYM_UNKNOWN)

Tested on x86_64 using make -j 8 check-fortran.

OK to commit?

[PATCH] Fortran  :  ICE in build_field PR95614

Local identifiers cannot be the same as a module name. The original
patch by Steve Kargl resulted in name clashes between common block
names and local identifiers.  A local identifier can be the same as
a global identifier if that identifier represents a common.  The patch
was modified to allow global identifiers that represent a common
block.

2020-09-04  Steven G. Kargl  
          Mark Eggleston  

gcc/fortran/

      PR fortran/95614
      * decl.c (gfc_get_common): Use gfc_match_common_name instead
      of match_common_name.
      * decl.c (gfc_bind_idents): Use gfc_match_common_name instead
      of match_common_name.
      * match.c : Rename match_common_name to gfc_match_common_name.
      * match.c (gfc_match_common): Use gfc_match_common_name instead
      of match_common_name.
      * match.h : Rename match_common_name to gfc_match_common_name.
      * resolve.c (resolve_common_vars): Check each symbol in a
      common block has a global symbol.  If there is a global symbol,
      issue an error if the symbol type is known and is not a common
      block name.

2020-09-04  Mark Eggleston  

gcc/testsuite/

      PR fortran/95614
      * gfortran.dg/pr95614_1.f90: New test.
      * gfortran.dg/pr95614_2.f90: New test.





--
https://www.codethink.co.uk/privacy.html



[committed][testsuite] Require non_strict_align in pr94600-{1,3}.c

2020-09-24 Thread Tom de Vries
Hi,

With the nvptx target, we run into:
...
FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(mem/v" 6
FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(set \\(mem/v" 6
FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(mem/v" 1
FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(set \\(mem/v" 1
...
The scans attempt to check for volatile stores, but on nvptx we have memcpy
instead.

This is due to nvptx being a STRICT_ALIGNMENT target, which has the effect
that the TYPE_MODE for the store target is set to BLKmode in
compute_record_mode.
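
As a rough illustration (not the actual pr94600 testcase), consider an
aggregate whose natural integer mode needs more alignment than the type has:

/* Size 4, alignment 2: SImode would require alignment 4, so on a
   STRICT_ALIGNMENT target compute_record_mode leaves the struct in BLKmode
   and a volatile store of it is expanded as a block copy rather than a
   single (set (mem/v ...)) insn.  */
struct pair { unsigned short lo, hi; };

void
store_pair (volatile struct pair *dst, struct pair v)
{
  *dst = v;
}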

Fix the FAILs by requiring effective target non_strict_align.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Require non_strict_align in pr94600-{1,3}.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/pr94600-1.c: Require effective target non_strict_align for
scan-rtl-dump-times.
* gcc.dg/pr94600-3.c: Same.

---
 gcc/testsuite/gcc.dg/pr94600-1.c | 4 ++--
 gcc/testsuite/gcc.dg/pr94600-3.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr94600-1.c b/gcc/testsuite/gcc.dg/pr94600-1.c
index b5913a0939c..38f939a98cb 100644
--- a/gcc/testsuite/gcc.dg/pr94600-1.c
+++ b/gcc/testsuite/gcc.dg/pr94600-1.c
@@ -32,5 +32,5 @@ foo(void)
 }
 
 /* The only volatile accesses should be the obvious writes.  */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" { target { 
non_strict_align } } } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" { target { 
non_strict_align } } } } */
diff --git a/gcc/testsuite/gcc.dg/pr94600-3.c b/gcc/testsuite/gcc.dg/pr94600-3.c
index 7537f6cb797..e8776fbdb28 100644
--- a/gcc/testsuite/gcc.dg/pr94600-3.c
+++ b/gcc/testsuite/gcc.dg/pr94600-3.c
@@ -31,5 +31,5 @@ foo(void)
 }
 
 /* The loop isn't unrolled. */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" { target { 
non_strict_align } } } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" { target { 
non_strict_align } } } } */


[PATCH] tree-optimization/97085 - fold some trivial bool vector ?:

2020-09-24 Thread Richard Biener
The following avoids the ICE in the testcase by doing some additional
simplification of VEC_COND_EXPRs for VECTOR_BOOLEAN_TYPE_P which
we don't really expect, especially when they are not classical vectors
but AVX512 or SVE masks.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-09-24  Richard Biener  

PR tree-optimization/97085
* match.pd (mask ? { false,..} : { true, ..} -> ~mask): New.

* gcc.dg/vect/pr97085.c: New testcase.
---
 gcc/match.pd| 11 +++
 gcc/testsuite/gcc.dg/vect/pr97085.c | 13 +
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr97085.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 7d63bb973cb..e6dcdd0b855 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3521,6 +3521,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (optimize_vectors_before_lowering_p () && types_match (@0, @1))
   (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
 
+/* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
+   types are compatible.  */
+(simplify
+ (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2)
+ (if (VECTOR_BOOLEAN_TYPE_P (type)
+  && types_match (type, TREE_TYPE (@0)))
+  (if (integer_zerop (@1) && integer_all_onesp (@2))
+   (bit_not @0)
+   (if (integer_all_onesp (@1) && integer_zerop (@2))
+@0
+
 /* Simplification moved from fold_cond_expr_with_comparison.  It may also
be extended.  */
 /* This pattern implements two kinds simplification:
diff --git a/gcc/testsuite/gcc.dg/vect/pr97085.c 
b/gcc/testsuite/gcc.dg/vect/pr97085.c
new file mode 100644
index 000..ffde9f10995
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97085.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64-*-* } } */
+
+int a, b, c, d;
+short e, g;
+unsigned short f;
+void h() {
+  for (; d; d++) {
+g = d;
+e = b == 0 ? 1 : a % b;
+c ^= (f = e) > (g == 5);
+  }
+}
-- 
2.26.2


Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread xionghu luo via Gcc-patches

Hi Segher,

The attached two patches are updated and split from
"[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple 
[PR79251]"
as per your comments.


[PATCH v3 2/3] rs6000: Fix lvsl&lvsr mode and change rs6000_expand_vector_set 
param

This one is preparation work: it fixes the lvsl&lvsr argument mode and adds
rs6000_expand_vector_set parameter support for both constant and variable
index input.


[PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in 
expander [PR79251]

This one builds a VIEW_CONVERT_EXPR and expands the IFN VEC_SET in the expander to make it fast.


Thanks,
Xionghu
From 9d74c488ad3c7cad8c276cc49749ec05158d1e96 Mon Sep 17 00:00:00 2001
From: Xiong Hu Luo 
Date: Thu, 24 Sep 2020 00:52:35 -0500
Subject: [PATCH v3 2/3] rs6000: Fix lvsl&lvsr mode and change
 rs6000_expand_vector_set param

lvsl and lvsr look only at the low 4 bits, so use SImode for the index param.
rs6000_expand_vector_set can accept an insert to either a constant or a
variable position, so change the operand to reg_or_cint_operand.

gcc/ChangeLog:

2020-09-24  Xionghu Luo  

* config/rs6000/altivec.md (altivec_lvsl_reg): Change to
SImode.
(altivec_lvsr_reg): Likewise.
* config/rs6000/rs6000-call.c (altivec_expand_vec_set_builtin):
Change call param 2 from type int to rtx.
* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set):
Likewise.
* config/rs6000/rs6000.c (rs6000_expand_vector_init):
Change call param 2 from type int to rtx.
(rs6000_expand_vector_set): Likewise.
* config/rs6000/vector.md (vec_set): Support both constant
and variable index vec_set.
* config/rs6000/vsx.md: Call gen_altivec_lvsl_reg with SImode.
---
 gcc/config/rs6000/altivec.md  |  4 ++--
 gcc/config/rs6000/rs6000-call.c   |  2 +-
 gcc/config/rs6000/rs6000-protos.h |  2 +-
 gcc/config/rs6000/rs6000.c| 16 +---
 gcc/config/rs6000/vector.md   |  4 ++--
 gcc/config/rs6000/vsx.md  |  3 ++-
 6 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..a1c06c9ab8c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2775,7 +2775,7 @@ (define_expand "altivec_lvsl"
 (define_insn "altivec_lvsl_reg"
   [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
(unspec:V16QI
-   [(match_operand:DI 1 "gpc_reg_operand" "b")]
+   [(match_operand:SI 1 "gpc_reg_operand" "b")]
UNSPEC_LVSL_REG))]
   "TARGET_ALTIVEC"
   "lvsl %0,0,%1"
@@ -2813,7 +2813,7 @@ (define_expand "altivec_lvsr"
 (define_insn "altivec_lvsr_reg"
   [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
(unspec:V16QI
-   [(match_operand:DI 1 "gpc_reg_operand" "b")]
+   [(match_operand:SI 1 "gpc_reg_operand" "b")]
UNSPEC_LVSR_REG))]
   "TARGET_ALTIVEC"
   "lvsr %0,0,%1"
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index e39cfcf672b..51f278933bd 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -10655,7 +10655,7 @@ altivec_expand_vec_set_builtin (tree exp)
   op0 = force_reg (tmode, op0);
   op1 = force_reg (mode1, op1);
 
-  rs6000_expand_vector_set (op0, op1, elt);
+  rs6000_expand_vector_set (op0, op1, GEN_INT (elt));
 
   return op0;
 }
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 28e859f4381..6a0fbc3ba2e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -57,7 +57,7 @@ extern bool rs6000_move_128bit_ok_p (rtx []);
 extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
 extern void rs6000_expand_vector_init (rtx, rtx);
-extern void rs6000_expand_vector_set (rtx, rtx, int);
+extern void rs6000_expand_vector_set (rtx, rtx, rtx);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fe93cf6ff2b..c46ec14f060 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6669,7 +6669,8 @@ rs6000_expand_vector_init (rtx target, rtx vals)
   rs6000_expand_vector_init (target, copy);
 
   /* Insert variable.  */
-  rs6000_expand_vector_set (target, XVECEXP (vals, 0, one_var), one_var);
+  rs6000_expand_vector_set (target, XVECEXP (vals, 0, one_var),
+   GEN_INT (one_var));
   return;
 }
 
@@ -6683,10 +6684,10 @@ rs6000_expand_vector_init (rtx target, rtx vals)
   emit_move_insn (target, mem);
 }
 
-/* Set field ELT of TARGET to VAL.  */
+/* Set field ELT_RTX of TARGET to VAL.  */
 
 void
-rs6000_expand_vector_set (rtx target, rtx val, int elt)
+rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
 {
   machine_

Re: [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move

2020-09-24 Thread Florian Weimer via Gcc-patches
* Michael Meissner via Gcc-patches:

> These patches are my latest versions of the patches to add IEEE 128-bit min,
> max, and conditional move to GCC.  They correspond to the earlier patches #3
> and #4 (patches #1 and #2 have been installed).

Is this about IEEE min or IEEE minimum?  My understanding is that they
are not the same (or that the behavior depends on the standard version,
but I think min was replaced with minimum in the 2019 standard or
something like that).
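
(For what it's worth, a tiny illustration of the distinction I mean, as far
as I understand the two revisions -- this is not taken from the patches:)

#include <math.h>
#include <stdio.h>

int
main (void)
{
  /* C's fmin follows IEEE 754-2008 minNum: a quiet NaN operand is ignored,
     so this prints 1.  The 754-2019 "minimum" operation would instead
     propagate the NaN (and also orders -0.0 before +0.0).  */
  printf ("%g\n", fmin (1.0, (double) NAN));
  return 0;
}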

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



RE: [PATCH] arm: Add a couple of extra stack-protector tests

2020-09-24 Thread Kyrylo Tkachov
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: 23 September 2020 19:34
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH] arm: Add a couple of extra stack-protector tests
> 
> These tests were inspired by the corresponding aarch64 ones that I just
> committed.  They already pass.
> 
> Tested on arm-linux-gnueabi, arm-linux-gnueabihf and armeb-eabi.
> OK for trunk?

Ok. Do they also need to go on the branches when the fix is backported?
Thanks,
Kyrill

> 
> Richard
> 
> 
> gcc/testsuite/
>   * gcc.target/arm/stack-protector-5.c: New test.
>   * gcc.target/arm/stack-protector-6.c: Likewise.
> ---
>  .../gcc.target/arm/stack-protector-5.c| 21 +++
>  .../gcc.target/arm/stack-protector-6.c|  8 +++
>  2 files changed, 29 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-5.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-6.c
> 
> diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-5.c
> b/gcc/testsuite/gcc.target/arm/stack-protector-5.c
> new file mode 100644
> index 000..b808b11aa3d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/stack-protector-5.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fstack-protector-all -O2" } */
> +
> +void __attribute__ ((noipa))
> +f (void)
> +{
> +  volatile int x;
> +  asm volatile ("" :::
> + "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
> + "r8", "r9", "r10", "r11", "r12", "r14");
> +}
> +
> +/* The register clobbers above should not generate any single LDRs or STRs;
> +   all registers should be pushed and popped using register lists.  The only
> +   STRs should therefore be those associated with the stack protector tests
> +   themselves.
> +
> +   Make sure the address of the canary is not spilled and reloaded,
> +   since that would give the attacker an opportunity to change the
> +   canary value.  */
> +/* { dg-final { scan-assembler-times {\tstr\t} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-6.c
> b/gcc/testsuite/gcc.target/arm/stack-protector-6.c
> new file mode 100644
> index 000..f8eec878bd6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/stack-protector-6.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target fpic } */
> +/* { dg-options "-fstack-protector-all -O2 -fpic" } */
> +
> +#include "stack-protector-5.c"
> +
> +/* See the comment in stack-protector-5.c.  */
> +/* { dg-final { scan-assembler-times {\tstr\t} 1 } } */


RE: [PATCH] arm: Fix canary address calculation for non-PIC

2020-09-24 Thread Kyrylo Tkachov
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: 23 September 2020 19:20
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov ; Kees Cook 
> Subject: [PATCH] arm: Fix canary address calculation for non-PIC
> 
> For non-PIC, the stack protector patterns did:
> 
> rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
> emit_move_insn (operands[2], mem);
> 
> Here, operands[1] is the address of the canary (&__stack_chk_guard)
> and operands[2] is the register that we want to move that address into.
> However, the code above instead sets operands[2] to the address of a
> constant pool entry that contains &__stack_chk_guard, rather than to
> &__stack_chk_guard itself.  The sequence therefore does one less
> pointer indirection than it should.
> 
> The net effect was to use &__stack_chk_guard for stack-smash detection,
> instead of using __stack_chk_guard itself.
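> 
> A C-level restatement of the effect (illustration only, not the generated
> sequence):
> 
> extern unsigned long __stack_chk_guard;
> 
> /* What the check needs: the canary value itself.  */
> unsigned long canary_value (void) { return __stack_chk_guard; }
> 
> /* What the broken pattern effectively supplied: only the address,
>    i.e. one pointer indirection too few.  */
> unsigned long canary_addr (void) { return (unsigned long) &__stack_chk_guard; }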
> 
> Tested on arm-linux-gnueabi, arm-linux-gnueabihf and armeb-eabi.
> OK for trunk and branches?

Ok.
Thanks,
Kyrill

> 
> Richard
> 
> 
> gcc/
>   * config/arm/arm.md (*stack_protect_combined_set_insn): For non-
> PIC,
>   load the address of the canary rather than the address of the
>   constant pool entry that points to it.
>   (*stack_protect_combined_test_insn): Likewise.
> 
> gcc/testsuite/
>   * gcc.target/arm/stack-protector-3.c: New test.
>   * gcc.target/arm/stack-protector-4.c: Likewise.
> ---
>  gcc/config/arm/arm.md |  4 +-
>  .../gcc.target/arm/stack-protector-3.c| 38 +++
>  .../gcc.target/arm/stack-protector-4.c|  6 +++
>  3 files changed, 46 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-4.c
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index bffdb0b3987..c4fa116ab77 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -9212,7 +9212,7 @@ (define_insn_and_split
> "*stack_protect_combined_set_insn"
>   operands[2] = operands[1];
>else
>   {
> -   rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
> +   rtx mem = force_const_mem (SImode, operands[1]);
> emit_move_insn (operands[2], mem);
>   }
>  }
> @@ -9295,7 +9295,7 @@ (define_insn_and_split
> "*stack_protect_combined_test_insn"
>   operands[3] = operands[1];
>else
>   {
> -   rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
> +   rtx mem = force_const_mem (SImode, operands[1]);
> emit_move_insn (operands[3], mem);
>   }
>  }
> diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-3.c
> b/gcc/testsuite/gcc.target/arm/stack-protector-3.c
> new file mode 100644
> index 000..b8f77fa2309
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/stack-protector-3.c
> @@ -0,0 +1,38 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fstack_protector } */
> +/* { dg-options "-fstack-protector-all -O2" } */
> +
> +extern volatile long *stack_chk_guard_ptr;
> +
> +void __attribute__ ((noipa))
> +f (void)
> +{
> +  volatile int x;
> +  /* Munging the contents of __stack_chk_guard should trigger a
> + stack-smashing failure for this function.  */
> +  *stack_chk_guard_ptr += 1;
> +}
> +
> +asm (
> +".data\n"
> +".align  3\n"
> +".globl  stack_chk_guard_ptr\n"
> +"stack_chk_guard_ptr:\n"
> +".word   __stack_chk_guard\n"
> +".weak   __stack_chk_guard\n"
> +"__stack_chk_guard:\n"
> +".word   0xdead4321\n"
> +".text\n"
> +".type   __stack_chk_fail, %function\n"
> +"__stack_chk_fail:\n"
> +"movsr0, #0\n"
> +"b   exit\n"
> +".size   __stack_chk_fail, .-__stack_chk_fail"
> +);
> +
> +int
> +main (void)
> +{
> +  f ();
> +  __builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-4.c
> b/gcc/testsuite/gcc.target/arm/stack-protector-4.c
> new file mode 100644
> index 000..6334dd00908
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/stack-protector-4.c
> @@ -0,0 +1,6 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fstack_protector } */
> +/* { dg-require-effective-target fpic } */
> +/* { dg-options "-fstack-protector-all -O2 -fpic" } */
> +
> +#include "stack-protector-3.c"


[PATCH][GCC 10] aarch64: Add support for Neoverse V1 CPU

2020-09-24 Thread Alex Coplan
This patch backports the AArch64 support for Arm's Neoverse V1 CPU to
GCC 10.

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for GCC 10 branch?

Thanks,
Alex

---

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Neoverse V1.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document support for Neoverse V1.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index a7dde38d768..a3bd56f5b43 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -134,6 +134,7 @@ AARCH64_CORE("thunderx3t110",  thunderx3t110,  
thunderx3t110, 8_3A,  AARCH64_FL_
 
 /* Arm ('A') cores.  */
 AARCH64_CORE("zeus", zeus, cortexa57, 8_4A,  AARCH64_FL_FOR_ARCH8_4 | 
AARCH64_FL_SVE | AARCH64_FL_RCPC | AARCH64_FL_I8MM | AARCH64_FL_BF16 | 
AARCH64_FL_F16 | AARCH64_FL_PROFILE | AARCH64_FL_SSBS | AARCH64_FL_RNG, 
neoversen1, 0x41, 0xd40, -1)
+AARCH64_CORE("neoverse-v1", neoversev1, cortexa57, 8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_SVE | AARCH64_FL_RCPC | AARCH64_FL_I8MM | 
AARCH64_FL_BF16 | AARCH64_FL_F16 | AARCH64_FL_PROFILE | AARCH64_FL_SSBS | 
AARCH64_FL_RNG, neoversen1, 0x41, 0xd40, -1)
 
 /* Qualcomm ('Q') cores. */
 AARCH64_CORE("saphira", saphira,saphira,8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 
0xC01, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index ebf97c38fbd..8e38052d6cf 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3d3a208dcaa..5b408150084 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16977,8 +16977,8 @@ performance of the code.  Permissible values for this 
option are:
 @samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
 @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
 @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
-@samp{neoverse-e1},@samp{neoverse-n1},@samp{qdf24xx}, @samp{saphira},
-@samp{phecda}, @samp{xgene1}, @samp{vulcan}, @samp{octeontx},
+@samp{neoverse-e1},@samp{neoverse-n1},@samp{neoverse-v1},@samp{qdf24xx},
+@samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan}, @samp{octeontx},
 @samp{octeontx81},  @samp{octeontx83},
 @samp{octeontx2}, @samp{octeontx2t98}, @samp{octeontx2t96}
 @samp{octeontx2t93}, @samp{octeontx2f95}, @samp{octeontx2f95n},


Re: [PATCH] arm: Add a couple of extra stack-protector tests

2020-09-24 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: 23 September 2020 19:34
>> To: gcc-patches@gcc.gnu.org
>> Cc: ni...@redhat.com; Richard Earnshaw ;
>> Ramana Radhakrishnan ; Kyrylo
>> Tkachov 
>> Subject: [PATCH] arm: Add a couple of extra stack-protector tests
>> 
>> These tests were inspired by the corresponding aarch64 ones that I just
>> committed.  They already pass.
>> 
>> Tested on arm-linux-gnueabi, arm-linux-gnueabihf and armeb-eabi.
>> OK for trunk?
>
> Ok. Do they also need to go on the branches when the fix is backported?

There's not really an associated fix for this.  It's more just a defensive
patch: it's trying to make sure that the equivalent of the aarch64 bug
doesn't creep (back) into arm.  It was the same idea in the other direction
for 0f0b00033a71ff728d6fab6f9d: I've no evidence that those tests ever
failed on aarch64, but it seemed like a good idea to add aarch64
equivalents of the failing arm tests.

Thanks,
Richard


[PATCH][GCC 9] aarch64: Add support for Neoverse V1 CPU

2020-09-24 Thread Alex Coplan
This patch backports the AArch64 support for Arm's Neoverse V1 CPU to
GCC 9.

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for GCC 9 branch?

Thanks,
Alex

---

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Neoverse V1.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document support for Neoverse V1.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 9214686d9d1..48f1ac3ecf1 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -114,6 +114,8 @@ AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_
 
 /* ARM ('A') cores. */
 AARCH64_CORE("zeus", zeus, cortexa57, 8_4A,  AARCH64_FL_FOR_ARCH8_4 | 
AARCH64_FL_SVE | AARCH64_FL_RCPC | AARCH64_FL_F16 | AARCH64_FL_PROFILE | 
AARCH64_FL_SSBS, neoversen1, 0x41, 0xd40, -1)
+AARCH64_CORE("neoverse-v1", neoversev1, cortexa57, 8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_SVE | AARCH64_FL_RCPC | AARCH64_FL_F16 | 
AARCH64_FL_PROFILE | AARCH64_FL_SSBS, neoversen1, 0x41, 0xd40, -1)
+
 
 /* Qualcomm ('Q') cores. */
 AARCH64_CORE("saphira", saphira,saphira,8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 
0xC01, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index a3bd30754ea..f5d62de5940 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,neoversee1,a64fx,tsv110,zeus,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,neoversee1,a64fx,tsv110,zeus,neoversev1,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cb2dde07343..67cebf59fb7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15851,8 +15851,8 @@ performance of the code.  Permissible values for this 
option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
 @samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
-@samp{neoverse-e1},@samp{neoverse-n1},@samp{qdf24xx}, @samp{saphira},
-@samp{phecda}, @samp{xgene1}, @samp{vulcan}, @samp{octeontx},
+@samp{neoverse-e1},@samp{neoverse-n1},@samp{neoverse-v1},@samp{qdf24xx},
+@samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan}, @samp{octeontx},
 @samp{octeontx81},  @samp{octeontx83},
 @samp{a64fx},
 @samp{thunderx}, @samp{thunderxt88},


[PATCH][GCC 8] aarch64: Add support for Neoverse V1 CPU

2020-09-24 Thread Alex Coplan
This patch backports the AArch64 support for Arm's Neoverse V1 CPU to
GCC 8.

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for GCC 8 branch?

Thanks,
Alex

---

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Neoverse V1.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document support for Neoverse V1.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index eb01390c262..35ce68ad077 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -98,6 +98,7 @@ AARCH64_CORE("saphira", saphira,falkor,8_3A,  
AARCH64_FL_FOR_ARCH8_3
 
 /* ARM ('A') cores. */
 AARCH64_CORE("zeus", zeus, cortexa57, 8_4A,  AARCH64_FL_FOR_ARCH8_4 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_SVE, cortexa72, 0x41, 0xd40, -1)
+AARCH64_CORE("neoverse-v1", neoversev1, cortexa57, 8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_SVE, 
cortexa72, 0x41, 0xd40, -1)
 
 /* ARMv8-A big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index c2de5e873a7..e8894ee4a9d 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,saphira,zeus,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,saphira,zeus,neoversev1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 47126319e72..a46a9cb31f7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14771,8 +14771,8 @@ Specify the name of the target processor for which GCC 
should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{cortex-a76}, @samp{ares}, @samp{neoverse-n1}, @samp{zeus},
-@samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx}, @samp{saphira},
+@samp{cortex-a76}, @samp{ares}, @samp{neoverse-n1}, @samp{neoverse-v1},
+@samp{zeus}, @samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx}, @samp{saphira},
 @samp{xgene1}, @samp{vulcan}, @samp{thunderx},
 @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},
 @samp{thunderxt83}, @samp{thunderx2t99}, @samp{cortex-a57.cortex-a53},


Add access through parameter dereference tracking to modref

2020-09-24 Thread Jan Hubicka
Hi,
this patch re-adds tracking of accesses which was unfinished in David's patch.
At the moment I only implemented tracking of the fact that an access is based on
a dereference of the parameter (so we track THIS pointers).
The patch does not implement IPA propagation since it needs a bit more work, which
I will post shortly: ipa-fnsummary needs to track when a parameter points to
local memory, summaries need to be merged when a function is inlined (because
jump functions are), and propagation needs to be turned into an iterative dataflow
on SCC components.
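
As a hypothetical example (not from the patch) of what THIS-pointer tracking
buys, the names below are made up:

struct setter
{
  int field;
  void set (int v) { field = v; }   /* writes only via its this parameter */
};

int
use (int *p)
{
  setter s;      /* local object */
  *p = 1;
  s.set (2);     /* modref records the call writes only memory reachable
                    from &s, so it need not clobber *p (assuming points-to
                    shows p cannot point to s) */
  return *p;     /* the store above can then be forwarded to this load */
}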

The patch also adds documentation of -fipa-modref and its params that was left
uncommitted on my branch :(.

Even without this change it does lead to a nice increase of disambiguations
for a cc1plus build.

Alias oracle query stats:
  refs_may_alias_p: 62758323 disambiguations, 72935683 queries
  ref_maybe_used_by_call_p: 139511 disambiguations, 63654045 queries
  call_may_clobber_ref_p: 23502 disambiguations, 29242 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 37654 queries
  nonoverlapping_refs_since_match_p: 19417 disambiguations, 5 must 
overlaps, 75721 queries
  aliasing_component_refs_p: 54665 disambiguations, 752449 queries
  TBAA oracle: 21917926 disambiguations 53054678 queries
   15763411 are in alias set 0
   10162238 queries asked about the same object
   124 queries asked about the same alias set
   0 access volatile
   3681593 are dependent in the DAG
   1529386 are aritificially in conflict with void *

Modref stats:
  modref use: 8311 disambiguations, 32527 queries
  modref clobber: 742126 disambiguations, 1036986 queries
  1987054 tbaa queries (1.916182 per modref query)
  125479 base compares (0.121004 per modref query)

PTA query stats:
  pt_solution_includes: 968314 disambiguations, 13609584 queries
  pt_solutions_intersect: 1019136 disambiguations, 13147139 queries

So compared to
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554605.html
we get 41% more use disambiguations (with similar number of queries) and 8% more
clobber disambiguations.

For tramp3d:
Alias oracle query stats:
  refs_may_alias_p: 2052256 disambiguations, 2312703 queries
  ref_maybe_used_by_call_p: 7122 disambiguations, 2089118 queries
  call_may_clobber_ref_p: 234 disambiguations, 234 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 4299 queries
  nonoverlapping_refs_since_match_p: 329 disambiguations, 10200 must overlaps, 
10616 queries
  aliasing_component_refs_p: 857 disambiguations, 34555 queries
  TBAA oracle: 885546 disambiguations 1677080 queries
   132105 are in alias set 0
   469030 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   190084 are dependent in the DAG
   315 are aritificially in conflict with void *

Modref stats:
  modref use: 426 disambiguations, 1881 queries
  modref clobber: 10042 disambiguations, 16202 queries
  19405 tbaa queries (1.197692 per modref query)
  2775 base compares (0.171275 per modref query)

PTA query stats:
  pt_solution_includes: 313908 disambiguations, 526183 queries
  pt_solutions_intersect: 130510 disambiguations, 416084 queries

Here uses decrease by 4 disambiguations and clobbers improve by 3.5%.  I think
the difference is caused by the fact that gcc has many more alias set 0 accesses
originating from gimple and tree unions, as I mentioned in the original mail.

After pushing out the IPA propagation I will re-add code to track offsets and
sizes, which further improves disambiguation.  On tramp3d it enables a lot of DSE
for structure fields not accessed by an uninlined function.

Bootstrapped/regtested on x86_64-linux, also lto-bootstrapped without checking (to
get the stats).  OK?

Richi, all aliasing related changes are in base_may_alias_with_dereference_p.

Honza

2020-09-24  Jan Hubicka  

* doc/invoke.texi: Document -fipa-modref, ipa-modref-max-bases,
ipa-modref-max-refs, ipa-modref-max-accesses, ipa-modref-max-tests.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
(gt_ggc_mx): New function.
* ipa-modref-tree.h (struct modref_access_node): New structure.
(struct modref_ref_node): Add every_access and accesses array.
(modref_ref_node::modref_ref_node): Update ctor.
(modref_ref_node::search): New member function.
(modref_ref_node::collapse): New member function.
(modref_ref_node::insert_access): New member function.
(modref_base_node::insert_ref): Do not collapse base if ref is 0.
(modref_base_node::collapse): Collapse also refs.
(modref_tree): Add accesses.
(modref_tree::modref_tree): Initialize max_accesses.
(modref_tree::insert): Add access parameter.
(modref_tree::cleanup): New member function.
(modref_tree::merge): Add parm_map; merge accesses.
  

Re: [gcc-7-arm] Backport -moutline-atomics flag

2020-09-24 Thread Richard Biener via Gcc-patches
On Fri, Sep 11, 2020 at 12:38 AM Pop, Sebastian via Gcc-patches
 wrote:
>
> Hi,
>
> the attached patches are back-porting the flag -moutline-atomics to the 
> gcc-7-arm vendor branch.
> The flag enables a very important performance optimization for N1-neoverse 
> processors.
> The patches pass bootstrap and make check on Graviton2 aarch64-linux.
>
> Ok to commit to the gcc-7-arm vendor branch?

Given the branch doesn't exist yet, can you eventually push this series to
a user branch (or an Amazon vendor branch)?

You failed to CC arm folks so your mail might have been lost in the noise.

Thanks,
Richard.

> Thanks,
> Sebastian
>


Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-09-24 Thread Richard Biener
On Wed, 26 Aug 2020, Richard Biener wrote:

> On Thu, 6 Aug 2020, Richard Biener wrote:
> 
> > On Thu, 6 Aug 2020, Richard Biener wrote:
> > 
> > > This adds a move CTOR to auto_vec and makes use of a
> > > auto_vec return value for get_loop_exit_edges denoting
> > > that lifetime management of the vector is handed to the caller.
> > > 
> > > The move CTOR prompted the hash_table change because it apparently
> > > makes the copy CTOR implicitly deleted (good) and hash-table
> > > expansion of the odr_enum_map which is
> > > hash_map  where odr_enum has an
> > > auto_vec member triggers this.  Not sure if
> > > there's a latent bug there before this (I think we're not
> > > invoking DTORs, but we're invoking copy-CTORs).
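> > > 
> > > A generic sketch of the property relied on here (illustrative only; the
> > > real auto_vec also has inline storage the patch must account for, and
> > > the type below is a stand-in, not GCC code):
> > > 
> > > template <typename T>
> > > struct vec_sketch
> > > {
> > >   vec_sketch () : data (nullptr), len (0) {}
> > >   ~vec_sketch () { delete[] data; }
> > > 
> > >   /* Declaring a move constructor makes the copy constructor implicitly
> > >      deleted, which is why hash_table expansion must std::move elements
> > >      instead of copying them.  */
> > >   vec_sketch (vec_sketch &&other) : data (other.data), len (other.len)
> > >   { other.data = nullptr; other.len = 0; }
> > >   vec_sketch &operator= (vec_sketch &&) = delete;
> > > 
> > >   T *data;
> > >   unsigned len;
> > > };
> > > 
> > > vec_sketch<int> make () { return vec_sketch<int> (); }  /* moved to caller */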
> > > 
> > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > > 
> > > Does this all look sensible and is it a good change
> > > (the get_loop_exit_edges one)?
> > 
> > Regtest went OK, here's an update with a complete ChangeLog
> > (how useful..) plus the move assign operator deleted, copy
> > assign wouldn't work as auto-generated and at the moment
> > there's no use of assigning.  I guess if we'd have functions
> > that take an auto_vec<> argument meaning they will destroy
> > the vector that will become useful and we can implement it.
> > 
> > OK for trunk?
> 
> Ping.

Ping^2.

Thanks,
Richard.

> > Thanks,
> > Richard.
> > 
> > 
> > From d74c346e95ff967d930b7c83daabc26b0227aea3 Mon Sep 17 00:00:00 2001
> > From: Richard Biener 
> > Date: Thu, 6 Aug 2020 14:50:56 +0200
> > Subject: [PATCH] add move CTOR to auto_vec, use auto_vec for
> >  get_loop_exit_edges
> > 
> > This adds a move CTOR to auto_vec and makes use of an
> > auto_vec return value for get_loop_exit_edges, denoting
> > that lifetime management of the vector is handed to the caller.
> > 
> > The move CTOR prompted the hash_table change because it apparently
> > makes the copy CTOR implicitly deleted (good) and hash-table
> > expansion of the odr_enum_map which is
> > hash_map  where odr_enum has an
> > auto_vec member triggers this.  Not sure if
> > there's a latent bug there before this (I think we're not
> > invoking DTORs, but we're invoking copy-CTORs).
> > 
> > 2020-08-06  Richard Biener  
> > 
> > * vec.h (auto_vec::auto_vec (auto_vec &&)): New move CTOR.
> > (auto_vec::operator=(auto_vec &&)): Delete.
> > * hash-table.h (hash_table::expand): Use std::move when expanding.
> > * cfgloop.h (get_loop_exit_edges): Return auto_vec.
> > * cfgloop.c (get_loop_exit_edges): Adjust.
> > * cfgloopmanip.c (fix_loop_placement): Likewise.
> > * ipa-fnsummary.c (analyze_function_body): Likewise.
> > * ira-build.c (create_loop_tree_nodes): Likewise.
> > (create_loop_tree_node_allocnos): Likewise.
> > (loop_with_complex_edge_p): Likewise.
> > * ira-color.c (ira_loop_edge_freq): Likewise.
> > * loop-unroll.c (analyze_insns_in_loop): Likewise.
> > * predict.c (predict_loops): Likewise.
> > * tree-predcom.c (last_always_executed_block): Likewise.
> > * tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
> > * tree-ssa-loop-im.c (store_motion_loop): Likewise.
> > * tree-ssa-loop-ivcanon.c (loop_edge_to_cancel): Likewise.
> > (canonicalize_loop_induction_variables): Likewise.
> > * tree-ssa-loop-manip.c (get_loops_exits): Likewise.
> > * tree-ssa-loop-niter.c (find_loop_niter): Likewise.
> > (finite_loop_p): Likewise.
> > (find_loop_niter_by_eval): Likewise.
> > (estimate_numbers_of_iterations): Likewise.
> > * tree-ssa-loop-prefetch.c (emit_mfence_after_loop): Likewise.
> > (may_use_storent_in_loop_p): Likewise.
> > ---
> >  gcc/cfgloop.c|  4 ++--
> >  gcc/cfgloop.h|  2 +-
> >  gcc/cfgloopmanip.c   |  3 +--
> >  gcc/hash-table.h |  2 +-
> >  gcc/ipa-fnsummary.c  |  4 +---
> >  gcc/ira-build.c  | 12 +++-
> >  gcc/ira-color.c  |  4 +---
> >  gcc/loop-unroll.c|  3 +--
> >  gcc/predict.c|  9 ++---
> >  gcc/tree-predcom.c   |  3 +--
> >  gcc/tree-ssa-loop-ch.c   |  3 +--
> >  gcc/tree-ssa-loop-im.c   |  3 +--
> >  gcc/tree-ssa-loop-ivcanon.c  |  9 ++---
> >  gcc/tree-ssa-loop-manip.c|  3 +--
> >  gcc/tree-ssa-loop-niter.c| 20 +---
> >  gcc/tree-ssa-loop-prefetch.c |  7 ++-
> >  gcc/vec.h|  7 +++
> >  17 files changed, 33 insertions(+), 65 deletions(-)
> > 
> > diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
> > index 7720e6e5d2c..33a26cca6a4 100644
> > --- a/gcc/cfgloop.c
> > +++ b/gcc/cfgloop.c
> > @@ -1202,10 +1202,10 @@ release_recorded_exits (function *fn)
> >  
> >  /* Returns the list of the exit edges of a LOOP.  */
> >  
> > -vec 
> > +auto_vec
> >  get_loop_exit_edges (const class loop *loop, basic_block *body)
> >  {
> > -  vec edges = vNULL;
> > +  auto_vec edges;
> >edge e;
> >unsigned i;
> >edge_iterator ei;
> > diff -

RE: [PATCH][GCC 8] aarch64: Add support for Neoverse V1 CPU

2020-09-24 Thread Kyrylo Tkachov
Hi Alex,

> -Original Message-
> From: Alex Coplan 
> Sent: 24 September 2020 10:01
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> ; Kyrylo Tkachov 
> Subject: [PATCH][GCC 8] aarch64: Add support for Neoverse V1 CPU
> 
> This patch backports the AArch64 support for Arm's Neoverse V1 CPU to
> GCC 8.
> 
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu.
> 
> OK for GCC 8 branch?

This is okay, as well as the GCC 9 and 10 backports.
Thanks,
Kyrill

> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-cores.def: Add Neoverse V1.
>   * config/aarch64/aarch64-tune.md: Regenerate.
>   * doc/invoke.texi: Document support for Neoverse V1.


Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-09-24 Thread Jonathan Wakely via Gcc-patches

On 24/09/20 10:15 +0300, Antony Polukhin via Libstdc++ wrote:

Looks like the last patch was not applied. Do I have to change something in
it?


No, it just hasn't been reviewed yet.




Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Tobias Burnus

On 9/24/20 10:03 AM, Richard Biener wrote:


The symbols are added to offload_vars + offload_funcs.
In lto-cgraph.c's output_offload_tables there is the last chance
to remove now unused nodes – as once the tables are streamed
for device usage, they cannot be changed. Hence, there one
has
node->force_output = 1;
[Unrelated: this prevents later optimizations, which still
could be done; cf. PR95622]


The table itself is written in omp-offload.c's omp_finish_file.

But this is called at LTRANS time only, in particular we seem
to stream the offload_funcs/vars array, marking streamed nodes
as force_output but we do not make the offload table visible
to the partitioner.  But force_output should make the
nodes not renamed.  But then output_offload_tables is called at
the very end and we likely do not stream the altered
force_output state.

So - can you try, in prune_offload_funcs, in addition to
setting DECL_PRESERVE_P, mark the cgraph node ->force_output
so this happens early?  I guess the same is needed for
variables (there's no prune_offload_vars ...).


As it accesses global variables, I could do just the same
with the variables – but it did not seem to have an effect.
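(For concreteness, the kind of early marking being discussed is roughly the
following sketch only, with a hypothetical placement in omp-offload.c's
prune_offload_funcs plus a matching loop over offload_vars; the real
cgraph/varpool accessors are used, but the exact shape of any committed fix
may differ.)

  unsigned i;
  tree decl;
  FOR_EACH_VEC_SAFE_ELT (offload_funcs, i, decl)
    if (cgraph_node *n = cgraph_node::get (decl))
      {
        DECL_PRESERVE_P (decl) = 1;
        n->force_output = 1;     /* make the entry visible early */
      }
  FOR_EACH_VEC_SAFE_ELT (offload_vars, i, decl)
    if (varpool_node *vn = varpool_node::get (decl))
      {
        DECL_PRESERVE_P (decl) = 1;
        vn->force_output = 1;
      }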

Following Jakub's suggestion, I also added
  __attribute__((used))
to the tree belonging to both tables in omp-offload.c's omp_finish
but that did not help, either.

I think both the 'used' and 'force_output' are red herrings:
after all, the tables and the referenced funcs/vars are output;
the problem is 'just' that they end up in different ltrans units
while not being public. – Thus, some property is wrong
when building the cgraph or when it is partitioned into ltrans units.

Any additional suggestion to try?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[ping] move and adjust PROBE_STACK_*_REG on aarch64

2020-09-24 Thread Olivier Hainque
Hello,

After
  https://gcc.gnu.org/pipermail/gcc-patches/2020-January/537843.html
and
  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-12/msg01398.html

Re-proposing this patch after re-testing with a recent
mainline on aarch64-linux (bootstrap and regression test
with --enable-languages=all), and more than a year of in-house
use in production for a few aarch64 ports on a gcc-9 base.

The change moves the definitions of PROBE_STACK_FIRST_REG
and PROBE_STACK_SECOND_REG to a more appropriate place for such
items (here, in aarch64.md, as suggested by Richard), and adjusts
their values from r9/r10 to r10/r11 to free r9 for a possibly
more general purpose (e.g. as a static chain, at least on targets
which have a private use of r18, such as Windows or VxWorks).

OK to commit?

Thanks in advance,

With Kind Regards,

Olivier

2020-11-07  Olivier Hainque  

* config/aarch64/aarch64.md: Define PROBE_STACK_FIRST_REGNUM
and PROBE_STACK_SECOND_REGNUM constants, designating r10/r11.
Replacements for the PROBE_STACK_FIRST/SECOND_REG constants in
aarch64.c.
* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG): Remove.
(PROBE_STACK_SECOND_REG): Remove.
(aarch64_emit_probe_stack_range): Adjust to the _REG -> _REGNUM
suffix update for PROBE_STACK register numbers.



aarch64-regnum.diff
Description: Binary data


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 24, 2020 at 11:41:00AM +0200, Tobias Burnus wrote:
> Following Jakub's suggestion, I also added
>   __attribute__((used))
> to the tree belonging to both tables in omp-offload.c's omp_finish
> but that did not help, either.

That is really DECL_PRESERVED_P; the attribute is turned into that, so no
need to have the attribute around after setting it.
That is needed (but already done), but clearly not sufficient.
What we need to emulate is the effect of all those decls being referenced
from a single (preserved) initializer, which would need to refer to their
names too.  Except we don't really have such a var and initializer
constructed early enough probably.
Now, for vars with initializers I think there is
record_references_in_initializer to remember those references, so do we need
to emulate that behavior?
Or, see what effects it has on the partitioning, and if it means forcing all
the referenced decls that aren't TREE_PUBLIC into the same partition, do it
for the offloading funcs and vars too?

Jakub



Re: [PATCH] aarch64: Do not alter value on a force_reg returned rtx expanding __jcvt

2020-09-24 Thread Andrea Corallo
Andrea Corallo  writes:

> Kyrylo Tkachov  writes:
[...]
>>
>> Can you please also backport it to the appropriate branches as well after 
>> some time on trunk.
>> Thanks,
>> Kyrill
>
> Ciao Kyrill,
>
> Sure happy to do that.  For now into trunk as 2c62952f816.

Backported into gcc-10 as aa47c987340.

Thanks

  Andrea


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 11:41 AM Tobias Burnus  wrote:
>
> On 9/24/20 10:03 AM, Richard Biener wrote:
>
> >> The symbols are added to offload_vars + offload_funcs.
> >> In lto-cgraph.c's output_offload_tables there is the last chance
> >> to remove now unused nodes – as once the tables are streamed
> >> for device usage, they cannot be changed. Hence, there one
> >> has
> >> node->force_output = 1;
> >> [Unrelated: this prevents later optimizations, which still
> >> could be done; cf. PR95622]
> >>
> >>
> >> The table itself is written in omp-offload.c's omp_finish_file.
> > But this is called at LTRANS time only, in particular we seem
> > to stream the offload_funcs/vars array, marking streamed nodes
> > as force_output but we do not make the offload table visible
> > to the partitioner.  But force_output should make the
> > nodes not renamed.  But then output_offload_tables is called at
> > the very end and we likely do not stream the altered
> > force_output state.
> >
> > So - can you try, in prune_offload_funcs, in addition to
> > setting DECL_PRESERVE_P, mark the cgraph node ->force_output
> > so this happens early?  I guess the same is needed for
> > variables (there's no prune_offload_vars ...).
>
> As it accesses global variables, I could do just the same
> with the variables – but it did not seem to have an effect.
>
> Following Jakub's suggestion, I also added
>__attribute__((used))
> to the tree belonging to both tables in omp-offload.c's omp_finish
> but that did not help, either.
>
> I think both the 'used' and 'force_output' are red herrings:
> after all, the tables and the referenced funcs/vars are output;
> the problem is 'just' that they end up in different ltrans units
> while not being public. – Thus, some property is wrong
> when building the cgraph or when it is partitioned into ltrans units.
>
> Any additional suggestion to try?

As I said, the table itself is only created _after_ partitioning,
so LTO doesn't see that they are referenced from outside of their
LTRANS unit (in the unit that has the offload table).

I think we need to create the offload table during WPA
instead.

Richard.

> Tobias
>
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> Alexander Walter


Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 11:50 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Thu, Sep 24, 2020 at 11:41:00AM +0200, Tobias Burnus wrote:
> > Following Jakub's suggestion, I also added
> >   __attribute__((used))
> > to the tree belonging to both tables in omp-offload.c's omp_finish
> > but that did not help, either.
>
> That is really DECL_PRESERVED_P; the attribute is turned into that, so no
> need to have the attribute around after setting it.
> That is needed (but already done), but clearly not sufficient.
> What we need to emulate is the effect of all those decls being referenced
> from a single (preserved) initializer, which would need to refer to their
> names too.  Except we don't really have such a var and initializer
> constructed early enough probably.
> Now, for vars with initializers I think there is
> record_references_in_initializer to remember those references, so do we need
> to emulate that behavior?
> Or, see what effects it has on the partitioning, and if it means forcing all
> the referenced decls that aren't TREE_PUBLIC into the same partition, do it
> for the offloading funcs and vars too?

Create the offload table at WPA time so we get to see it during partitioning?

> Jakub
>


Tighten flag_pic processing in vxworks_override_options

2020-09-24 Thread Olivier Hainque

This fixes spurious complaints about PIC mode not being supported
on "gcc --help=...", on VxWorks without -mrtp. The spurious message
is emitted by vxworks_override_options, which is called with flag_pic == -1
when we're running for --help.

The change simply adjusts the check testing for "we're generating PIC code"
to "flag_pic > 0" instead of just "flag_pic". We're not generating code at
all when we reach here with -1.
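(A sketch of the shape of the guard being changed; the actual vxworks.c check
and diagnostic wording may differ.)

  /* Before: flag_pic is -1 while only processing --help, yet the check
     still fires.  */
  if (flag_pic && !TARGET_VXWORKS_RTP)
    error ("PIC is only supported for RTPs");

  /* After: only complain when we are really generating PIC code.  */
  if (flag_pic > 0 && !TARGET_VXWORKS_RTP)
    error ("PIC is only supported for RTPs");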

Tested by verifying that the spurious message goes away in
production gcc-9 based toolchains for more than a year now, and
sanity checked that I can build a mainline compiler with the
patch applied.

Committing to mainline shortly.

Olivier

2020-09-24  Olivier Hainque  

* config/vxworks.c (vxworks_override_options): Guard pic checks with
flag_pic > 0 instead of just flag_pic.




0001-Tigthen-flag_pic-processing-in-vxworks_override_o.diff
Description: Binary data


[PATCH][testsuite] Add effective target ident_directive

2020-09-24 Thread Tom de Vries
Hi,

On nvptx we run into:
...
FAIL: c-c++-common/ident-1b.c  -Wc++-compat   scan-assembler GCC:
FAIL: c-c++-common/ident-2b.c  -Wc++-compat   scan-assembler GCC:
...

Using a scan-assembler directive adds -fno-ident to the compile options.
The test c-c++-common/ident-1b.c adds dg-options "-fident", and intends to
check that the -fident overrides the -fno-ident, by means of the
scan-assembler.  But for nvptx, there's no .ident directive, either with -fident
or with -fno-ident.
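(For comparison: on a typical ELF target the assembly ends with a trailer
roughly like the line below, which is what the "GCC: " scan-assembler pattern
matches; the version string here is illustrative only.  nvptx assembly has no
such directive at all.)

	.ident	"GCC: (GNU) 11.0.0 20200924 (experimental)"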

Fix this by adding an effective target ident_directive, and requiring
it in both test-cases.

Tested on nvptx and x86_64.

OK for trunk?

Thanks,
- Tom

[testsuite] Add effective target ident_directive

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* lib/target-supports.exp (check_effective_target_ident_directive):
New proc.
* c-c++-common/ident-1b.c: Require effective target ident_directive.
* c-c++-common/ident-2b.c: Same.

---
 gcc/testsuite/c-c++-common/ident-1b.c | 1 +
 gcc/testsuite/c-c++-common/ident-2b.c | 1 +
 gcc/testsuite/lib/target-supports.exp | 9 +
 3 files changed, 11 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/ident-1b.c 
b/gcc/testsuite/c-c++-common/ident-1b.c
index 69567442a03..b8b83e64ad2 100644
--- a/gcc/testsuite/c-c++-common/ident-1b.c
+++ b/gcc/testsuite/c-c++-common/ident-1b.c
@@ -2,6 +2,7 @@
  * Make sure scan-assembler turns off .ident unless -fident in testcase */
 /* { dg-do compile } */
 /* { dg-options "-fident" } */
+/* { dg-require-effective-target ident_directive }*/
 int i;
 
 /* { dg-final { scan-assembler "GCC: " { xfail { { hppa*-*-hpux* && { ! lp64 } 
} || { powerpc-ibm-aix* || powerpc*-*-darwin* } } } } } */
diff --git a/gcc/testsuite/c-c++-common/ident-2b.c 
b/gcc/testsuite/c-c++-common/ident-2b.c
index fae6a031571..52f0693e164 100644
--- a/gcc/testsuite/c-c++-common/ident-2b.c
+++ b/gcc/testsuite/c-c++-common/ident-2b.c
@@ -2,6 +2,7 @@
  * Make sure scan-assembler-times turns off .ident unless -fident in testcase 
*/
 /* { dg-do compile } */
 /* { dg-options "-fident" } */
+/* { dg-require-effective-target ident_directive }*/
 int ident;
 
 /* { dg-final { scan-assembler "GCC: " { xfail { { hppa*-*-hpux* && { ! lp64 } 
} || { powerpc-ibm-aix* || powerpc*-*-darwin* } } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 5cbe32ffbd6..0a00972edb5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10510,3 +10510,12 @@ proc check_symver_available { } {
}
}]
 }
+
+# Return 1 if emitted assembly contains .ident directive.
+
+proc check_effective_target_ident_directive {} {
+return [check_no_messages_and_pattern ident_directive \
+   "(?n)^\[\t\]+\\.ident" assembly {
+   int i;
+}]
+}


[PATCH][GCC 8] AArch64: Update Armv8.4-a's FP16 FML intrinsics

2020-09-24 Thread Kyrylo Tkachov
Hi all,

I'd like to backport this fix from Tamar to the GCC 8 branch to avoid having 
incorrectly-named intrinsics.
Tested on aarch64-none-elf.

Committing to the branch.

This patch updates the Armv8.4-a FP16 FML intrinsics' suffixes from u32 to f16
to be more consistent with the naming convention for intrinsics.

The specifications for these intrinsics have not been published yet so we do
not need to maintain the old names.

The patch was created with the following script:

grep -lIE "(vfml[as].+)_u32" -r gcc/ | grep -iEv ".+Changelog.*" \
  | xargs sed -i -E -e "s/(vfml[as].+)_u32/\1_f16/g"

gcc/
PR target/71233
* config/aarch64/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
vfmlalq_low_u32, vfmlslq_low_u32, vfmlal_high_u32, vfmlsl_high_u32,
vfmlalq_high_u32, vfmlslq_high_u32, vfmlal_lane_low_u32,
vfmlsl_lane_low_u32, vfmlal_laneq_low_u32, vfmlsl_laneq_low_u32,
vfmlalq_lane_low_u32, vfmlslq_lane_low_u32, vfmlalq_laneq_low_u32,
vfmlslq_laneq_low_u32, vfmlal_lane_high_u32, vfmlsl_lane_high_u32,
vfmlal_laneq_high_u32, vfmlsl_laneq_high_u32, vfmlalq_lane_high_u32,
vfmlslq_lane_high_u32, vfmlalq_laneq_high_u32, vfmlslq_laneq_high_u32):
Rename ...
(vfmlal_low_f16, vfmlsl_low_f16, vfmlalq_low_f16, vfmlslq_low_f16,
vfmlal_high_f16, vfmlsl_high_f16, vfmlalq_high_f16, vfmlslq_high_f16,
vfmlal_lane_low_f16, vfmlsl_lane_low_f16, vfmlal_laneq_low_f16,
vfmlsl_laneq_low_f16, vfmlalq_lane_low_f16, vfmlslq_lane_low_f16,
vfmlalq_laneq_low_f16, vfmlslq_laneq_low_f16, vfmlal_lane_high_f16,
vfmlsl_lane_high_f16, vfmlal_laneq_high_f16, vfmlsl_laneq_high_f16,
vfmlalq_lane_high_f16, vfmlslq_lane_high_f16, vfmlalq_laneq_high_f16,
vfmlslq_laneq_high_f16): ... To this.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/fp16_fmul_high.h (test_vfmlal_high_u32,
test_vfmlalq_high_u32, test_vfmlsl_high_u32, test_vfmlslq_high_u32):
Rename ...
(test_vfmlal_high_f16, test_vfmlalq_high_f16, test_vfmlsl_high_f16,
test_vfmlslq_high_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_lane_high.h (test_vfmlal_lane_high_u32,
tets_vfmlsl_lane_high_u32, test_vfmlal_laneq_high_u32,
test_vfmlsl_laneq_high_u32, test_vfmlalq_lane_high_u32,
test_vfmlslq_lane_high_u32, test_vfmlalq_laneq_high_u32,
test_vfmlslq_laneq_high_u32): Rename ...
(test_vfmlal_lane_high_f16, tets_vfmlsl_lane_high_f16,
test_vfmlal_laneq_high_f16, test_vfmlsl_laneq_high_f16,
test_vfmlalq_lane_high_f16, test_vfmlslq_lane_high_f16,
test_vfmlalq_laneq_high_f16, test_vfmlslq_laneq_high_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_lane_low.h (test_vfmlal_lane_low_u32,
test_vfmlsl_lane_low_u32, test_vfmlal_laneq_low_u32,
test_vfmlsl_laneq_low_u32, test_vfmlalq_lane_low_u32,
test_vfmlslq_lane_low_u32, test_vfmlalq_laneq_low_u32,
test_vfmlslq_laneq_low_u32): Rename ...
(test_vfmlal_lane_low_f16, test_vfmlsl_lane_low_f16,
test_vfmlal_laneq_low_f16, test_vfmlsl_laneq_low_f16,
test_vfmlalq_lane_low_f16, test_vfmlslq_lane_low_f16,
test_vfmlalq_laneq_low_f16, test_vfmlslq_laneq_low_f16): ... To this.
* gcc.target/aarch64/fp16_fmul_low.h (test_vfmlal_low_u32,
test_vfmlalq_low_u32, test_vfmlsl_low_u32, test_vfmlslq_low_u32):
Rename ...
(test_vfmlal_low_f16, test_vfmlalq_low_f16, test_vfmlsl_low_f16,
test_vfmlslq_low_f16): ... To This.
* lib/target-supports.exp
(check_effective_target_arm_fp16fml_neon_ok_nocache): Update test.

(cherry picked from commit 9d04c986b6faed878dbcc86d2f9392a721a3936e)


fmla-rename.patch
Description: fmla-rename.patch


[PATCH][GCC 8] Add missing AArch64 NEON instrinctics for Armv8.2-a to Armv8.4-a

2020-09-24 Thread Kyrylo Tkachov
Hi all,

I'd like to backport to the GCC 8 branch this patch, which implements intrinsics
that were (erroneously) left out
of the initial implementation in GCC 8.

Bootstrapped and tested on aarch64-none-linux-gnu on the branch.

This patch adds the missing NEON intrinsics for all 128-bit vector integer
modes for the
three-way XOR and negate-and-xor instructions for Armv8.2-a to Armv8.4-a.

gcc/
2018-05-21  Tamar Christina  

PR target/71233
* config/aarch64/aarch64-simd.md (aarch64_eor3qv8hi): Change to
eor3q4.
(aarch64_bcaxqv8hi): Change to bcaxq4.
* config/aarch64/aarch64-simd-builtins.def (veor3q_u8, veor3q_u32,
veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8,
vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
vbcaxq_s64): New.
* config/aarch64/arm_neon.h: Likewise.
* config/aarch64/iterators.md (VQ_I): New.

gcc/testsuite/
2018-05-21  Tamar Christina  

PR target/71233
* gcc.target/aarch64/sha3.h (veor3q_u8, veor3q_u32,
veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8,
vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32,
vbcaxq_s64): New.
* gcc.target/aarch64/sha3_1.c: Likewise.
* gcc.target/aarch64/sha3_2.c: Likewise.
* gcc.target/aarch64/sha3_3.c: Likewise.

(cherry picked from commit d21052ebd7ac9d545a26dde3229c57f872c1d5f3)


bcax.patch
Description: bcax.patch


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 11:06 AM Jan Hubicka  wrote:
>
> Hi,
> this patch re-adds tracking of accesses which was unfinished in David's patch.
> At the moment I only implemented tracking of the fact that access is based on
> derefernece of the parameter (so we track THIS pointers).
> Patch does not implement IPA propagation since it needs bit more work which
> I will post shortly: ipa-fnsummary needs to track when parameter points to
> local memory, summaries needs to be merged when function is inlined (because
> jump functions are) and propagation needs to be turned into iterative dataflow
> on SCC components.
>
> Patch also adds documentation of -fipa-modref and params that was left 
> uncommited
> in my branch :(.
>
> Even without this change it does lead to nice increase of disambiguations
> for cc1plus build.
>
> Alias oracle query stats:
>   refs_may_alias_p: 62758323 disambiguations, 72935683 queries
>   ref_maybe_used_by_call_p: 139511 disambiguations, 63654045 queries
>   call_may_clobber_ref_p: 23502 disambiguations, 29242 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 37654 queries
>   nonoverlapping_refs_since_match_p: 19417 disambiguations, 5 must 
> overlaps, 75721 queries
>   aliasing_component_refs_p: 54665 disambiguations, 752449 queries
>   TBAA oracle: 21917926 disambiguations 53054678 queries
>15763411 are in alias set 0
>10162238 queries asked about the same object
>124 queries asked about the same alias set
>0 access volatile
>3681593 are dependent in the DAG
>1529386 are aritificially in conflict with void *
>
> Modref stats:
>   modref use: 8311 disambiguations, 32527 queries
>   modref clobber: 742126 disambiguations, 1036986 queries
>   1987054 tbaa queries (1.916182 per modref query)
>   125479 base compares (0.121004 per modref query)
>
> PTA query stats:
>   pt_solution_includes: 968314 disambiguations, 13609584 queries
>   pt_solutions_intersect: 1019136 disambiguations, 13147139 queries
>
> So compared to
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554605.html
> we get 41% more use disambiguations (with similar number of queries) and 8% 
> more
> clobber disambiguations.
>
> For tramp3d:
> Alias oracle query stats:
>   refs_may_alias_p: 2052256 disambiguations, 2312703 queries
>   ref_maybe_used_by_call_p: 7122 disambiguations, 2089118 queries
>   call_may_clobber_ref_p: 234 disambiguations, 234 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 4299 queries
>   nonoverlapping_refs_since_match_p: 329 disambiguations, 10200 must 
> overlaps, 10616 queries
>   aliasing_component_refs_p: 857 disambiguations, 34555 queries
>   TBAA oracle: 885546 disambiguations 1677080 queries
>132105 are in alias set 0
>469030 queries asked about the same object
>0 queries asked about the same alias set
>0 access volatile
>190084 are dependent in the DAG
>315 are aritificially in conflict with void *
>
> Modref stats:
>   modref use: 426 disambiguations, 1881 queries
>   modref clobber: 10042 disambiguations, 16202 queries
>   19405 tbaa queries (1.197692 per modref query)
>   2775 base compares (0.171275 per modref query)
>
> PTA query stats:
>   pt_solution_includes: 313908 disambiguations, 526183 queries
>   pt_solutions_intersect: 130510 disambiguations, 416084 queries
>
> Here uses decrease by 4 disambiguations and clobber improve by 3.5%.  I think
> the difference is caused by the fact that gcc has many more alias set 0 accesses
> originating from gimple and tree unions, as I mentioned in the original mail.
>
> After pushing out the IPA propagation I will re-add code to track offsets and
> sizes, which further improves disambiguation. On tramp3d it enables a lot of DSE
> for structure fields not accessed by the uninlined function.
>
> Bootstrapped/regtested on x86_64-linux, also lto-bootstrapped without checking (to
> get the stats). OK?
>
> Richi, all aliasing-related changes are in base_may_alias_with_dereference_p.
>
> Honza
>
> 2020-09-24  Jan Hubicka  
>
> * doc/invoke.texi: Document -fipa-modref, ipa-modref-max-bases,
> ipa-modref-max-refs, ipa-modref-max-accesses, ipa-modref-max-tests.
> * ipa-modref-tree.c (test_insert_search_collapse): Update.
> (test_merge): Update.
> (gt_ggc_mx): New function.
> * ipa-modref-tree.h (struct modref_access_node): New structure.
> (struct modref_ref_node): Add every_access and accesses array.
> (modref_ref_node::modref_ref_node): Update ctor.
> (modref_ref_node::search): New member function.
> (modref_ref_node::collapse): New member function.
> (modref_ref_node::insert_access): New member function.
> (modref_base_node::insert_ref): Do not collapse base if ref is 0.
> (modref_base_node::collapse): Collapse also refs.
> (mod

[committed][testsuite, nvptx] Fix string matching in gcc.dg/pr87314-1.c

2020-09-24 Thread Tom de Vries
Hi,

with nvptx we run into:
...
FAIL: gcc.dg/pr87314-1.c scan-assembler hellooo
...

The required string is part of the assembly, just in a different format than
expected:
...
.const .align 1 .u8 $LC0[12] =
  { 104, 101, 108, 108, 111, 111, 111, 111, 98, 121, 101, 0 };
...
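(Decoded, those byte values are just the ASCII codes of the merged string:
104 'h', 101 'e', 108 'l', 108 'l', 111 'o' x4, 98 'b', 121 'y', 101 'e', 0,
i.e. "helloooobye" plus its NUL terminator – which is why matching the run of
decimal codes below is equivalent to matching "hellooo" on other targets.)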

Fix this by adding an nvptx-specific scan-assembler directive.

Tested on nvptx and x86_64.

Committed to trunk.

Thanks,
- Tom

[testsuite, nvptx] Fix string matching in gcc.dg/pr87314-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/pr87314-1.c: Add nvptx-specific scan-assembler directive.

---
 gcc/testsuite/gcc.dg/pr87314-1.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr87314-1.c b/gcc/testsuite/gcc.dg/pr87314-1.c
index 9bc905612b5..0cb9c07e32c 100644
--- a/gcc/testsuite/gcc.dg/pr87314-1.c
+++ b/gcc/testsuite/gcc.dg/pr87314-1.c
@@ -8,4 +8,6 @@ int h() { return "bye"=="hellbye"+8; }
 /* { dg-final { scan-tree-dump-times "hello" 1 "original" } } */
 /* The test in h() should be retained because the result depends on
string merging.  */
-/* { dg-final { scan-assembler "hellooo" } } */
+/* { dg-final { scan-assembler "hellooo" { target { ! nvptx*-*-* } } } } */
+/* { dg-final { scan-assembler "104, 101, 108, 108, 111, 111, 111" { target { 
nvptx*-*-* } } } } */
+


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Jan Hubicka
> > +  else if (TREE_CODE (op) == SSA_NAME
> > +  && POINTER_TYPE_P (TREE_TYPE (op)))
> > +{
> > +  if (DECL_P (base) && !ptr_deref_may_alias_decl_p (op, base))
> > +   return false;
> > +  if (TREE_CODE (base) == SSA_NAME
> > + && !ptr_derefs_may_alias_p (op, base))
> > +   return false;
> > +}
> 
> this all looks redundant - why is it important to look at the
> base of ref, why not simply ask below (*)
> 
> > + modref_access_node *access_node;
> > + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > +   {
> > + if (num_tests >= max_tests)
> > +   return true;
> > +
> > + if (access_node->parm_index == -1
> > + || (unsigned)access_node->parm_index
> > +>= gimple_call_num_args (stmt))
> > +   return true;
> > +
> > + tree op = gimple_call_arg (stmt, access_node->parm_index);
> > +
> > + alias_stats.modref_baseptr_tests++;
> > +
> > + /* Lookup base, if this is the first time we compare bases.  
> > */
> > + if (!base)
> 
> Meh, so this function is a bit confusing with base_node, ref_node,
> access_node and now 'base' and 'op'.  The loop now got a
> new nest as well.
> 
> I'm looking for a high-level description of the modref_tree <>
> but cannot find any which makes reviewing this quite difficult...

There is a description in ipa-modref.c, though it may need a bit of expanding.
Basically the modref summary represents a decision tree for
tree-ssa-alias that has three levels:
  1) base, which records the base alias set,
  2) ref, which records the ref alias set, and
  3) access, which presently records whether the access is a
  dereference of a pointer passed as a parameter. In the future I will re-add
  info about offset/size and base type. It would be possible to record
  the access path, though I am not sure it is worth the effort
  (the three levels are sketched just below).
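(A simplified sketch of that shape; the real declarations in
ipa-modref-tree.h are templates and carry more detail, so member types and
names here are only approximate.)

  struct modref_access_node             /* level 3: one recorded access */
  {
    int parm_index;  /* parameter whose dereference this access is, or -1 */
  };

  struct modref_ref_node                /* level 2: grouped by ref alias set */
  {
    alias_set_type ref;
    bool every_access;                  /* give up on level 3 */
    vec<modref_access_node, va_gc> *accesses;
  };

  struct modref_base_node               /* level 1: grouped by base alias set */
  {
    alias_set_type base;
    bool every_ref;                     /* give up on level 2 */
    vec<modref_ref_node *, va_gc> *refs;
  };

  struct modref_tree                    /* one for loads, one for stores */
  {
    bool every_base;                    /* give up entirely */
    vec<modref_base_node *, va_gc> *bases;
  };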
> 
> > +   {
> > + base = ref->ref;
> > + while (handled_component_p (base))
> > +   base = TREE_OPERAND (base, 0);
> 
> ao_ref_base (ref)?  OK, that might strip an inner
> MEM_REF, yielding in a decl, but ...
> 
> > + if (TREE_CODE (base) == MEM_REF
> > + || TREE_CODE (base) == TARGET_MEM_REF)
> > +   base = TREE_OPERAND (base, 0);
> 
> that might happen here, too.  But in the MEM_REF case base
> is a pointer.
> 
> > +   }
> > +
> > + if (base_may_alias_with_dereference_p (base, op))
> 
> So this is a query purely at the caller side - whether 'ref' may
> alias 'op'.
> 
> ---
> 
> (*) ptr_deref_may_alias_ref_p_1 (op, ref)
> 
> without any of the magic?

Hmm, it may actually just work; I did not know that it looks through
memrefs. Let me re-test the patch.
> 
> Can you please amend ipa-modref-tree.h/c with a toplevel comment
> layint out the data structure and what is recorded?

I will do that (but I need to think a bit about the redundancy between the comments in
ipa-modref and ipa-modref-tree)

Honza
> 
> Thanks,
> Richard.
> 
> > +   return true;
> > + num_tests++;
> > +   }
> > }
> >  }
> >return false;
> > @@ -2510,7 +2584,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
> > bool tbaa_p)
> >   modref_summary *summary = get_modref_function_summary (node);
> >   if (summary)
> > {
> > - if (!modref_may_conflict (summary->loads, ref, tbaa_p))
> > + if (!modref_may_conflict (call, summary->loads, ref, tbaa_p))
> > {
> >   alias_stats.modref_use_no_alias++;
> >   if (dump_file && (dump_flags & TDF_DETAILS))
> > @@ -2934,7 +3008,7 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
> > bool tbaa_p)
> >   modref_summary *summary = get_modref_function_summary (node);
> >   if (summary)
> > {
> > - if (!modref_may_conflict (summary->stores, ref, tbaa_p))
> > + if (!modref_may_conflict (call, summary->stores, ref, tbaa_p))
> > {
> >   alias_stats.modref_clobber_no_alias++;
> >   if (dump_file && (dump_flags & TDF_DETAILS))


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 12:54 PM Jan Hubicka  wrote:
>
> > > +  else if (TREE_CODE (op) == SSA_NAME
> > > +  && POINTER_TYPE_P (TREE_TYPE (op)))
> > > +{
> > > +  if (DECL_P (base) && !ptr_deref_may_alias_decl_p (op, base))
> > > +   return false;
> > > +  if (TREE_CODE (base) == SSA_NAME
> > > + && !ptr_derefs_may_alias_p (op, base))
> > > +   return false;
> > > +}
> >
> > this all looks redundant - why is it important to look at the
> > base of ref, why not simply ask below (*)
> >
> > > + modref_access_node *access_node;
> > > + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> > > +   {
> > > + if (num_tests >= max_tests)
> > > +   return true;
> > > +
> > > + if (access_node->parm_index == -1
> > > + || (unsigned)access_node->parm_index
> > > +>= gimple_call_num_args (stmt))
> > > +   return true;
> > > +
> > > + tree op = gimple_call_arg (stmt, access_node->parm_index);
> > > +
> > > + alias_stats.modref_baseptr_tests++;
> > > +
> > > + /* Lookup base, if this is the first time we compare bases. 
> > >  */
> > > + if (!base)
> >
> > Meh, so this function is a bit confusing with base_node, ref_node,
> > access_node and now 'base' and 'op'.  The loop now got a
> > new nest as well.
> >
> > I'm looking for a high-level description of the modref_tree <>
> > but cannot find any which makes reviewing this quite difficult...
>
> There is a description in ipa-modref.c, though it may need a bit of expanding.
> Basically the modref summary represents a decision tree for
> tree-ssa-alias that has three levels:
>   1) base, which records the base alias set,
>   2) ref, which records the ref alias set, and
>   3) access, which presently records whether the access is a
>   dereference of a pointer passed as a parameter. In the future I will re-add
>   info about offset/size and base type. It would be possible to record
>   the access path, though I am not sure it is worth the effort
> >
> > > +   {
> > > + base = ref->ref;
> > > + while (handled_component_p (base))
> > > +   base = TREE_OPERAND (base, 0);
> >
> > ao_ref_base (ref)?  OK, that might strip an inner
> > MEM_REF, yielding in a decl, but ...
> >
> > > + if (TREE_CODE (base) == MEM_REF
> > > + || TREE_CODE (base) == TARGET_MEM_REF)
> > > +   base = TREE_OPERAND (base, 0);
> >
> > that might happen here, too.  But in the MEM_REF case base
> > is a pointer.
> >
> > > +   }
> > > +
> > > + if (base_may_alias_with_dereference_p (base, op))
> >
> > So this is a query purely at the caller side - whether 'ref' may
> > alias 'op'.
> >
> > ---
> >
> > (*) ptr_deref_may_alias_ref_p_1 (op, ref)
> >
> > without any of the magic?
>
> Hmm, it may actually just work; I did not know that it looks through
> memrefs. Let me re-test the patch.
> >
> > Can you please amend ipa-modref-tree.h/c with a toplevel comment
> > layint out the data structure and what is recorded?
>
> I will do that (but I need to think a bit about the redundancy between the comments in
> ipa-modref and ipa-modref-tree)

One place is enough - just add a pointer to the other place.

Richard.

> Honza
> >
> > Thanks,
> > Richard.
> >
> > > +   return true;
> > > + num_tests++;
> > > +   }
> > > }
> > >  }
> > >return false;
> > > @@ -2510,7 +2584,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref 
> > > *ref, bool tbaa_p)
> > >   modref_summary *summary = get_modref_function_summary (node);
> > >   if (summary)
> > > {
> > > - if (!modref_may_conflict (summary->loads, ref, tbaa_p))
> > > + if (!modref_may_conflict (call, summary->loads, ref, 
> > > tbaa_p))
> > > {
> > >   alias_stats.modref_use_no_alias++;
> > >   if (dump_file && (dump_flags & TDF_DETAILS))
> > > @@ -2934,7 +3008,7 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, 
> > > bool tbaa_p)
> > >   modref_summary *summary = get_modref_function_summary (node);
> > >   if (summary)
> > > {
> > > - if (!modref_may_conflict (summary->stores, ref, tbaa_p))
> > > + if (!modref_may_conflict (call, summary->stores, ref, 
> > > tbaa_p))
> > > {
> > >   alias_stats.modref_clobber_no_alias++;
> > >   if (dump_file && (dump_flags & TDF_DETAILS))


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Jan Hubicka
> >
> > I will do that (but I need to think a bit about the redundancy between the comments in
> > ipa-modref and ipa-modref-tree)
> 
> One place is enough - just add a pointer to the other place.
Here is the updated patch I am testing.  It adds documentation to
ipa-modref-tree.h, which is perhaps the more natural place, and links to it from
the ipa-modref.c documentation.

Also note that loads and stores are distinguished, since for every function
we have both a decision tree for loads and a separate decision tree for
stores.

I do not plan to add more levels to the tree (at least for the time being).
I think that forming groups by alias sets is quite effective because
a TBAA oracle lookup is cheap and has a good chance to disambiguate.  For
the remaining info tracked I plan a simple flat (and capped by a small
constant) list of accesses.

It indeed seems that ptr_deref_may_alias_ref_p_1 is precisely what I
need, which simplifies the patch.  I did not know about it and simply
followed what the main oracle does.
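(Roughly, the simplified caller-side check then boils down to something like
the following sketch, in the context of modref_may_conflict in
tree-ssa-alias.c; the exact committed form may differ.)

  /* Inside the loop over recorded accesses of a matching base/ref pair:
     ACCESS_NODE says the callee access is a dereference of parameter
     parm_index, so fetch the corresponding argument at the call site and
     ask the existing oracle.  */
  tree op = gimple_call_arg (stmt, access_node->parm_index);
  if (TREE_CODE (op) == SSA_NAME
      && POINTER_TYPE_P (TREE_TYPE (op))
      && !ptr_deref_may_alias_ref_p_1 (op, ref))
    continue;  /* this recorded access cannot conflict with REF */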

Once the base/offset tracking is added I will need to figure out when
the base pointers are the same (or differ by a known offset), which is not readily
available from ptr_deref_may_alias_ref_p_1, but we can do that step by
step.

Thanks a lot.
I am re-testing the patch attached. OK if testing passes?

2020-09-24  Jan Hubicka  

* doc/invoke.texi: Document -fipa-modref, ipa-modref-max-bases,
ipa-modref-max-refs, ipa-modref-max-accesses, ipa-modref-max-tests.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
(gt_ggc_mx): New function.
* ipa-modref-tree.h (struct modref_access_node): New structure.
(struct modref_ref_node): Add every_access and accesses array.
(modref_ref_node::modref_ref_node): Update ctor.
(modref_ref_node::search): New member function.
(modref_ref_node::collapse): New member function.
(modref_ref_node::insert_access): New member function.
(modref_base_node::insert_ref): Do not collapse base if ref is 0.
(modref_base_node::collapse): Collapse also refs.
(modref_tree): Add accesses.
(modref_tree::modref_tree): Initialize max_accesses.
(modref_tree::insert): Add access parameter.
(modref_tree::cleanup): New member function.
(modref_tree::merge): Add parm_map; merge accesses.
(modref_tree::copy_from): New member function.
(modref_tree::create_ggc): Add max_accesses.
* ipa-modref.c (dump_access): New function.
(dump_records): Dump accesses.
(dump_lto_records): Dump accesses.
(get_access): New function.
(record_access): Record access.
(record_access_lto): Record access.
(analyze_call): Compute parm_map.
(analyze_function): Update construction of modref records.
(modref_summaries::duplicate): Likewise; use copy_from.
(write_modref_records): Stream accesses.
(read_modref_records): Sream accesses.
(pass_ipa_modref::execute): Update call of merge.
* params.opt (-param=modref-max-accesses): New.
* tree-ssa-alias.c (alias_stats): Add modref_baseptr_tests.
(dump_alias_stats): Update.
(modref_may_conflict): Check accesses.
(ref_maybe_used_by_call_p_1): Update call to modref_may_conflict.
(call_may_clobber_ref_p_1): Update call to modref_may_conflict.

gcc/testsuite/ChangeLog:

2020-09-24  Jan Hubicka  

* gcc.dg/tree-ssa/modref-1.c: New test.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 75203ba2420..623dfb8ac28 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -486,7 +486,7 @@ Objective-C and Objective-C++ Dialects}.
 -fgcse-sm  -fhoist-adjacent-loads  -fif-conversion @gol
 -fif-conversion2  -findirect-inlining @gol
 -finline-functions  -finline-functions-called-once  -finline-limit=@var{n} @gol
--finline-small-functions  -fipa-cp  -fipa-cp-clone @gol
+-finline-small-functions -fipa-modref -fipa-cp  -fipa-cp-clone @gol
 -fipa-bit-cp  -fipa-vrp  -fipa-pta  -fipa-profile  -fipa-pure-const @gol
 -fipa-reference  -fipa-reference-addressable @gol
 -fipa-stack-alignment  -fipa-icf  -fira-algorithm=@var{algorithm} @gol
@@ -9688,6 +9688,7 @@ compilation time.
 -fif-conversion @gol
 -fif-conversion2 @gol
 -finline-functions-called-once @gol
+-fipa-modref @gol
 -fipa-profile @gol
 -fipa-pure-const @gol
 -fipa-reference @gol
@@ -10783,11 +10784,18 @@ default at any optimization level.
 @opindex fipa-profile
 Perform interprocedural profile propagation.  The functions called only from
 cold functions are marked as cold. Also functions executed once (such as
-@code{cold}, @code{noreturn}, static constructors or destructors) are 
identified. Cold
-functions and loop less parts of functions executed once are then optimized for
-size.
+@code{cold}, @code{noreturn}, static constructors or destructors) are
+identified. Cold functions and loop less parts of functions executed once are
+then optimized for siz

[PATCH] target/97192 - new testcase for fixed PR

2020-09-24 Thread Richard Biener
This adds another testcase for the PR97085 fix.

Pushed.

2020-09-24  Richard Biener  

PR tree-optimization/97085
* gcc.dg/pr97192.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr97192.c | 16 
 1 file changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr97192.c

diff --git a/gcc/testsuite/gcc.dg/pr97192.c b/gcc/testsuite/gcc.dg/pr97192.c
new file mode 100644
index 000..16647ca67a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97192.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftracer" } */
+/* { dg-additional-options "-mavx512vl" { target x86_64-*-* i?86-*-* } } */
+
+typedef int __attribute__ ((__vector_size__ (32))) V;
+
+int a, b;
+V v;
+
+int
+foo (void)
+{
+  b -= 4 - !a;
+  V u = 0 != v == a;
+  return u[0];
+}
-- 
2.26.2


Re: Add access through parameter derference tracking to modref

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 1:26 PM Jan Hubicka  wrote:
>
> > >
> > > I will do that (but I need to think a bit about the redundancy between the comments in
> > > ipa-modref and ipa-modref-tree)
> >
> > One place is enough - just add a pointer to the other place.
> Here is the updated patch I am testing.  It adds documentation to
> ipa-modref-tree.h, which is perhaps the more natural place, and links to it from
> the ipa-modref.c documentation.
>
> Also note that loads and stores are distinguished, since for every function
> we have both a decision tree for loads and a separate decision tree for
> stores.
>
> I do not plan to add more levels to the tree (at least for the time being).
> I think that forming groups by alias sets is quite effective because
> a TBAA oracle lookup is cheap and has a good chance to disambiguate.  For
> the remaining info tracked I plan a simple flat (and capped by a small
> constant) list of accesses.
>
> It indeed seems that ptr_deref_may_alias_ref_p_1 is precisely what I
> need, which simplifies the patch.  I did not know about it and simply
> followed what the main oracle does.
>
> Once the base/offset tracking is added I will need to figure out when
> the base pointers are the same (or differ by a known offset), which is not readily
> available from ptr_deref_may_alias_ref_p_1, but we can do that step by
> step.
>
> Thanks a lot.
> I am re-testing the patch attached. OK if testing passes?
OK.

Richard.

> 2020-09-24  Jan Hubicka  
>
> * doc/invoke.texi: Document -fipa-modref, ipa-modref-max-bases,
> ipa-modref-max-refs, ipa-modref-max-accesses, ipa-modref-max-tests.
> * ipa-modref-tree.c (test_insert_search_collapse): Update.
> (test_merge): Update.
> (gt_ggc_mx): New function.
> * ipa-modref-tree.h (struct modref_access_node): New structure.
> (struct modref_ref_node): Add every_access and accesses array.
> (modref_ref_node::modref_ref_node): Update ctor.
> (modref_ref_node::search): New member function.
> (modref_ref_node::collapse): New member function.
> (modref_ref_node::insert_access): New member function.
> (modref_base_node::insert_ref): Do not collapse base if ref is 0.
> (modref_base_node::collapse): Collapse also refs.
> (modref_tree): Add accesses.
> (modref_tree::modref_tree): Initialize max_accesses.
> (modref_tree::insert): Add access parameter.
> (modref_tree::cleanup): New member function.
> (modref_tree::merge): Add parm_map; merge accesses.
> (modref_tree::copy_from): New member function.
> (modref_tree::create_ggc): Add max_accesses.
> * ipa-modref.c (dump_access): New function.
> (dump_records): Dump accesses.
> (dump_lto_records): Dump accesses.
> (get_access): New function.
> (record_access): Record access.
> (record_access_lto): Record access.
> (analyze_call): Compute parm_map.
> (analyze_function): Update construction of modref records.
> (modref_summaries::duplicate): Likewise; use copy_from.
> (write_modref_records): Stream accesses.
> (read_modref_records): Sream accesses.
> (pass_ipa_modref::execute): Update call of merge.
> * params.opt (-param=modref-max-accesses): New.
> * tree-ssa-alias.c (alias_stats): Add modref_baseptr_tests.
> (dump_alias_stats): Update.
> (modref_may_conflict): Check accesses.
> (ref_maybe_used_by_call_p_1): Update call to modref_may_conflict.
> (call_may_clobber_ref_p_1): Update call to modref_may_conflict.
>
> gcc/testsuite/ChangeLog:
>
> 2020-09-24  Jan Hubicka  
>
> * gcc.dg/tree-ssa/modref-1.c: New test.
>
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 75203ba2420..623dfb8ac28 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -486,7 +486,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fgcse-sm  -fhoist-adjacent-loads  -fif-conversion @gol
>  -fif-conversion2  -findirect-inlining @gol
>  -finline-functions  -finline-functions-called-once  -finline-limit=@var{n} 
> @gol
> --finline-small-functions  -fipa-cp  -fipa-cp-clone @gol
> +-finline-small-functions -fipa-modref -fipa-cp  -fipa-cp-clone @gol
>  -fipa-bit-cp  -fipa-vrp  -fipa-pta  -fipa-profile  -fipa-pure-const @gol
>  -fipa-reference  -fipa-reference-addressable @gol
>  -fipa-stack-alignment  -fipa-icf  -fira-algorithm=@var{algorithm} @gol
> @@ -9688,6 +9688,7 @@ compilation time.
>  -fif-conversion @gol
>  -fif-conversion2 @gol
>  -finline-functions-called-once @gol
> +-fipa-modref @gol
>  -fipa-profile @gol
>  -fipa-pure-const @gol
>  -fipa-reference @gol
> @@ -10783,11 +10784,18 @@ default at any optimization level.
>  @opindex fipa-profile
>  Perform interprocedural profile propagation.  The functions called only from
>  cold functions are marked as cold. Also functions executed once (such as
> -@code{cold}, @code{noreturn}, static constructors or destructors) are 
> identifie

[committed][testsuite] Scan final instead of asm in independent-cloneids-1.c

2020-09-24 Thread Tom de Vries
Hi,

When running test-case gcc.dg/independent-cloneids-1.c for nvptx, we get:
...
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]0: 1
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]1: 1
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]2: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]0: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]1: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]2: 1
...

The test expects to find something like:
...
bar.constprop.0:
...
but instead on nvptx we have:
...
.func (.param.u32 %value_out) bar$constprop$0
...

Fix this by rewriting the scans to use the final dump instead.
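(In the -fdump-rtl-final dump every function starts with a header roughly of
the form below, so the clone name can be matched independently of the target's
label syntax; the numeric fields here are illustrative only.)

;; Function bar.constprop (bar.constprop.0, funcdef_no=12, decl_uid=2345, cgraph_uid=13, symbol_order=15)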

Tested on x86_64.

Committed to trunk.

Thanks,
- Tom

[testsuite] Scan final instead of asm in independent-cloneids-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/independent-cloneids-1.c: Use scan-rtl-dump instead of
scan-assembler.

---
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/independent-cloneids-1.c 
b/gcc/testsuite/gcc.dg/independent-cloneids-1.c
index 516211a6e86..efbc1c51da0 100644
--- a/gcc/testsuite/gcc.dg/independent-cloneids-1.c
+++ b/gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-rtl-final"  } */
 /* { dg-skip-if "Odd label definition syntax" { mmix-*-* } } */
 
 extern int printf (const char *, ...);
@@ -29,11 +29,11 @@ baz (int arg)
   return foo (8);
 }
 
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]0:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]1:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]2:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]0:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]1:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]2:} 1 } } */
-/* { dg-final { scan-assembler-not {(?n)^_*foo[.$_]constprop[.$_]3:} } } */
-/* { dg-final { scan-assembler-not {(?n)^_*foo[.$_]constprop[.$_]4:} } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop 
\(bar[.$_]constprop[.$_]0,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop 
\(bar[.$_]constprop[.$_]1,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop 
\(bar[.$_]constprop[.$_]2,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]0,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]1,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]2,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]3,} 0 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop 
\(foo[.$_]constprop[.$_]4,} 0 "final" } } */


Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-24 Thread Richard Biener
On Wed, 23 Sep 2020, Tom de Vries wrote:

> On 9/23/20 9:28 AM, Richard Biener wrote:
> > On Tue, 22 Sep 2020, Tom de Vries wrote:
> > 
> >> [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
> >> with SIMT LANE [PR95654] ]
> >>
> >> On 9/16/20 8:20 PM, Alexander Monakov wrote:
> >>>
> >>>
> >>> On Wed, 16 Sep 2020, Tom de Vries wrote:
> >>>
>  [ cc-ing author omp support for nvptx. ]
> >>>
> >>> The issue looks familiar. I recognized it back in 2017 (and LLVM people
> >>> recognized it too for their GPU targets). In an attempt to get agreement
> >>> to fix the issue "properly" for GCC I found a similar issue that affects
> >>> all targets, not just offloading, and filed it as PR 80053.
> >>>
> >>> (yes, there are no addressable labels involved in offloading, but 
> >>> nevertheless
> >>> the nature of the middle-end issue is related)
> >>
> >> Hi Alexander,
> >>
> >> thanks for looking into this.
> >>
> >> Seeing that the attempt to fix things properly is stalled, for now I'm
> >> proposing a point-fix, similar to the original patch proposed by Tobias.
> >>
> >> Richi, Jakub, OK for trunk?
> > 
> > I notice that we call ignore_bb_p many times in tracer.c but one call
> > is conveniently early in tail_duplicate (void):
> > 
> >   int n = count_insns (bb);
> >   if (!ignore_bb_p (bb))
> > blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), 
> > bb);
> > 
> > where count_insns already walks all stmts in the block.  It would be
> > nice to avoid repeatedly walking all stmts, maybe adjusting the above
> > call is enough and/or count_insns can compute this and/or the ignore_bb_p
> > result can be cached (optimize_bb_for_size_p might change though,
> > but maybe all other ignore_bb_p calls effectively just are that,
> > checks for blocks that became optimize_bb_for_size_p).
> > 
> 
> This untested follow-up patch tries something in that direction.
> 
> Is this what you meant?

Yeah, sort of.

+static bool
+cached_can_duplicate_bb_p (const_basic_block bb)
+{
+  if (can_duplicate_bb)

is there any path where can_duplicate_bb would be NULL?

+{
+  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
+  /* Assume added bb's should be ignored.  */
+  if ((unsigned int)bb->index < size
+ && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
+   return !bitmap_bit_p (can_duplicate_bb, bb->index);

yes, newly added bbs should be ignored so,

 }
 
-  return false;
+  bool val = compute_can_duplicate_bb_p (bb);
+  if (can_duplicate_bb)
+cache_can_duplicate_bb_p (bb, val);

no need to compute & cache for them, just return true (because
we did duplicate them)?

Thanks,
Richard.


> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH PR96757] aarch64: ICE during GIMPLE pass: vect

2020-09-24 Thread Richard Sandiford
Hi,

"duanbo (C)"  writes:
> Sorry for the late reply.

My time to apologise for the late reply.

> Thanks for your suggestions. I have modified accordingly.
> Attached please find the v1 patch. 

Thanks, the logic to choose which precision we pick looks good.
But I think the build_mask_conversions should be deferred until
after we've decided to make the transform.  So…

> @@ -4340,16 +4342,91 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
>  
>it is better for b1 and b2 to use the mask type associated
>with int elements rather bool (byte) elements.  */
> -   rhs1_type = integer_type_for_mask (TREE_OPERAND (rhs1, 0), vinfo);
> -   if (!rhs1_type)
> - rhs1_type = TREE_TYPE (TREE_OPERAND (rhs1, 0));
> +   rhs1_op0 = TREE_OPERAND (rhs1, 0);
> +   rhs1_op1 = TREE_OPERAND (rhs1, 1);
> +   if (!rhs1_op0 || !rhs1_op1)
> + return NULL;
> +   rhs1_op0_type = integer_type_for_mask (rhs1_op0, vinfo);
> +   rhs1_op1_type = integer_type_for_mask (rhs1_op1, vinfo);
> +
> +   if (!rhs1_op0_type && !rhs1_op1_type)
> + {
> +   rhs1_type = TREE_TYPE (rhs1_op0);
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);

…here we should just be able to set rhs1_type, and leave vectype2
to the code below.

> + }
> +   else if (!rhs1_op0_type && rhs1_op1_type)
> + {
> +   rhs1_type = TREE_TYPE (rhs1_op0);
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);
> +   if (!vectype2)
> + return NULL;
> +   rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> + vectype2, stmt_vinfo);
> + }
> +   else if (rhs1_op0_type && !rhs1_op1_type)
> + {
> +   rhs1_type = TREE_TYPE (rhs1_op1);
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);
> +   if (!vectype2)
> + return NULL;
> +   rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> + vectype2, stmt_vinfo);

Same for these two.

> + }
> +   else if (TYPE_PRECISION (rhs1_op0_type)
> +!= TYPE_PRECISION (rhs1_op1_type))
> + {
> +   int tmp1 = (int)TYPE_PRECISION (rhs1_op0_type)
> +  - (int)TYPE_PRECISION (TREE_TYPE (lhs));
> +   int tmp2 = (int)TYPE_PRECISION (rhs1_op1_type)
> +  - (int)TYPE_PRECISION (TREE_TYPE (lhs));
> +   if ((tmp1 > 0 && tmp2 > 0)||(tmp1 < 0 && tmp2 < 0))

Minor formatting nit, sorry, but: GCC style is to put a space after
(int) and on either side of ||.

Might be good to use the same numbering as the operands: tmp0 and tmp1
instead of tmp1 and tmp2.

> + {
> +   if (abs (tmp1) > abs (tmp2))
> + {
> +   vectype2 = get_mask_type_for_scalar_type (vinfo,
> + rhs1_op1_type);
> +   if (!vectype2)
> + return NULL;
> +   rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> + vectype2, stmt_vinfo);
> + }
> +   else
> + {
> +   vectype2 = get_mask_type_for_scalar_type (vinfo,
> + rhs1_op0_type);
> +   if (!vectype2)
> + return NULL;
> +   rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> + vectype2, stmt_vinfo);
> + }
> +   rhs1_type = integer_type_for_mask (rhs1_op0, vinfo);

Here I think we can just go with rhs1_type = rhs1_op1_type if
abs (tmp1) > abs (tmp2) (i.e. op1 is closer to the final type
than op0) and rhs1_op0_type otherwise.
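(A worked example of that rule, with made-up precisions: for a 32-bit lhs,
a mask operand derived from comparing 16-bit values (op0) and one derived from
comparing 8-bit values (op1) give distances |16 - 32| = 16 and |8 - 32| = 24;
op0 is closer to the final precision, so its mask type is kept as rhs1_type
and op1 is converted to it.)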

> + }
> +   else
> + {
> +   rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> + vectype1, stmt_vinfo);
> +   rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> + vectype1, stmt_vinfo);
> +   rhs1_type = integer_type_for_mask (rhs1_op0, vinfo);
> +   if (!rhs1_type)
> + return NULL;
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);

and here I think rhs1_type can be:

  build_nonstandard_integer_type (TYPE_PRECISION (lhs_type), 1);

> + }
> + }
> +   else
> + {
> +   rhs1_type = integer_type_for_mask (rhs1_op0, vinfo);
> +   if (!rhs1_type)
> + return NULL;
> +   vectype2 = get_mask_type_for_scalar_type (vinfo, rhs1_type);

Here either rhs1_op0_type or rhs1_op1_type should be OK.

> + }
> +   tmp = build2 (TREE_CODE (rhs1), TREE_T

Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-24 Thread Tom de Vries
On 9/24/20 1:42 PM, Richard Biener wrote:
> On Wed, 23 Sep 2020, Tom de Vries wrote:
> 
>> On 9/23/20 9:28 AM, Richard Biener wrote:
>>> On Tue, 22 Sep 2020, Tom de Vries wrote:
>>>
 [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
 with SIMT LANE [PR95654] ]

 On 9/16/20 8:20 PM, Alexander Monakov wrote:
>
>
> On Wed, 16 Sep 2020, Tom de Vries wrote:
>
>> [ cc-ing author omp support for nvptx. ]
>
> The issue looks familiar. I recognized it back in 2017 (and LLVM people
> recognized it too for their GPU targets). In an attempt to get agreement
> to fix the issue "properly" for GCC I found a similar issue that affects
> all targets, not just offloading, and filed it as PR 80053.
>
> (yes, there are no addressable labels involved in offloading, but 
> nevertheless
> the nature of the middle-end issue is related)

 Hi Alexander,

 thanks for looking into this.

 Seeing that the attempt to fix things properly is stalled, for now I'm
 proposing a point-fix, similar to the original patch proposed by Tobias.

 Richi, Jakub, OK for trunk?
>>>
>>> I notice that we call ignore_bb_p many times in tracer.c but one call
>>> is conveniently early in tail_duplicate (void):
>>>
>>>   int n = count_insns (bb);
>>>   if (!ignore_bb_p (bb))
>>> blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), 
>>> bb);
>>>
>>> where count_insns already walks all stmts in the block.  It would be
>>> nice to avoid repeatedly walking all stmts, maybe adjusting the above
>>> call is enough and/or count_insns can compute this and/or the ignore_bb_p
>>> result can be cached (optimize_bb_for_size_p might change though,
>>> but maybe all other ignore_bb_p calls effectively just are that,
>>> checks for blocks that became optimize_bb_for_size_p).
>>>
>>
>> This untested follow-up patch tries something in that direction.
>>
>> Is this what you meant?
> 
> Yeah, sort of.
> 
> +static bool
> +cached_can_duplicate_bb_p (const_basic_block bb)
> +{
> +  if (can_duplicate_bb)
> 
> is there any path where can_duplicate_bb would be NULL?
> 

Yes, ignore_bb_p is called from gimple-ssa-split-paths.c.

> +{
> +  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
> +  /* Assume added bb's should be ignored.  */
> +  if ((unsigned int)bb->index < size
> + && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
> +   return !bitmap_bit_p (can_duplicate_bb, bb->index);
> 
> yes, newly added bbs should be ignored so,
> 
>  }
>  
> -  return false;
> +  bool val = compute_can_duplicate_bb_p (bb);
> +  if (can_duplicate_bb)
> +cache_can_duplicate_bb_p (bb, val);
> 
> no need to compute & cache for them, just return true (because
> we did duplicate them)?
> 

Also the case for gimple-ssa-split-paths.c?

Thanks,
- Tom


[committed][testsuite, nvptx] Fix gcc.dg/tls/thr-cse-1.c

2020-09-24 Thread Tom de Vries
Hi,

With nvptx, we run into:
...
FAIL: gcc.dg/tls/thr-cse-1.c scan-assembler-not \
  emutls_get_address.*emutls_get_address.*
...
because the nvptx assembly looks like:
...
  call (%value_in), __emutls_get_address, (%out_arg1);
  ...
// BEGIN GLOBAL FUNCTION DECL: __emutls_get_address
.extern .func (.param.u64 %value_out) __emutls_get_address (.param.u64 %in_ar0);
...

Fix this by checking the slim final dump instead, where we have just:
...
   12: r35:DI=call [`__emutls_get_address'] argc:0
...

Committed to trunk.

Thanks,
- Tom

[testsuite, nvptx] Fix gcc.dg/tls/thr-cse-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/tls/thr-cse-1.c: Scan final dump instead of assembly for
nvptx.

---
 gcc/testsuite/gcc.dg/tls/thr-cse-1.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tls/thr-cse-1.c 
b/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
index 84eedfdb226..7145671eb95 100644
--- a/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
+++ b/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
@@ -4,6 +4,7 @@
registers and thus getting the counts wrong.  */
 /* { dg-additional-options "-mshort-calls" { target epiphany-*-* } } */
 /* { dg-require-effective-target tls_emulated } */
+/* { dg-additional-options "-fdump-rtl-final-slim" { target nvptx-*-* } }*/
 
 /* Test that we only get one call to emutls_get_address when CSE is
active.  Note that the var _must_ be initialized for the scan asm
@@ -18,10 +19,12 @@ int foo (int b, int c, int d)
   return a;
 }
 
-/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_address.*" 
{ target { ! { "*-wrs-vxworks"  "*-*-darwin8"  "hppa*-*-hpux*" "i?86-*-mingw*" 
"x86_64-*-mingw*" visium-*-* } } } } } */
+/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_address.*" 
{ target { ! { "*-wrs-vxworks"  "*-*-darwin8"  "hppa*-*-hpux*" "i?86-*-mingw*" 
"x86_64-*-mingw*" visium-*-* nvptx-*-* } } } } } */
 /* { dg-final { scan-assembler-not 
"call\tL___emutls_get_address.stub.*call\tL___emutls_get_address.stub.*" { 
target "*-*-darwin8" } } } */
 /* { dg-final { scan-assembler-not "(b,l|bl) __emutls_get_address.*(b,l|bl) 
__emutls_get_address.*" { target "hppa*-*-hpux*" } } } */
 /* { dg-final { scan-assembler-not "tls_lookup.*tls_lookup.*" { target 
*-wrs-vxworks } } } */
 /* { dg-final { scan-assembler-not 
"call\t___emutls_get_address.*call\t___emutls_get_address" { target 
"i?86-*-mingw*" } } } */
 /* { dg-final { scan-assembler-not 
"call\t__emutls_get_address.*call\t__emutls_get_address" { target 
"x86_64-*-mingw*" } } } */
 /* { dg-final { scan-assembler-not "%l __emutls_get_address.*%l 
__emutls_get_address" { target visium-*-* } } } */
+
+/* { dg-final { scan-rtl-dump-times "emutls_get_address" 1 "final" { target 
nvptx-*-* } } } */


Re: [PATCH v3 1/2] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-24 Thread Richard Sandiford
xionghu luo  writes:
> @@ -2658,6 +2659,43 @@ expand_vect_cond_mask_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>  
>  #define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
>  
> +/* Expand VEC_SET internal functions.  */
> +
> +static void
> +expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op0 = gimple_call_arg (stmt, 0);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rtx src = expand_expr (op0, NULL_RTX, VOIDmode, EXPAND_WRITE);

I'm not sure about the expand_expr here.  ISTM that op0 is a normal
input and so should be expanded by expand_normal rather than
EXPAND_WRITE.  Also:

> +
> +  machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
> +  scalar_mode innermode = GET_MODE_INNER (outermode);
> +
> +  rtx value = expand_expr (op1, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +  rtx pos = expand_expr (op2, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +
> +  class expand_operand ops[3];
> +  enum insn_code icode = optab_handler (optab, outermode);
> +
> +  if (icode != CODE_FOR_nothing)
> +{
> +  pos = convert_to_mode (E_SImode, pos, 0);
> +
> +  create_fixed_operand (&ops[0], src);

...this would mean that if SRC happens to be a MEM, the pattern
must also accept a MEM.

ISTM that we're making more work for ourselves by not “fixing” the optab
to have a natural pure-input + pure-output interface. :-)  But if we
stick with the current optab interface, I think we need to:

- create a temporary register
- move SRC into the temporary register before the insn
- use create_fixed_operand with the temporary register for operand 0
- move the temporary register into TARGET after the insn

> +  create_input_operand (&ops[1], value, innermode);
> +  create_input_operand (&ops[2], pos, GET_MODE (pos));

For this I think we should use convert_operand_from on the original “pos”,
so that the target gets to choose what the mode of the operand is.
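
Concretely, something along these lines (only a sketch, untested, and
assuming src, value and pos have been expanded as plain inputs as
suggested above):

  rtx temp = gen_reg_rtx (outermode);
  emit_move_insn (temp, src);

  class expand_operand ops[3];
  create_fixed_operand (&ops[0], temp);
  create_input_operand (&ops[1], value, innermode);
  create_convert_operand_from (&ops[2], pos, TYPE_MODE (TREE_TYPE (op2)),
                               TYPE_UNSIGNED (TREE_TYPE (op2)));
  expand_insn (icode, 3, ops);
  emit_move_insn (target, temp);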

Thanks,
Richard


Re: [PATCH] Add if-chain to switch conversion pass.

2020-09-24 Thread Richard Biener via Gcc-patches
On Wed, Sep 2, 2020 at 1:53 PM Martin Liška  wrote:
>
> On 9/1/20 4:50 PM, David Malcolm wrote:
> > Hope this is constructive
> > Dave
>
> Thank you David. All of them very very useful!
>
> There's updated version of the patch.

I noticed several functions without a function-level comment.

-  cluster (tree case_label_expr, basic_block case_bb, profile_probability prob,
-  profile_probability subtree_prob);
+  inline cluster (tree case_label_expr, basic_block case_bb,
+ profile_probability prob, profile_probability subtree_prob);

I thought we generally leave this to the compiler ...

+@item -fconvert-if-to-switch
+@opindex fconvert-if-to-switch
+Perform conversion of an if cascade into a switch statement.
+Do so if the switch can be later transformed using a jump table
+or a bit test.  The transformation can help to produce faster code for
+the switch statement.  This flag is enabled by default
+at @option{-O2} and higher.

this mentions we do this only when we later can convert the
switch again but both passes (we still have two :/) have
independent guards.

+  /* For now, just wipe the dominator information.  */
+  free_dominance_info (CDI_DOMINATORS);

could at least be conditional on the vop renaming condition...

+  if (!all_candidates.is_empty ())
+mark_virtual_operands_for_renaming (fun);

+  if (bitmap_bit_p (*visited_bbs, bb->index))
+   break;
+  bitmap_set_bit (*visited_bbs, bb->index);

since you are using a bitmap and not an sbitmap (why?)
you can combine those into

   if (!bitmap_set_bit (*visited_bbs, bb->index))
break;

+  /* Current we support following patterns (situations):
+
+1) if condition with equal operation:
+
...

did you see whether using

   register_edge_assert_for (lhs, true_edge, code, lhs, rhs, asserts);

works equally well?  It fills the 'asserts' vector with relations
derived from 'lhs'.  There's also
vr_values::extract_range_for_var_from_comparison_expr
to compute the case_range

+  /* If it's not the first condition, then we need a BB without
+any statements.  */
+  if (!first)
+   {
+ unsigned stmt_count = 0;
+ for (gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
+  !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
+   ++stmt_count;
+
+ if (stmt_count - visited_stmt_count != 0)
+   break;

hmm, OK, this might be a bit iffy to get correct then, still it's a lot
of pattern matching code that is there elsewhere already.
ifcombine simply hoists any stmts without side-effects up the
dominator tree and thus only requires BBs without side-effects
(IIRC there's a predicate fn for that).

+  /* Prevent loosing information for a PHI node where 2 edges will
+be folded into one.  Note that we must do the same also for false_edge
+(for last BB in a if-elseif chain).  */
+  if (!chain->record_phi_arguments (true_edge)
+ || !chain->record_phi_arguments (false_edge))

I don't really get this - looking at record_phi_arguments it seems
we're requiring that all edges into the same PHI from inside the case
(irrespective of from which case label) have the same value for the
PHI arg?

+ if (arg != *v)
+   return false;

should use operand_equal_p at least, REAL_CSTs are for example
not shared tree nodes.  I'll also notice that if record_phi_arguments
fails we still may have altered its hash-map even though the particular
edge will not participate in the current chain, so it will affect other
chains ending in the same BB.  Overall this looks a bit too conservative
(and random, based on visiting order).

+expanded_location loc
+= expand_location (gimple_location (chain->m_first_condition));
+  if (dump_file)
+   {
+ fprintf (dump_file, "Condition chain (at %s:%d) with %d conditions "
+  "(%d BBs) transformed into a switch statement.\n",
+  loc.file, loc.line, total_case_values,
+  chain->m_entries.length ());

Use dump_printf_loc and you can pass a gimple * stmt as location.
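
E.g. (untested sketch):

  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, chain->m_first_condition,
                   "Condition chain with %d conditions transformed "
                   "into a switch statement.\n", total_case_values);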

+  /* Follow if-elseif-elseif chain.  */
+  bb = false_edge->dest;

so that means the code doesn't handle a tree, right?  But what
makes us sure the chain doesn't continue on the true_edge instead?
I guess this degenerate tree isn't handled either.

I was wondering whether to do the switch discovery in a reverse
CFG walk, recording for each BB what case_range(s) it represents
for a particular variable(s), so that when visiting a dominator you
can quickly figure out which children are relevant (true, false or both).
It would also make the matching a BB-local operation where you'd
do the case_label discovery based on the single-pred BBs gimple-cond.

+  output = bit_test_cluster::find_bit_tests (filtered_clusters);
+  r = output.length () < filtered_clusters.length ();
+  if (r)
+dump_clusters (&output, "BT can be built");

so as of the very above comment - this might 

Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-24 Thread Richard Biener
On Thu, 24 Sep 2020, Tom de Vries wrote:

> On 9/24/20 1:42 PM, Richard Biener wrote:
> > On Wed, 23 Sep 2020, Tom de Vries wrote:
> > 
> >> On 9/23/20 9:28 AM, Richard Biener wrote:
> >>> On Tue, 22 Sep 2020, Tom de Vries wrote:
> >>>
>  [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
>  with SIMT LANE [PR95654] ]
> 
>  On 9/16/20 8:20 PM, Alexander Monakov wrote:
> >
> >
> > On Wed, 16 Sep 2020, Tom de Vries wrote:
> >
> >> [ cc-ing author omp support for nvptx. ]
> >
> > The issue looks familiar. I recognized it back in 2017 (and LLVM people
> > recognized it too for their GPU targets). In an attempt to get agreement
> > to fix the issue "properly" for GCC I found a similar issue that affects
> > all targets, not just offloading, and filed it as PR 80053.
> >
> > (yes, there are no addressable labels involved in offloading, but 
> > nevertheless
> > the nature of the middle-end issue is related)
> 
>  Hi Alexander,
> 
>  thanks for looking into this.
> 
>  Seeing that the attempt to fix things properly is stalled, for now I'm
>  proposing a point-fix, similar to the original patch proposed by Tobias.
> 
>  Richi, Jakub, OK for trunk?
> >>>
> >>> I notice that we call ignore_bb_p many times in tracer.c but one call
> >>> is conveniently early in tail_duplicate (void):
> >>>
> >>>   int n = count_insns (bb);
> >>>   if (!ignore_bb_p (bb))
> >>> blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), 
> >>> bb);
> >>>
> >>> where count_insns already walks all stmts in the block.  It would be
> >>> nice to avoid repeatedly walking all stmts, maybe adjusting the above
> >>> call is enough and/or count_insns can compute this and/or the ignore_bb_p
> >>> result can be cached (optimize_bb_for_size_p might change though,
> >>> but maybe all other ignore_bb_p calls effectively just are that,
> >>> checks for blocks that became optimize_bb_for_size_p).
> >>>
> >>
> >> This untested follow-up patch tries something in that direction.
> >>
> >> Is this what you meant?
> > 
> > Yeah, sort of.
> > 
> > +static bool
> > +cached_can_duplicate_bb_p (const_basic_block bb)
> > +{
> > +  if (can_duplicate_bb)
> > 
> > is there any path where can_duplicate_bb would be NULL?
> > 
> 
> Yes, ignore_bb_p is called from gimple-ssa-split-paths.c.

Oh, that was probably done because of the very same OMP issue ...

> > +{
> > +  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
> > +  /* Assume added bb's should be ignored.  */
> > +  if ((unsigned int)bb->index < size
> > + && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
> > +   return !bitmap_bit_p (can_duplicate_bb, bb->index);
> > 
> > yes, newly added bbs should be ignored so,
> > 
> >  }
> >  
> > -  return false;
> > +  bool val = compute_can_duplicate_bb_p (bb);
> > +  if (can_duplicate_bb)
> > +cache_can_duplicate_bb_p (bb, val);
> > 
> > no need to compute & cache for them, just return true (because
> > we did duplicate them)?
> > 
> 
> Also the case for gimple-ssa-split-paths.c?

If it had the bitmap then yes ... since it doesn't the early
out should be in the conditional above only.

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


[PATCH] libstdc++: Specialize ranges::__detail::__box for semiregular types

2020-09-24 Thread Patrick Palka via Gcc-patches
The class template semiregular-box defined in [range.semi.wrap] is
used by a number of views to accommodate non-semiregular subobjects
while ensuring that the overall view remains semiregular.  It provides
a stand-in default constructor, copy assignment operator and move
assignment operator whenever the underlying type lacks them.  The
wrapper derives from std::optional to support default construction
when T is not default constructible.

It would be nice for this wrapper to essentially be a no-op when the
underlying type is already semiregular, but this is currently not the
case due to its use of std::optional, which incurs space overhead
compared to storing just T.

To that end, this patch specializes the semiregular wrapper for
semiregular T.  Compared to the primary template, this specialization
uses less space and it allows [[no_unique_address]] to optimize away
wrapped data members whose underlying type is empty and semiregular
(e.g. a non-capturing lambda).  This patch also applies
[[no_unique_address]] to the five data members that currently use the
wrapper.
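
For instance (purely illustrative, not part of the patch), a non-capturing
lambda is empty and, in C++20, semiregular, so it now selects the
specialization and the wrapped predicate can occupy no storage of its own:

  #include <concepts>

  auto is_even = [](int i) { return i % 2 == 0; };
  static_assert(std::semiregular<decltype(is_even)>);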

Tested on x86_64-pc-linux-gnu, does this look OK to commit?

libstdc++-v3/ChangeLog:

* include/std/ranges (__detail::__boxable): Split out the
associated constraints of __box into here.
(__detail::__box): Use the __boxable concept.  Define a leaner
partial specialization for semiregular types.
(single_view::_M_value): Mark it [[no_unique_address]].
(filter_view::_M_pred): Likewise.
(transform_view::_M_fun): Likewise.
(take_while_view::_M_pred): Likewise.
(drop_while_view::_M_pred):: Likewise.
* testsuite/std/ranges/adaptors/detail/semiregular_box.cc: New
test.
---
 libstdc++-v3/include/std/ranges   | 68 +++--
 .../ranges/adaptors/detail/semiregular_box.cc | 73 +++
 2 files changed, 135 insertions(+), 6 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index e7fa4493612..8a302a7918f 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -86,7 +86,10 @@ namespace ranges
 
   namespace __detail
   {
-template requires is_object_v<_Tp>
+template
+  concept __boxable = copy_constructible<_Tp> && is_object_v<_Tp>;
+
+template<__boxable _Tp>
   struct __box : std::optional<_Tp>
   {
using std::optional<_Tp>::optional;
@@ -130,6 +133,59 @@ namespace ranges
}
   };
 
+// For types which are already semiregular, this specialization of the
+// semiregular wrapper stores the object directly without going through
+// std::optional.  It provides the subset of the primary template's API
+// that we currently use.
+template<__boxable _Tp> requires semiregular<_Tp>
+  struct __box<_Tp>
+  {
+  private:
+   [[no_unique_address]] _Tp _M_value;
+
+  public:
+   __box() = default;
+
+   constexpr
+   __box(const _Tp& __t)
+   noexcept(is_nothrow_copy_constructible_v<_Tp>)
+   : _M_value{__t}
+   { }
+
+   constexpr
+   __box(_Tp&& __t)
+   noexcept(is_nothrow_move_constructible_v<_Tp>)
+   : _M_value{std::move(__t)}
+   { }
+
+   template
+ requires constructible_from<_Tp, _Args...>
+ constexpr
+ __box(in_place_t, _Args&&... __args)
+ noexcept(is_nothrow_constructible_v<_Tp, _Args...>)
+ : _M_value{std::forward<_Args>(__args)...}
+ { }
+
+   constexpr bool
+   has_value() const noexcept
+   { return true; };
+
+   constexpr _Tp&
+   operator*() noexcept
+   { return _M_value; }
+
+   constexpr const _Tp&
+   operator*() const noexcept
+   { return _M_value; }
+
+   constexpr _Tp*
+   operator->() noexcept
+   { return &_M_value; }
+
+   constexpr const _Tp*
+   operator->() const noexcept
+   { return &_M_value; }
+  };
   } // namespace __detail
 
   /// A view that contains exactly one element.
@@ -185,7 +241,7 @@ namespace ranges
   { return _M_value.operator->(); }
 
 private:
-  __detail::__box<_Tp> _M_value;
+  [[no_unique_address]] __detail::__box<_Tp> _M_value;
 };
 
   namespace __detail
@@ -1195,7 +1251,7 @@ namespace views
   };
 
   _Vp _M_base = _Vp();
-  __detail::__box<_Pred> _M_pred;
+  [[no_unique_address]] __detail::__box<_Pred> _M_pred;
   [[no_unique_address]] __detail::_CachedPosition<_Vp> _M_cached_begin;
 
 public:
@@ -1533,7 +1589,7 @@ namespace views
};
 
   _Vp _M_base = _Vp();
-  __detail::__box<_Fp> _M_fun;
+  [[no_unique_address]] __detail::__box<_Fp> _M_fun;
 
 public:
   transform_view() = default;
@@ -1787,7 +1843,7 @@ namespace views
};
 
   _Vp _M_base = _Vp();
-  __detail::__box<_Pred> _M_pred;
+  [[no_unique_addr

Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-09-24 Thread Jonathan Wakely via Gcc-patches

On 24/09/20 11:11 +0200, Richard Biener wrote:

On Wed, 26 Aug 2020, Richard Biener wrote:


On Thu, 6 Aug 2020, Richard Biener wrote:

> On Thu, 6 Aug 2020, Richard Biener wrote:
>
> > This adds a move CTOR to auto_vec and makes use of an
> > auto_vec return value for get_loop_exit_edges denoting
> > that lifetime management of the vector is handed to the caller.
> >
> > The move CTOR prompted the hash_table change because it apparently
> > makes the copy CTOR implicitly deleted (good) and hash-table
> > expansion of the odr_enum_map which is
> > hash_map  where odr_enum has an
> > auto_vec member triggers this.  Not sure if
> > there's a latent bug there before this (I think we're not
> > invoking DTORs, but we're invoking copy-CTORs).
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Does this all look sensible and is it a good change
> > (the get_loop_exit_edges one)?
>
> Regtest went OK, here's an update with a complete ChangeLog
> (how useful..) plus the move assign operator deleted, copy
> assign wouldn't work as auto-generated and at the moment
> there's no use of assigning.  I guess if we'd have functions
> that take an auto_vec<> argument meaning they will destroy
> the vector that will become useful and we can implement it.
>
> OK for trunk?

Ping.


Ping^2.


Looks good to me as far as the use of C++ features goes.



Re: [PATCH 1/1] arm: [testsuite] Skip thumb2-cond-cmp tests on Cortex-M [PR94595]

2020-09-24 Thread Christophe Lyon via Gcc-patches
Ping?

On Mon, 7 Sep 2020 at 18:13, Christophe Lyon  wrote:
>
> Since r204778 (g571880a0a4c512195aa7d41929ba6795190887b2), we favor
> branches over IT blocks on Cortex-M. As a result, instead of
> generating two nested IT blocks in thumb2-cond-cmp-[1234].c, we
> generate either a single IT block, or use branches depending on
> conditions tested by the program.
>
> Since this was a deliberate change and the tests still pass as
> expected on Cortex-A, this patch skips them when targetting
> Cortex-M. The avoids the failures on Cortex M3, M4, and M33.  This
> patch makes the testcases unsupported on Cortex-M7 although they pass
> in this case because this CPU has different branch costs.
>
> I tried to relax the scan-assembler directives using eg. cmpne|subne
> or cmpgt|ble but that seemed fragile.
>
> OK?
>
> 2020-09-07  Christophe Lyon  
>
> gcc/testsuite/
> PR target/94595
> * gcc.target/arm/thumb2-cond-cmp-1.c: Skip if arm_cortex_m.
> * gcc.target/arm/thumb2-cond-cmp-2.c: Skip if arm_cortex_m.
> * gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
> * gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
> ---
>  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c | 2 +-
>  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c | 2 +-
>  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c | 2 +-
>  gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c 
> b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> index 45ab605..36204f4 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-1.c
> @@ -1,6 +1,6 @@
>  /* Use conditional compare */
>  /* { dg-options "-O2" } */
> -/* { dg-skip-if "" { arm_thumb1_ok } } */
> +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
>  /* { dg-final { scan-assembler "cmpne" } } */
>
>  int f(int i, int j)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c 
> b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> index 17d9a8f..108d1c3 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-2.c
> @@ -1,6 +1,6 @@
>  /* Use conditional compare */
>  /* { dg-options "-O2" } */
> -/* { dg-skip-if "" { arm_thumb1_ok } } */
> +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
>  /* { dg-final { scan-assembler "cmpeq" } } */
>
>  int f(int i, int j)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c 
> b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> index 6b2a79b..ca7fd9f 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-3.c
> @@ -1,6 +1,6 @@
>  /* Use conditional compare */
>  /* { dg-options "-O2" } */
> -/* { dg-skip-if "" { arm_thumb1_ok } } */
> +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
>  /* { dg-final { scan-assembler "cmpgt" } } */
>
>  int f(int i, int j)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c 
> b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> index 80e1076..91cc8f4 100644
> --- a/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> +++ b/gcc/testsuite/gcc.target/arm/thumb2-cond-cmp-4.c
> @@ -1,6 +1,6 @@
>  /* Use conditional compare */
>  /* { dg-options "-O2" } */
> -/* { dg-skip-if "" { arm_thumb1_ok } } */
> +/* { dg-skip-if "" { arm_thumb1_ok || arm_cortex_m } } */
>  /* { dg-final { scan-assembler "cmpgt" } } */
>
>  int f(int i, int j)
> --
> 2.7.4
>


c++: local-decls are never member fns [PR97186]

2020-09-24 Thread Nathan Sidwell


This fixes an ICE in noexcept instantiation.  It was presuming
functions always have template_info, but that changed with my
DECL_LOCAL_DECL_P changes.  Fortunately DECL_LOCAL_DECL_P fns are
never member fns, so we don't need to go fishing out a this pointer.

Also I realized I'd misnamed local10.C, so renaming it local-fn3.C,
and while there adding the effective-target lto that David E pointed
out was missing.

PR c++/97186
gcc/cp/
* pt.c (maybe_instantiate_noexcept): Local externs are never
member fns.
gcc/testsuite/
* g++.dg/template/local10.C: Rename ...
* g++.dg/template/local-fn3.C: .. here.  Require lto.
* g++.dg/template/local-fn4.C: New.

pushing to trunk

nathan
--
Nathan Sidwell
diff --git c/gcc/cp/pt.c w/gcc/cp/pt.c
index 1ec039d0793..62e85095bc4 100644
--- c/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -25397,15 +25397,20 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)
 	  push_deferring_access_checks (dk_no_deferred);
 	  input_location = DECL_SOURCE_LOCATION (fn);
 
-	  /* If needed, set current_class_ptr for the benefit of
-	 tsubst_copy/PARM_DECL.  */
-	  tree tdecl = DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (fn));
-	  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (tdecl))
+	  if (!DECL_LOCAL_DECL_P (fn))
 	{
-	  tree this_parm = DECL_ARGUMENTS (tdecl);
-	  current_class_ptr = NULL_TREE;
-	  current_class_ref = cp_build_fold_indirect_ref (this_parm);
-	  current_class_ptr = this_parm;
+	  /* If needed, set current_class_ptr for the benefit of
+		 tsubst_copy/PARM_DECL.  The exception pattern will
+		 refer to the parm of the template, not the
+		 instantiation.  */
+	  tree tdecl = DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (fn));
+	  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (tdecl))
+		{
+		  tree this_parm = DECL_ARGUMENTS (tdecl);
+		  current_class_ptr = NULL_TREE;
+		  current_class_ref = cp_build_fold_indirect_ref (this_parm);
+		  current_class_ptr = this_parm;
+		}
 	}
 
 	  /* If this function is represented by a TEMPLATE_DECL, then
diff --git c/gcc/testsuite/g++.dg/template/local10.C w/gcc/testsuite/g++.dg/template/local-fn3.C
similarity index 87%
rename from gcc/testsuite/g++.dg/template/local10.C
rename to gcc/testsuite/g++.dg/template/local-fn3.C
index a2ffc1e7306..2affe235bd3 100644
--- c/gcc/testsuite/g++.dg/template/local10.C
+++ w/gcc/testsuite/g++.dg/template/local-fn3.C
@@ -1,4 +1,6 @@
 // PR c++/97171
+
+// { dg-require-effective-target lto }
 // { dg-additional-options -flto }
 
 template 
diff --git c/gcc/testsuite/g++.dg/template/local-fn4.C w/gcc/testsuite/g++.dg/template/local-fn4.C
new file mode 100644
index 000..4699012accc
--- /dev/null
+++ w/gcc/testsuite/g++.dg/template/local-fn4.C
@@ -0,0 +1,21 @@
+// PR c++/97186
+// ICE in exception spec substitution
+
+
+template 
+struct no {
+  static void
+  tg ()
+  {
+void
+  hk () noexcept (tg); // { dg-error "convert" }
+
+hk ();
+  }
+};
+
+void
+os ()
+{
+  no ().tg ();
+}


Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 10:21 AM xionghu luo  wrote:
>
> Hi Segher,
>
> The attached two patches are updated and split from
>  "[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple 
> [PR79251]"
> as your comments.
>
>
> [PATCH v3 2/3] rs6000: Fix lvsl&lvsr mode and change rs6000_expand_vector_set 
> param
>
> This one is preparation work of fix lvsl&lvsr arg mode and 
> rs6000_expand_vector_set
> parameter support for both constant and variable index input.
>
>
> [PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in 
> expander [PR79251]
>
> This one is Building VIEW_CONVERT_EXPR and expand the IFN VEC_SET to fast.

I'll just comment that

xxperm 34,34,33
xxinsertw 34,0,12
xxperm 34,34,32

doesn't look like a variable-position insert instruction but
this is a variable whole-vector rotate plus an insert at index zero
followed by a variable whole-vector rotate.  I'm not fluent in
ppc assembly but

rlwinm 6,6,2,28,29
mtvsrwz 0,5
lvsr 1,0,6
lvsl 0,0,6

possibly computes the shift masks for r33/r32?  though
I do not see those registers mentioned...

This might be a generic viable expansion strategy btw,
which is why I asked before whether the CPU supports
inserts at a variable position ...  the building blocks are
already there with vec_set at constant zero position
plus vec_perm_const for the rotates.
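
In other words (pseudocode only, with made-up names for the permute
selectors computed from the index):

  sel_down = selector rotating element idx into lane 0
  sel_up   = selector rotating lane 0 back to element idx
  tmp = vec_perm (v, v, sel_down);    /* bring wanted element to lane 0 */
  tmp = vec_set (tmp, val, 0);        /* insert at constant position 0 */
  v   = vec_perm (tmp, tmp, sel_up);  /* rotate everything back */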

But well, I did ask this question.  Multiple times.

ppc does _not_ have a VSX instruction
like xxinsertw r34, r8, r12 where r8 denotes
the vector element (or byte position or whatever).

So I don't think vec_set with a variable index is the
best approach.
Xionghu - you said even without the patch the stack
storage is eventually elided but

addi 9,1,-16
rldic 6,6,2,60
stxv 34,-16(1)
stwx 5,9,6
lxv 34,-16(1)

still shows stack(?) store/load with a bad STLF penalty.

Richard.

>
> Thanks,
> Xionghu


[PATCH] libstdc++: Fix Unicode codecvt and add tests [PR86419]

2020-09-24 Thread Dimitrij Mijoski via Gcc-patches
Fixes the conversion from UTF-8 to UTF-16 to properly return partial
instead of ok.
Fixes the conversion from UTF-16 to UTF-8 to properly return partial
instead of ok.
Fixes the conversion from UTF-8 to UCS-2 to properly return partial
instead of error.
Fixes the conversion from UTF-8 to UCS-2 to treat 4-byte UTF-8 sequences
as error just by seeing the leading byte.
Fixes UTF-8 decoding for all codecvts so they detect error at the end of
the input range when the last code point is also incomplete.
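
For example (illustration only, not part of the patch; using the deprecated
but convenient codecvt_utf8_utf16), converting only the first two bytes of a
three-byte UTF-8 sequence is now reported as partial:

  #include <codecvt>
  #include <cassert>

  int main()
  {
    std::codecvt_utf8_utf16<char16_t> cvt;
    std::mbstate_t state{};
    const char in[] = "\xE4\xB8"; // first two bytes of a 3-byte sequence
    const char *in_next;
    char16_t out[2];
    char16_t *out_next;
    auto res = cvt.in (state, in, in + 2, in_next, out, out + 2, out_next);
    assert (res == std::codecvt_base::partial); // previously returned ok
  }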

The testsuite is large and may need splitting into multiple files.

libstdc++-v3/ChangeLog:
PR libstdc++/86419
* src/c++11/codecvt.cc: Fix bugs.
* testsuite/22_locale/codecvt/codecvt_unicode.cc: New tests.
---
 libstdc++-v3/src/c++11/codecvt.cc |   25 +-
 .../22_locale/codecvt/codecvt_unicode.cc  | 1310 +
 2 files changed, 1323 insertions(+), 12 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc

diff --git a/libstdc++-v3/src/c++11/codecvt.cc 
b/libstdc++-v3/src/c++11/codecvt.cc
index 0311b15177d0..4545ba1b5933 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -277,13 +277,15 @@ namespace
 }
 else if (c1 < 0xF0) // 3-byte sequence
 {
-  if (avail < 3)
+  if (avail < 2)
return incomplete_mb_character;
   unsigned char c2 = from[1];
   if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
   if (c1 == 0xE0 && c2 < 0xA0) // overlong
return invalid_mb_sequence;
+  if (avail < 3)
+   return incomplete_mb_character;
   unsigned char c3 = from[2];
   if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
@@ -292,9 +294,9 @@ namespace
from += 3;
   return c;
 }
-else if (c1 < 0xF5) // 4-byte sequence
+else if (c1 < 0xF5 && maxcode > 0x) // 4-byte sequence
 {
-  if (avail < 4)
+  if (avail < 2)
return incomplete_mb_character;
   unsigned char c2 = from[1];
   if ((c2 & 0xC0) != 0x80)
@@ -302,10 +304,14 @@ namespace
   if (c1 == 0xF0 && c2 < 0x90) // overlong
return invalid_mb_sequence;
   if (c1 == 0xF4 && c2 >= 0x90) // > U+10
-  return invalid_mb_sequence;
+   return invalid_mb_sequence;
+  if (avail < 3)
+   return incomplete_mb_character;
   unsigned char c3 = from[2];
   if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
+  if (avail < 4)
+   return incomplete_mb_character;
   unsigned char c4 = from[3];
   if ((c4 & 0xC0) != 0x80)
return invalid_mb_sequence;
@@ -540,12 +546,7 @@ namespace
auto orig = from;
const char32_t codepoint = read_utf8_code_point(from, maxcode);
if (codepoint == incomplete_mb_character)
- {
-   if (s == surrogates::allowed)
- return codecvt_base::partial;
-   else
- return codecvt_base::error; // No surrogates in UCS2
- }
+ return codecvt_base::partial;
if (codepoint > maxcode)
  return codecvt_base::error;
if (!write_utf16_code_point(to, codepoint, mode))
@@ -554,7 +555,7 @@ namespace
return codecvt_base::partial;
  }
   }
-return codecvt_base::ok;
+return from.size() ? codecvt_base::partial : codecvt_base::ok;
   }
 
   // utf16 -> utf8 (or ucs2 -> utf8 if s == surrogates::disallowed)
@@ -576,7 +577,7 @@ namespace
  return codecvt_base::error; // No surrogates in UCS-2
 
if (from.size() < 2)
- return codecvt_base::ok; // stop converting at this point
+ return codecvt_base::partial; // stop converting at this point
 
const char32_t c2 = from[1];
if (is_low_surrogate(c2))
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc 
b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
new file mode 100644
index ..88afd49206d1
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
@@ -0,0 +1,1310 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// ;.
+
+// { dg-do run { target c++11 } }
+
+#include 
+#include 
+#include 
+#include 
+
+using namespace std;
+
+template 
+std::unique_p

[PATCH] Add cgraph_edge::debug function.

2020-09-24 Thread Martin Liška

I find it handy to debug a cgraph_edge and I've dumped it manually many times.
Maybe it's time to come up with such a function? Example output:

(gdb) p e->debug()
ag/9 -> h/3 (1 (adjusted),0.25 per call)

ag/9 (ag) @0x7773eca8
  Type: function definition analyzed
  Visibility: public
  next sharing asm name: 7
  References: table/5 (addr)
  Referring:
  Function ag/9 is inline copy in ap/4
  Clone of ag/7
  Availability: local
  Function flags: count:2 (adjusted) first_run:6 body local hot
  Called by: ai/8 (inlined) (indirect_inlining) (4 (adjusted),1.00 per call)
  Calls: h/3 (1 (adjusted),0.25 per call)
h/3 (h) @0x7772b438
  Type: function definition analyzed
  Visibility: externally_visible public
  References: ap/4 (addr)
  Referring:
  Availability: available
  Profile id: 1806506296
  Function flags: count:4 (precise) first_run:3 body hot
  Called by: ag/9 (1 (adjusted),0.25 per call) ag/7 (1 (adjusted),0.25 per 
call) ag/0 (2 (estimated locally, globally 0 adjusted),0.50 per call) bug/2 (1 
(precise),1.00 per call) bug/2 (1 (precise),1.00 per call)
  Calls: ai/1 (4 (precise),1.00 per call)

(gdb) p ie->debug()
ai/1 -> (null) (speculative) (0 (adjusted),0.00 per call)

ai/1 (ai) @0x7772b168
  Type: function definition analyzed
  Visibility: prevailing_def_ironly
  previous sharing asm name: 8
  References: table/5 (addr) ap/4 (addr) (speculative) ag/0 (addr) (speculative)
  Referring:
  Function ai/1 is inline copy in h/3
  Availability: local
  Profile id: 1923518911
  Function flags: count:4 (precise) first_run:4 body local hot
  Called by: h/3 (inlined) (4 (precise),1.00 per call)
  Calls: ag/7 (speculative) (inlined) (2 (adjusted),0.50 per call) ap/4 
(speculative) (2 (adjusted),0.50 per call) PyErr_Format/6 (0 (precise),0.00 per 
call)
   Indirect call(speculative) (0 (adjusted),0.00 per call)  of param:1 (vptr 
maybe changed) Num speculative call targets: 2

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* cgraph.c (cgraph_edge::debug): New.
* cgraph.h (cgraph_edge::debug): New.
---
 gcc/cgraph.c | 14 ++
 gcc/cgraph.h |  3 +++
 2 files changed, 17 insertions(+)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index b43adaac7c0..46c3b124b1a 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2072,6 +2072,20 @@ cgraph_edge::dump_edge_flags (FILE *f)
 fprintf (f, "(can throw external) ");
 }
 
+/* Dump edge to stderr.  */
+
+void
+cgraph_edge::debug (void)
+{
+  fprintf (stderr, "%s -> %s ", caller->dump_asm_name (),
+  callee == NULL ? "(null)" : callee->dump_asm_name ());
+  dump_edge_flags (stderr);
+  fprintf (stderr, "\n\n");
+  caller->debug ();
+  if (callee != NULL)
+callee->debug ();
+}
+
 /* Dump call graph node to file F.  */
 
 void

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0211f08964f..96d6cf609fe 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -2022,6 +2022,9 @@ private:
   /* Output flags of edge to a file F.  */
   void dump_edge_flags (FILE *f);
 
+  /* Dump edge to stderr.  */
+  void DEBUG_FUNCTION debug (void);
+
   /* Verify that call graph edge corresponds to DECL from the associated
  statement.  Return true if the verification should fail.  */
   bool verify_corresponds_to_fndecl (tree decl);
--
2.28.0



Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 3:27 PM Richard Biener
 wrote:
>
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo  wrote:
> >
> > Hi Segher,
> >
> > The attached two patches are updated and split from
> >  "[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple 
> > [PR79251]"
> > as your comments.
> >
> >
> > [PATCH v3 2/3] rs6000: Fix lvsl&lvsr mode and change 
> > rs6000_expand_vector_set param
> >
> > This one is preparation work of fix lvsl&lvsr arg mode and 
> > rs6000_expand_vector_set
> > parameter support for both constant and variable index input.
> >
> >
> > [PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in 
> > expander [PR79251]
> >
> > This one is Building VIEW_CONVERT_EXPR and expand the IFN VEC_SET to fast.
>
> I'll just comment that
>
> xxperm 34,34,33
> xxinsertw 34,0,12
> xxperm 34,34,32

Btw, on x86_64 the following produces sth reasonable:

#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));
V setg (V v, int idx, T val)
{
  V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

vmovd   %edi, %xmm1
vpbroadcastd%xmm1, %ymm1
vpcmpeqd.LC0(%rip), %ymm1, %ymm2
vpblendvb   %ymm2, %ymm1, %ymm0, %ymm0
ret

I'm quite sure you could do sth similar on power?

> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate.  I'm not fluent in
> ppc assembly but
>
> rlwinm 6,6,2,28,29
> mtvsrwz 0,5
> lvsr 1,0,6
> lvsl 0,0,6
>
> possibly computes the shift masks for r33/r32?  though
> I do not see those registers mentioned...
>
> This might be a generic viable expansion strategy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ...  the building blocks are
> already there with vec_set at constant zero position
> plus vec_perm_const for the rotates.
>
> But well, I did ask this question.  Multiple times.
>
> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).
>
> So I don't think vec_set with a variable index is the
> best approach.
> Xionghu - you said even without the patch the stack
> storage is eventually elided but
>
> addi 9,1,-16
> rldic 6,6,2,60
> stxv 34,-16(1)
> stwx 5,9,6
> lxv 34,-16(1)
>
> still shows stack(?) store/load with a bad STLF penalty.
>
> Richard.
>
> >
> > Thanks,
> > Xionghu


Re: [gcc-7-arm] Backport -moutline-atomics flag

2020-09-24 Thread Pop, Sebastian via Gcc-patches
Thanks Richard for your recommendations.
I am still discussing with Kyrill about a good name for the branch.
Once we agree on a name we will commit the patches to that branch.

Sebastian

On 9/24/20, 4:10 AM, "Richard Biener"  wrote:

On Fri, Sep 11, 2020 at 12:38 AM Pop, Sebastian via Gcc-patches
 wrote:
>
> Hi,
>
> the attached patches are back-porting the flag -moutline-atomics to the 
gcc-7-arm vendor branch.
> The flag enables a very important performance optimization for 
N1-neoverse processors.
> The patches pass bootstrap and make check on Graviton2 aarch64-linux.
>
> Ok to commit to the gcc-7-arm vendor branch?

Given the branch doesn't exist yet can you eventually push this series to
a user branch (or a amazon vendor branch)?

You failed to CC arm folks so your mail might have been lost in the noise.

Thanks,
Richard.

> Thanks,
> Sebastian
>



Re: [PATCH] libstdc++: Specialize ranges::__detail::__box for semiregular types

2020-09-24 Thread Jonathan Wakely via Gcc-patches

On 24/09/20 09:04 -0400, Patrick Palka via Libstdc++ wrote:

The class template semiregular-box defined in [range.semi.wrap] is
used by a number of views to accommodate non-semiregular subobjects
while ensuring that the overall view remains semiregular.  It provides
a stand-in default constructor, copy assignment operator and move
assignment operator whenever the underlying type lacks them.  The
wrapper derives from std::optional to support default construction
when T is not default constructible.

It would be nice for this wrapper to essentially be a no-op when the
underlying type is already semiregular, but this is currently not the
case due to its use of std::optional, which incurs space overhead
compared to storing just T.

To that end, this patch specializes the semiregular wrapper for
semiregular T.  Compared to the primary template, this specialization
uses less space and it allows [[no_unique_address]] to optimize away
wrapped data members whose underlying type is empty and semiregular
(e.g. a non-capturing lambda).  This patch also applies
[[no_unique_address]] to the five data members that currently use the
wrapper.

Tested on x86_64-pc-linux-gnu, does this look OK to commit?

libstdc++-v3/ChangeLog:

* include/std/ranges (__detail::__boxable): Split out the
associated constraints of __box into here.
(__detail::__box): Use the __boxable concept.  Define a leaner
partial specialization for semiregular types.
(single_view::_M_value): Mark it [[no_unique_address]].
(filter_view::_M_pred): Likewise.
(transform_view::_M_fun): Likewise.
(take_while_view::_M_pred): Likewise.
(drop_while_view::_M_pred):: Likewise.
* testsuite/std/ranges/adaptors/detail/semiregular_box.cc: New
test.
---
libstdc++-v3/include/std/ranges   | 68 +++--
.../ranges/adaptors/detail/semiregular_box.cc | 73 +++
2 files changed, 135 insertions(+), 6 deletions(-)
create mode 100644 
libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index e7fa4493612..8a302a7918f 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -86,7 +86,10 @@ namespace ranges

  namespace __detail
  {
-template requires is_object_v<_Tp>
+template
+  concept __boxable = copy_constructible<_Tp> && is_object_v<_Tp>;
+
+template<__boxable _Tp>
  struct __box : std::optional<_Tp>
  {
using std::optional<_Tp>::optional;
@@ -130,6 +133,59 @@ namespace ranges
}
  };

+// For types which are already semiregular, this specialization of the
+// semiregular wrapper stores the object directly without going through
+// std::optional.  It provides the subset of the primary template's API
+// that we currently use.
+template<__boxable _Tp> requires semiregular<_Tp>
+  struct __box<_Tp>
+  {
+  private:
+   [[no_unique_address]] _Tp _M_value;
+
+  public:
+   __box() = default;
+
+   constexpr
+   __box(const _Tp& __t)
+   noexcept(is_nothrow_copy_constructible_v<_Tp>)
+   : _M_value{__t}
+   { }
+
+   constexpr
+   __box(_Tp&& __t)


To be consistent with optional, these constructors should be
conditionally explicit (and since we're in C++20 code here, we can
actually use explicit(bool) rather than needing two overloads of each
constructor).

But I think we could just make them unconditionally explicit, since we
only ever construct them explicitly. No need to allow implicit
conversions if we never need them.
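
i.e. simply (sketch):

   constexpr explicit
   __box(const _Tp& __t)
   noexcept(is_nothrow_copy_constructible_v<_Tp>)
   : _M_value{__t}
   { }

   constexpr explicit
   __box(_Tp&& __t)
   noexcept(is_nothrow_move_constructible_v<_Tp>)
   : _M_value{std::move(__t)}
   { }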

Otherwise this looks great, please push. It's an ABI change for the
types using __box, so isn't appropriate for backporting to gcc-10
(unlike most changes to <ranges>).




Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-09-24 Thread Richard Biener
On Thu, 24 Sep 2020, Jonathan Wakely wrote:

> On 24/09/20 11:11 +0200, Richard Biener wrote:
> >On Wed, 26 Aug 2020, Richard Biener wrote:
> >
> >> On Thu, 6 Aug 2020, Richard Biener wrote:
> >>
> >> > On Thu, 6 Aug 2020, Richard Biener wrote:
> >> >
> >> > > This adds a move CTOR to auto_vec and makes use of an
> >> > > auto_vec return value for get_loop_exit_edges denoting
> >> > > that lifetime management of the vector is handed to the caller.
> >> > >
> >> > > The move CTOR prompted the hash_table change because it apparently
> >> > > makes the copy CTOR implicitly deleted (good) and hash-table
> >> > > expansion of the odr_enum_map which is
> >> > > hash_map  where odr_enum has an
> >> > > auto_vec member triggers this.  Not sure if
> >> > > there's a latent bug there before this (I think we're not
> >> > > invoking DTORs, but we're invoking copy-CTORs).
> >> > >
> >> > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >> > >
> >> > > Does this all look sensible and is it a good change
> >> > > (the get_loop_exit_edges one)?
> >> >
> >> > Regtest went OK, here's an update with a complete ChangeLog
> >> > (how useful..) plus the move assign operator deleted, copy
> >> > assign wouldn't work as auto-generated and at the moment
> >> > there's no use of assigning.  I guess if we'd have functions
> >> > that take an auto_vec<> argument meaning they will destroy
> >> > the vector that will become useful and we can implement it.
> >> >
> >> > OK for trunk?
> >>
> >> Ping.
> >
> >Ping^2.
> 
> Looks good to me as far as the use of C++ features goes.

Thanks, now pushed after re-testing.

Richard.


Re: [PATCH] aarch64: Do not alter force_reg returned rtx expanding pauth builtins

2020-09-24 Thread Andrea Corallo
Hi Richard,

thanks for reviewing

Richard Sandiford  writes:

> Andrea Corallo  writes:
>> Hi all,
>>
>> having a look for rtxes returned by force_reg that get modified later,
>> I've found another such case in `aarch64_general_expand_builtin` while
>> expanding pointer authentication builtins.
>>
>> Regtested and bootstrapped on aarch64-linux-gnu.
>>
>> Okay for trunk?
>>
>>   Andrea
>>
>> From 8869ee04e3788fdec86aa7e5a13e2eb477091d0e Mon Sep 17 00:00:00 2001
>> From: Andrea Corallo 
>> Date: Mon, 21 Sep 2020 13:52:45 +0100
>> Subject: [PATCH] aarch64: Do not alter force_reg returned rtx expanding pauth
>>  builtins
>>
>> 2020-09-21  Andrea Corallo  
>>
>>  * config/aarch64/aarch64-builtins.c
>>  (aarch64_general_expand_builtin): Do not alter value on a
>>  force_reg returned rtx.
>> ---
>>  gcc/config/aarch64/aarch64-builtins.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
>> b/gcc/config/aarch64/aarch64-builtins.c
>> index b787719cf5e..a77718ccfac 100644
>> --- a/gcc/config/aarch64/aarch64-builtins.c
>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>> @@ -2079,10 +2079,10 @@ aarch64_general_expand_builtin (unsigned int fcode, 
>> tree exp, rtx target,
>>arg0 = CALL_EXPR_ARG (exp, 0);
>>op0 = force_reg (Pmode, expand_normal (arg0));
>>  
>> -  if (!target)
>> +  if (!(target
>> +&& REG_P (target)
>> +&& GET_MODE (target) == Pmode))
>>  target = gen_reg_rtx (Pmode);
>> -  else
>> -target = force_reg (Pmode, target);
>>  
>>emit_move_insn (target, op0);
>
> Do we actually use the result of this move?  It looked like we always
> use op0 rather than target (good) and overwrite target with a later move.
>
> If so, I think we should delete the move

Good point agree.

> and convert the later code to use expand_insn.

I'm not sure I understand the suggestion right, xpaclri&friends patterns
are written with hardcoded in/out regs, is the suggestion to just use like
'expand_insn (CODE_FOR_xpaclri, 0, NULL)' in place of GEN_FCN+emit_insn?

Thanks!

  Andrea


Re: [PATCH] Add cgraph_edge::debug function.

2020-09-24 Thread Jan Hubicka
> I find it handy to debug a cgraph_edge and I've dumped it manually many times.
> Maybe it's time to come up with such a function? Example output:
> 
> (gdb) p e->debug()
> ag/9 -> h/3 (1 (adjusted),0.25 per call)
> 
> ag/9 (ag) @0x7773eca8
>   Type: function definition analyzed
>   Visibility: public
>   next sharing asm name: 7
>   References: table/5 (addr)
>   Referring:
>   Function ag/9 is inline copy in ap/4
>   Clone of ag/7
>   Availability: local
>   Function flags: count:2 (adjusted) first_run:6 body local hot
>   Called by: ai/8 (inlined) (indirect_inlining) (4 (adjusted),1.00 per call)
>   Calls: h/3 (1 (adjusted),0.25 per call)
> h/3 (h) @0x7772b438
>   Type: function definition analyzed
>   Visibility: externally_visible public
>   References: ap/4 (addr)
>   Referring:
>   Availability: available
>   Profile id: 1806506296
>   Function flags: count:4 (precise) first_run:3 body hot
>   Called by: ag/9 (1 (adjusted),0.25 per call) ag/7 (1 (adjusted),0.25 per 
> call) ag/0 (2 (estimated locally, globally 0 adjusted),0.50 per call) bug/2 
> (1 (precise),1.00 per call) bug/2 (1 (precise),1.00 per call)
>   Calls: ai/1 (4 (precise),1.00 per call)
> 
> (gdb) p ie->debug()
> ai/1 -> (null) (speculative) (0 (adjusted),0.00 per call)
> 
> ai/1 (ai) @0x7772b168
>   Type: function definition analyzed
>   Visibility: prevailing_def_ironly
>   previous sharing asm name: 8
>   References: table/5 (addr) ap/4 (addr) (speculative) ag/0 (addr) 
> (speculative)
>   Referring:
>   Function ai/1 is inline copy in h/3
>   Availability: local
>   Profile id: 1923518911
>   Function flags: count:4 (precise) first_run:4 body local hot
>   Called by: h/3 (inlined) (4 (precise),1.00 per call)
>   Calls: ag/7 (speculative) (inlined) (2 (adjusted),0.50 per call) ap/4 
> (speculative) (2 (adjusted),0.50 per call) PyErr_Format/6 (0 (precise),0.00 
> per call)
>Indirect call(speculative) (0 (adjusted),0.00 per call)  of param:1 (vptr 
> maybe changed) Num speculative call targets: 2
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   * cgraph.c (cgraph_edge::debug): New.
>   * cgraph.h (cgraph_edge::debug): New.
OK,
Honza
> ---
>  gcc/cgraph.c | 14 ++
>  gcc/cgraph.h |  3 +++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index b43adaac7c0..46c3b124b1a 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2072,6 +2072,20 @@ cgraph_edge::dump_edge_flags (FILE *f)
>  fprintf (f, "(can throw external) ");
>  }
> +/* Dump edge to stderr.  */
> +
> +void
> +cgraph_edge::debug (void)
> +{
> +  fprintf (stderr, "%s -> %s ", caller->dump_asm_name (),
> +callee == NULL ? "(null)" : callee->dump_asm_name ());
> +  dump_edge_flags (stderr);
> +  fprintf (stderr, "\n\n");
> +  caller->debug ();
> +  if (callee != NULL)
> +callee->debug ();
> +}
> +
>  /* Dump call graph node to file F.  */
>  void
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 0211f08964f..96d6cf609fe 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -2022,6 +2022,9 @@ private:
>/* Output flags of edge to a file F.  */
>void dump_edge_flags (FILE *f);
> +  /* Dump edge to stderr.  */
> +  void DEBUG_FUNCTION debug (void);
> +
>/* Verify that call graph edge corresponds to DECL from the associated
>   statement.  Return true if the verification should fail.  */
>bool verify_corresponds_to_fndecl (tree decl);
> -- 
> 2.28.0
> 


[PATCH][AArch64][GCC 9] Add support for __jcvt intrinsic

2020-09-24 Thread Kyrylo Tkachov
Hi all,

I'd like to backport support for the __jcvt intrinsic to the active branches as 
it's an Armv8.3-a intrinsic that should have been supported there.
This is a squashed commit of the initial support and a couple of follow-up 
fixes from Andrea.
This is the GCC 9 version.

Bootstrapped and tested on the branch.

This patch implements the __jcvt ACLE intrinsic [1] that maps down to the 
FJCVTZS [2] instruction from Armv8.3-a.
No fancy mode iterators or nothing. Just a single builtin, UNSPEC and 
define_insn and the associate plumbing.
This patch also defines __ARM_FEATURE_JCVT to indicate when the intrinsic is 
available.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] 
https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/fjcvtzs-floating-point-javascript-convert-to-signed-fixed-point-rounding-toward-zero

gcc/
PR target/71233
* config/aarch64/aarch64.md (UNSPEC_FJCVTZS): Define.
(aarch64_fjcvtzs): New define_insn.
* config/aarch64/aarch64.h (TARGET_JSCVT): Define.
* config/aarch64/aarch64-builtins.c (aarch64_builtins):
Add AARCH64_JSCVT.
(aarch64_init_builtins): Initialize __builtin_aarch64_jcvtzs.
(aarch64_expand_builtin): Handle AARCH64_JSCVT.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_JCVT where appropriate.
* config/aarch64/arm_acle.h (__jcvt): Define.
* doc/sourcebuild.texi (aarch64_fjcvtzs_hw) Document new
target supports option.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/jcvt_1.c: New test.
* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.

Co-Authored-By: Andrea Corallo  

(cherry picked from commit e1d5d19ec4f84b67ac693fef5b2add7dc9cf056d)
(cherry picked from commit 2c62952f8160bdc8d4111edb34a4bc75096c1e05)
(cherry picked from commit d2b86e14c14020f3e119ab8f462e2a91bd7d46e5)


jcvt-9.patch
Description: jcvt-9.patch


[PATCH][AArch64][GCC 8] Add support for __jcvt intrinsic

2020-09-24 Thread Kyrylo Tkachov
Hi all,

I'd like to backport support for the __jcvt intrinsic to the active branches as 
it's an Armv8.3-a intrinsic that should have been supported there.
This is a squashed commit of the initial support and a couple of follow-up 
fixes from Andrea.
This is the GCC 8 version.

Bootstrapped and tested on the branch.

This patch implements the __jcvt ACLE intrinsic [1] that maps down to the 
FJCVTZS [2] instruction from Armv8.3-a.
No fancy mode iterators or nothing. Just a single builtin, UNSPEC and 
define_insn and the associate plumbing.
This patch also defines __ARM_FEATURE_JCVT to indicate when the intrinsic is 
available.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] 
https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/fjcvtzs-floating-point-javascript-convert-to-signed-fixed-point-rounding-toward-zero

gcc/
PR target/71233
* config/aarch64/aarch64.md (UNSPEC_FJCVTZS): Define.
(aarch64_fjcvtzs): New define_insn.
* config/aarch64/aarch64.h (TARGET_JSCVT): Define.
* config/aarch64/aarch64-builtins.c (aarch64_builtins):
Add AARCH64_JSCVT.
(aarch64_init_builtins): Initialize __builtin_aarch64_jcvtzs.
(aarch64_expand_builtin): Handle AARCH64_JSCVT.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_JCVT where appropriate.
* config/aarch64/arm_acle.h (__jcvt): Define.
* doc/sourcebuild.texi (aarch64_fjcvtzs_hw) Document new
target supports option.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/jcvt_1.c: New test.
* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.

Co-Authored-By: Andrea Corallo  

(cherry picked from commit e1d5d19ec4f84b67ac693fef5b2add7dc9cf056d)
(cherry picked from commit 2c62952f8160bdc8d4111edb34a4bc75096c1e05)
(cherry picked from commit d2b86e14c14020f3e119ab8f462e2a91bd7d46e5)
(cherry picked from commit 58ae77d3ba70a2b9ccc90a90f3f82cf46239d5f1)


jcvt-8.patch
Description: jcvt-8.patch


[PATCH 1/2, rs6000] int128 sign extension instructions (partial prereq)

2020-09-24 Thread will schmidt via Gcc-patches
[PATCH, rs6000] int128 sign extension instructions (partial prereq)

Hi
  This is a sub-set of the 128-bit sign extension support patch series
that I believe will be fully implemented in a subsequent patch from Carl.
This is a necessary pre-requisite for the vector-load/store rightmost
element patch that follows in this thread.
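
For illustration (not from the patch), the kind of C source these patterns are
meant to cover is a plain 64-bit to 128-bit sign extension; a sketch, assuming
-mcpu=power10 so the extendditi2 expander is enabled:

/* Sign extend DImode to TImode; with the expander below this can be done
   via mtvsrdd + vextsd2q instead of GPR shifts.  */
__int128
sext (long long x)
{
  return x;
}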

Thanks,
-Will

gcc/ChangeLog:
* config/rs6000/rs6000.md (enum c_enum): Add UNSPEC_EXTENDDITI2
and UNSPEC_MTVSRD_DITI_W1 entries.
(mtvsrdd_diti_w1, extendditi2_vector): New define_insns.
(extendditi2): New define_expand.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 9c5a228..7d0b296 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -150,10 +150,12 @@
UNSPEC_PLTSEQ
UNSPEC_PLT16_HA
UNSPEC_CFUGED
UNSPEC_CNTLZDM
UNSPEC_CNTTZDM
+   UNSPEC_EXTENDDITI2
+   UNSPEC_MTVSRD_DITI_W1
UNSPEC_PDEPD
UNSPEC_PEXTD
   ])
 
 ;;
@@ -963,10 +965,41 @@
   ""
   [(set_attr "type" "shift")
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
 
+;; Move DI value from GPR to TI mode in VSX register, word 1.
+(define_insn "mtvsrdd_diti_w1"
+  [(set (match_operand:TI 0 "register_operand" "=wa")
+   (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
+  UNSPEC_MTVSRD_DITI_W1))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrdd %x0,0,%1"
+  [(set_attr "type" "vecsimple")])
+
+;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
+(define_insn "extendditi2_vector"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
+ UNSPEC_EXTENDDITI2))]
+  "TARGET_POWER10"
+  "vextsd2q %0,%1"
+  [(set_attr "type" "exts")])
+
+(define_expand "extendditi2"
+  [(set (match_operand:TI 0 "gpc_reg_operand")
+(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
+  "TARGET_POWER10"
+  {
+/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits */
+rtx temp = gen_reg_rtx (TImode);
+emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
+emit_insn (gen_extendditi2_vector (operands[0], temp));
+DONE;
+  }
+  [(set_attr "type" "exts")])
+
 
 (define_insn "extendqi2"
   [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,?*v")
(sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,?*v")))]
   ""



[PATCH 1/2] aarch64: Add support for Neoverse N2 CPU

2020-09-24 Thread Alex Coplan
This patch adds support for Arm's Neoverse N2 CPU to the AArch64
backend.

Testing:
 * Bootstrapped and regtested on aarch64-none-linux-gnu.

OK for trunk?

Thanks,
Alex

---

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Neoverse N2.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document AArch64 support for Neoverse N2.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 04dc587681e..469ee99824c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -136,6 +136,9 @@ AARCH64_CORE("thunderx3t110",  thunderx3t110,  
thunderx3t110, 8_3A,  AARCH64_FL_
 AARCH64_CORE("zeus", zeus, cortexa57, 8_4A,  AARCH64_FL_FOR_ARCH8_4 | 
AARCH64_FL_SVE | AARCH64_FL_RCPC | AARCH64_FL_I8MM | AARCH64_FL_BF16 | 
AARCH64_FL_F16 | AARCH64_FL_PROFILE | AARCH64_FL_SSBS | AARCH64_FL_RNG, 
neoversen1, 0x41, 0xd40, -1)
 AARCH64_CORE("neoverse-v1", neoversev1, cortexa57, 8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_SVE | AARCH64_FL_RCPC | AARCH64_FL_I8MM | 
AARCH64_FL_BF16 | AARCH64_FL_F16 | AARCH64_FL_PROFILE | AARCH64_FL_SSBS | 
AARCH64_FL_RNG, neoversen1, 0x41, 0xd40, -1)
 
+/* Armv8.5-A Architecture Processors.  */
+AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, 8_5A, 
AARCH64_FL_FOR_ARCH8_5 | AARCH64_FL_I8MM | AARCH64_FL_BF16 | AARCH64_FL_F16 | 
AARCH64_FL_SVE | AARCH64_FL_SVE2 | AARCH64_FL_SVE2_BITPERM | AARCH64_FL_RNG | 
AARCH64_FL_MEMTAG, neoversen1, 0x41, 0xd49, -1)
+
 /* Qualcomm ('Q') cores. */
 AARCH64_CORE("saphira", saphira,saphira,8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 
0xC01, -1)
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 729eb3ec2c7..3cf69ceadaf 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoversen2,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 75203ba2420..f420da6c9f8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17517,8 +17517,8 @@ performance of the code.  Permissible values for this 
option are:
 @samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
 @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
 @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
-@samp{neoverse-e1},@samp{neoverse-n1},@samp{neoverse-v1},@samp{qdf24xx},
-@samp{saphira},
+@samp{neoverse-e1}, @samp{neoverse-n1}, @samp{neoverse-n2},
+@samp{neoverse-v1}, @samp{qdf24xx}, @samp{saphira},
 @samp{phecda}, @samp{xgene1}, @samp{vulcan}, @samp{octeontx},
 @samp{octeontx81},  @samp{octeontx83},
 @samp{octeontx2}, @samp{octeontx2t98}, @samp{octeontx2t96}


[PATCH 2/2] arm: Add support for Neoverse N2 CPU

2020-09-24 Thread Alex Coplan
This adds support for Arm's Neoverse N2 CPU to the AArch32 backend.
Neoverse N2 builds AArch32 at EL0 and therefore needs support in AArch32
GCC.

Testing:
 * Bootstrapped and regtested on arm-none-linux-gnueabihf.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

* config/arm/arm-cpus.in (neoverse-n2): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Document support for Neoverse N2.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 4550694e138..be563b7f807 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1459,6 +1459,17 @@ begin cpu neoverse-n1
  part d0c
 end cpu neoverse-n1
 
+begin cpu neoverse-n2
+  cname neoversen2
+  tune for cortex-a57
+  tune flags LDSCHED
+  architecture armv8.5-a+fp16+bf16+i8mm
+  option crypto add FP_ARMv8 CRYPTO
+  costs cortex_a57
+  vendor 41
+  part 0xd49
+end cpu neoverse-n2
+
 # ARMv8.2 A-profile ARM DynamIQ big.LITTLE implementations
 begin cpu cortex-a75.cortex-a55
  cname cortexa75cortexa55
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 1a7c3191784..b57206313e2 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -243,6 +243,9 @@ Enum(processor_type) String(cortex-a77) Value( 
TARGET_CPU_cortexa77)
 EnumValue
 Enum(processor_type) String(neoverse-n1) Value( TARGET_CPU_neoversen1)
 
+EnumValue
+Enum(processor_type) String(neoverse-n2) Value( TARGET_CPU_neoversen2)
+
 EnumValue
 Enum(processor_type) String(cortex-a75.cortex-a55) Value( 
TARGET_CPU_cortexa75cortexa55)
 
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 3874f42a26b..2377037bf7d 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -45,7 +45,8 @@ (define_attr "tune"
cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,
cortexa73cortexa53,cortexa55,cortexa75,
cortexa76,cortexa76ae,cortexa77,
-   neoversen1,cortexa75cortexa55,cortexa76cortexa55,
-   neoversev1,cortexm23,cortexm33,
-   cortexm35p,cortexm55,cortexr52"
+   neoversen1,neoversen2,cortexa75cortexa55,
+   cortexa76cortexa55,neoversev1,cortexm23,
+   cortexm33,cortexm35p,cortexm55,
+   cortexr52"
(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 75203ba2420..7948ed4fa95 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -19365,9 +19365,9 @@ Permissible names are: @samp{arm7tdmi}, 
@samp{arm7tdmi-s}, @samp{arm710t},
 @samp{cortex-m35p}, @samp{cortex-m55},
 @samp{cortex-m1.small-multiply}, @samp{cortex-m0.small-multiply},
 @samp{cortex-m0plus.small-multiply}, @samp{exynos-m1}, @samp{marvell-pj4},
-@samp{neoverse-n1}, @samp{neoverse-v1}, @samp{xscale}, @samp{iwmmxt},
-@samp{iwmmxt2}, @samp{ep9312}, @samp{fa526}, @samp{fa626}, @samp{fa606te},
-@samp{fa626te}, @samp{fmp626}, @samp{fa726te}, @samp{xgene1}.
+@samp{neoverse-n1}, @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{xscale},
+@samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}, @samp{fa526}, @samp{fa626},
+@samp{fa606te}, @samp{fa626te}, @samp{fmp626}, @samp{fa726te}, @samp{xgene1}.
 
 Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible names are:


[PATCH 2/2, rs6000] VSX load/store rightmost element operations

2020-09-24 Thread will schmidt via Gcc-patches
[PATCH 2/2, rs6000] VSX load/store rightmost element operations

Hi,
  This adds support for the VSX load/store rightmost element operations.
This includes the instructions lxvrbx, lxvrhx, lxvrwx, lxvrdx,
stxvrbx, stxvrhx, stxvrwx, stxvrdx; And the builtins
vec_xl_sext() /* vector load sign extend */
vec_xl_zext() /* vector load zero extend */
vec_xst_trunc() /* vector store truncate */.
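
A hedged usage sketch (not from the patch): the exact prototypes below are
assumptions read off the overload table in this patch (offset argument first,
pointer second, vector __int128 payload), so treat them as illustrative only.

#include <altivec.h>

/* Load one word, sign extend it into the rightmost element (lxvrwx).  */
vector signed __int128
load_sext (const signed int *p)
{
  return vec_xl_sext (0, p);
}

/* Truncate the rightmost element and store one halfword (stxvrhx).  */
void
store_trunc (vector signed __int128 v, signed short *p)
{
  vec_xst_trunc (v, 0, p);
}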

Testcase results show that the instructions added with this patch show
up at low/no optimization (-O0), with a number of those being replaced
with other load and store instructions at higher optimization levels.
For consistency I've left the tests at -O0.

Regtested OK for Linux on power8,power9 targets.  Sniff-regtested OK on
power10 simulator.
OK for trunk?

Thanks,
-Will

gcc/ChangeLog:
* config/rs6000/altivec.h (vec_xl_zext, vec_xl_sext, vec_xst_trunc): New
defines.
* config/rs6000/rs6000-builtin.def (BU_P10V_OVERLOAD_X): New builtin 
macro.
(BU_P10V_AV_X): New builtin macro.
(se_lxvrbx, se_lxvrhx, se_lxvrwx, se_lxvrdx): Define internal names for
load and sign extend vector element.
(ze_lxvrbx, ze_lxvrhx, ze_lxvrwx, ze_lxvrdx): Define internal names for
load and zero extend vector element.
(tr_stxvrbx, tr_stxvrhx, tr_stxvrwx, tr_stxvrdx): Define internal names
for truncate and store vector element.
(se_lxvrx, ze_lxvrx, tr_stxvrx): Define internal names for overloaded
load/store rightmost element.
* config/rs6000/rs6000-call.c (altivec_builtin_types): Define the 
internal
monomorphs P10_BUILTIN_SE_LXVRBX, P10_BUILTIN_SE_LXVRHX,
P10_BUILTIN_SE_LXVRWX, P10_BUILTIN_SE_LXVRDX,
P10_BUILTIN_ZE_LXVRBX, P10_BUILTIN_ZE_LXVRHX, P10_BUILTIN_ZE_LXVRWX,
P10_BUILTIN_ZE_LXVRDX,
P10_BUILTIN_TR_STXVRBX, P10_BUILTIN_TR_STXVRHX, P10_BUILTIN_TR_STXVRWX,
P10_BUILTIN_TR_STXVRDX,
(altivec_expand_lxvr_builtin): New expansion for load element builtins.
(altivec_expand_stv_builtin): Update to support truncate and store 
builtins.
(altivec_expand_builtin): Add cases for the load/store rightmost 
builtins.
(altivec_init_builtins): Add def_builtin entries for
__builtin_altivec_se_lxvrbx, __builtin_altivec_se_lxvrhx,
__builtin_altivec_se_lxvrwx, __builtin_altivec_se_lxvrdx,
__builtin_altivec_ze_lxvrbx, __builtin_altivec_ze_lxvrhx,
__builtin_altivec_ze_lxvrwx, __builtin_altivec_ze_lxvrdx,
__builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx,
__builtin_vec_se_lxvrx, __builtin_vec_ze_lxvrx, __builtin_vec_tr_stxvrx.
* config/rs6000/vsx.md (vsx_lxvrx, vsx_stxvrx, vsx_stxvrx):
New define_insn entries.
* doc/extend.texi: Add documentation for vec_xl_sext, vec_xl_zext,
and vec_xst_trunc.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-load-element-extend-char.c: New test.
* gcc.target/powerpc/vsx-load-element-extend-int.c: New test.
* gcc.target/powerpc/vsx-load-element-extend-longlong.c: New test.
* gcc.target/powerpc/vsx-load-element-extend-short.c: New test.
* gcc.target/powerpc/vsx-store-element-truncate-char.c: New test.
* gcc.target/powerpc/vsx-store-element-truncate-int.c: New test.
* gcc.target/powerpc/vsx-store-element-truncate-longlong.c: New test.
* gcc.target/powerpc/vsx-store-element-truncate-short.c: New test.

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 8a2dcda..df10a8c 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -234,10 +234,13 @@
 #define vec_lde __builtin_vec_lde
 #define vec_ldl __builtin_vec_ldl
 #define vec_lvebx __builtin_vec_lvebx
 #define vec_lvehx __builtin_vec_lvehx
 #define vec_lvewx __builtin_vec_lvewx
+#define vec_xl_zext __builtin_vec_ze_lxvrx
+#define vec_xl_sext __builtin_vec_se_lxvrx
+#define vec_xst_trunc __builtin_vec_tr_stxvrx
 #define vec_neg __builtin_vec_neg
 #define vec_pmsum_be __builtin_vec_vpmsum
 #define vec_shasigma_be __builtin_crypto_vshasigma
 /* Cell only intrinsics.  */
 #ifdef __PPU__
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index e91a48d..c481e81 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1143,10 +1143,18 @@
(RS6000_BTC_ ## ATTR/* ATTR */  \
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 #endif
 
+#define BU_P10V_OVERLOAD_X(ENUM, NAME) \
+  RS6000_BUILTIN_X (P10_BUILTIN_VEC_ ## ENUM,  /* ENUM */  \
+   "__builtin_vec_" NAME,  /* NAME */  \
+   RS6000_BTM_P10, /* MASK */  \
+ 

RE: [PATCH 1/2] aarch64: Add support for Neoverse N2 CPU

2020-09-24 Thread Kyrylo Tkachov
Hi Alex,

> -Original Message-
> From: Alex Coplan 
> Sent: 24 September 2020 17:00
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Richard Sandiford
> ; Kyrylo Tkachov 
> Subject: [PATCH 1/2] aarch64: Add support for Neoverse N2 CPU
> 
> This patch adds support for Arm's Neoverse N2 CPU to the AArch64
> backend.
> 
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu.
> 
> OK for trunk?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-cores.def: Add Neoverse N2.
>   * config/aarch64/aarch64-tune.md: Regenerate.
>   * doc/invoke.texi: Document AArch64 support for Neoverse N2.


RE: [PATCH 2/2] arm: Add support for Neoverse N2 CPU

2020-09-24 Thread Kyrylo Tkachov
Hi Alex,

> -Original Message-
> From: Alex Coplan 
> Sent: 24 September 2020 17:01
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH 2/2] arm: Add support for Neoverse N2 CPU
> 
> This adds support for Arm's Neoverse N2 CPU to the AArch32 backend.
> Neoverse N2 builds AArch32 at EL0 and therefore needs support in AArch32
> GCC.
> 
> Testing:
>  * Bootstrapped and regtested on arm-none-linux-gnueabihf.
> 
> OK for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-cpus.in (neoverse-n2): New.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-tune.md: Regenerate.
>   * doc/invoke.texi: Document support for Neoverse N2.


[committed] libstdc++: Fix misnamed configure option in manual

2020-09-24 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* doc/xml/manual/configure.xml: Correct name of option.
* doc/html/*: Regenerate.

Committed to trunk.

commit 61f7995398a719f2ff91d07e8f8ed6d4413db697
Author: Jonathan Wakely 
Date:   Thu Sep 24 17:33:16 2020

libstdc++: Fix misnamed configure option in manual

libstdc++-v3/ChangeLog:

* doc/xml/manual/configure.xml: Correct name of option.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/configure.xml 
b/libstdc++-v3/doc/xml/manual/configure.xml
index 58587e858a4..74d6db71ab4 100644
--- a/libstdc++-v3/doc/xml/manual/configure.xml
+++ b/libstdc++-v3/doc/xml/manual/configure.xml
@@ -204,7 +204,8 @@
 
  --enable-libstdcxx-debug-flags=FLAGS
 
- This option is only valid when  --enable-debug 
+ This option is only valid when
+   --enable-libstdcxx-debug
is also specified, and applies to the debug builds only. With
this option, you can pass a specific string of flags to the
compiler to use when building the debug versions of libstdc++.


[GCC 8] [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-24 Thread H.J. Lu via Gcc-patches
On Wed, Sep 16, 2020 at 4:47 AM Jakub Jelinek  wrote:
>
> On Wed, Sep 16, 2020 at 12:34:50PM +0100, Richard Sandiford wrote:
> > Jakub Jelinek via Gcc-patches  writes:
> > > On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote:
> > >> Something like this for GCC 8 and 9.
> > >
> > > Guess my preference would be to do this everywhere and then let's discuss 
> > > if
> > > we change the warning into error there or keep it being deprecated.
> >
> > Agreed FWIW.  On turning it into an error: I think it might be better
> > to wait a bit longer if we can.
>
> Ok.  The patch is ok for trunk and affected release branches after a week.
>

I cherry-picked it to GCC 9 and 10 branches.   GCC 8 needs some
changes.  I am enclosing the backported patch for GCC 8.  I will check
it in if there are no regressions on Linux/x86-64.

Thanks.

H.J.
From 97c34eb5f57bb1d37f3feddefefa5f553bcea9fc Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 14 Sep 2020 08:52:27 -0700
Subject: [PATCH] rtl_data: Add sp_is_clobbered_by_asm

Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by asm statement.

gcc/

	PR target/97032
	* cfgexpand.c (expand_asm_stmt): Set sp_is_clobbered_by_asm to
	true if the stack pointer is clobbered by asm statement.
	* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
	* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
	if the stack pointer is clobbered by asm statement.

gcc/testsuite/

	PR target/97032
	* gcc.target/i386/pr97032.c: New test.

(cherry picked from commit 453a20c65722719b9e2d84339f215e7ec87692dc)
---
 gcc/cfgexpand.c |  3 +++
 gcc/config/i386/i386.c  |  6 --
 gcc/emit-rtl.h  |  3 +++
 gcc/testsuite/gcc.target/i386/pr97032.c | 22 ++
 4 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97032.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 18565bf1dab..dcf491954f1 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2972,6 +2972,9 @@ expand_asm_stmt (gasm *stmt)
 			   regname);
 		return;
 		  }
+		/* Clobbering the stack pointer register.  */
+		else if (reg == (int) STACK_POINTER_REGNUM)
+		  crtl->sp_is_clobbered_by_asm = true;
 
 	SET_HARD_REG_BIT (clobbered_regs, reg);
 	rtx x = gen_rtx_REG (reg_raw_mode[reg], reg);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f3c722b51e9..ce20bc2ab4e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12528,10 +12528,12 @@ ix86_update_stack_boundary (void)
 static rtx
 ix86_get_drap_rtx (void)
 {
-  /* We must use DRAP if there are outgoing arguments on stack and
+  /* We must use DRAP if there are outgoing arguments on stack or
+ the stack pointer register is clobbered by asm statment and
  ACCUMULATE_OUTGOING_ARGS is false.  */
   if (ix86_force_drap
-  || (cfun->machine->outgoing_args_on_stack
+  || ((cfun->machine->outgoing_args_on_stack
+	   || crtl->sp_is_clobbered_by_asm)
 	  && !ACCUMULATE_OUTGOING_ARGS))
 crtl->need_drap = true;
 
diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
index 4e7bd1ec26d..55dc3e84e9c 100644
--- a/gcc/emit-rtl.h
+++ b/gcc/emit-rtl.h
@@ -265,6 +265,9 @@ struct GTY(()) rtl_data {
  pass_stack_ptr_mod has run.  */
   bool sp_is_unchanging;
 
+  /* True if the stack pointer is clobbered by asm statement.  */
+  bool sp_is_clobbered_by_asm;
+
   /* Nonzero if function being compiled doesn't contain any calls
  (ignoring the prologue and epilogue).  This is set prior to
  register allocation in IRA and is valid for the remaining
diff --git a/gcc/testsuite/gcc.target/i386/pr97032.c b/gcc/testsuite/gcc.target/i386/pr97032.c
new file mode 100644
index 000..b9ef2ad0c05
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97032.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { ia32 && fstack_protector } } } */
+/* { dg-options "-O2 -mincoming-stack-boundary=2 -fstack-protector-all" } */
+
+#include 
+
+extern int *__errno_location (void);
+
+long
+sys_socketcall (int op, ...)
+{
+  long int res;
+  va_list ap;
+  va_start (ap, op);
+  asm volatile ("push %%ebx; movl %2, %%ebx; int $0x80; pop %%ebx"
+		: "=a" (res) : "0" (102), "ri" (16), "c" (ap) : "memory", "esp");
+  if (__builtin_expect (res > 4294963200UL, 0))
+*__errno_location () = -res;
+  va_end (ap);
+  return res;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]*_?__errno_location" } } */
-- 
2.26.2



[patch] Adjust the VxWorks alternative LIMITS_H guard for glimits.h

2020-09-24 Thread Olivier Hainque

This is a minor adjustment to the vxworks specific macro name
used to guard the header file contents, to make it closer to the
original one and easier to search for.

We have been using this in gcc-9 based compilers for a while now,
I was able to build and test a gcc-10 based toolchain for ppc-vxworks7
with it, and performed a sanity check build with a recent mainline.

Committing to mainline shortly,

Olivier

2020-09-24  Olivier Hainque  

* config/t-vxworks: Adjust the VxWorks alternative LIMITS_H guard
for glimits.h, make it both closer to the previous one and easier to
search for.



0002-Adjust-the-VxWorks-alternative-LIMITS_H-guard-for-gl.diff
Description: Binary data


[patch] Add include-fixed to include search paths for libgcc on VxWorks

2020-09-24 Thread Olivier Hainque

The special vxworks rules for the compilation of libgcc had
-I.../gcc/include and not .../gcc/include-fixed, causing build
failure of our arm-vxworks7r2 port because of indirect dependencies
on limits.h.

The omission was just an oversight and this change just adds the
missing -I.

This fixes the aforementioned build failure, has been used in gcc-9
based production compilers for several targets for a year, passed a build
& test sequence for powerpc-vxworks7 with gcc-10 and a sanity check build
with a recent mainline.

Committing to mainline shortly.

Olivier

2020-09-24  Olivier Hainque  

libgcc/
* config/t-vxworks: Add include-fixed to include search
paths for libgcc on VxWorks.
* config/t-vxworks7: Likewise.



0003-Add-include-fixed-to-include-search-paths-for-libgcc.diff
Description: Binary data




Re: [PATCH] generalized range_query class for multiple contexts

2020-09-24 Thread Martin Sebor via Gcc-patches

On 9/24/20 12:46 AM, Aldy Hernandez wrote:



On 9/24/20 1:53 AM, Martin Sebor wrote:


Finally, unless both a type and function with the same name exist
in the same scope there is no reason to mention the class-id when
referencing a class name.  I.e., this

   value_range_equiv *allocate_value_range_equiv ();
   void free_value_range_equiv (value_range_equiv *);

is the same as this:

   class value_range_equiv *allocate_value_range_equiv ();
   void free_value_range_equiv (class value_range_equiv *);

but the former is shorter and clearer (and in line with existing
practice).


value_range_equiv may not be defined in the scope of range-query.h, so 
that is why the class specifier is there.


I see.  It's probably a reflection of my C++ background that this
style catches my eye.  In C++ I think it's more common to introduce
a forward declaration of a class before using it.

Just as a side note, the first declaration of a type introduces it
into the enclosing namespace so that from that point forward it can
be used without the class-id.  E.g., this is valid:

  struct A
  {
// Need class here...
class B *f ();
// ...but not here...
void g (B *);
  };

 // ...or even here:
 B* A::f () { return 0; }

Either way, the code is correct as is and I don't object to it,
just noting that (at least some of) the class-ids are redundant.

Martin


[patch] Honor $(MULTISUBDIR) in -I directives for libgcc on VxWorks

2020-09-24 Thread Olivier Hainque

To handle ports where we might arrange to use different
sets of fixed headers for different multilibs.

This has been used in gcc-9 based production compilers for several targets
for a year, passed a build & test sequence for powerpc-vxworks7 with gcc-10
and a sanity check build with a recent mainline.

Olivier

2020-09-24  Olivier Hainque 

* libgcc/config/t-vxworks (LIBGCC2_INCLUDES): Append
$(MULTISUBDIR) to the -I path for fixed headers.



0004-Honor-MULTISUBDIR-in-I-directives-for-libgcc-on-VxWo.diff
Description: Binary data


[patch] Fallback to default CPP spec for C++ on VxWorks

2020-09-24 Thread Olivier Hainque
Arrange to inhibit the effects of CPLUSPLUS_CPP_SPEC in gnu-user.h,
which #defines _GNU_SOURCE, which is invalid for VxWorks (possibly
not providing ::mkstemp, for example).

This has been used in gcc-9 based production compilers for several targets
for a year, passed a build & test sequence for powerpc-vxworks7 with gcc-10
and a sanity check build with a recent mainline.

Olivier

2020-09-24  Olivier Hainque  

* config/vxworks.h: #undef CPLUSPLUS_CPP_SPEC.



0005-Fallback-to-default-CPP-spec-for-C-on-VxWorks.diff
Description: Binary data





[patch] Fix the VX_CPU selection for -mcpu=xscale on arm-vxworks

2020-09-24 Thread Olivier Hainque

This fixlet makes sure -mcpu=xscale selects the correct VX_CPU.

Fixes a number of tests for arm-vxworks.

Committing to mainline shortly.

Olivier


2020-09-24  Olivier Hainque  

* config/arm/vxworks.h (TARGET_OS_CPP_BUILTINS): Fix
the VX_CPU selection for -mcpu=xscale on arm-vxworks.



0009-Fix-thinko-in-TARGET_OS_CPP_BUILTINS-for-arm-vxworks.diff
Description: Binary data


Re: [GCC 8] [PATCH] Ignore the clobbered stack pointer in asm statement

2020-09-24 Thread H.J. Lu via Gcc-patches
On Thu, Sep 24, 2020 at 9:48 AM H.J. Lu  wrote:
>
> On Wed, Sep 16, 2020 at 4:47 AM Jakub Jelinek  wrote:
> >
> > On Wed, Sep 16, 2020 at 12:34:50PM +0100, Richard Sandiford wrote:
> > > Jakub Jelinek via Gcc-patches  writes:
> > > > On Mon, Sep 14, 2020 at 08:57:18AM -0700, H.J. Lu via Gcc-patches wrote:
> > > >> Something like this for GCC 8 and 9.
> > > >
> > > > Guess my preference would be to do this everywhere and then let's 
> > > > discuss if
> > > > we change the warning into error there or keep it being deprecated.
> > >
> > > Agreed FWIW.  On turning it into an error: I think it might be better
> > > to wait a bit longer if we can.
> >
> > Ok.  The patch is ok for trunk and affected release branches after a week.
> >
>
> I cherry-picked it to GCC 9 and 10 branches.   GCC 8 needs some
> changes.  I am enclosing the backported patch for GCC 8.  I will check
> it in if there are no regressions on Linux/x86-64.
>

No regression.  I am checking it into GCC 8 branch.

-- 
H.J.


Re: [PATCH] libiberty: Add get_DW_UT_name and update include/dwarf2.{def, h}

2020-09-24 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 23, 2020 at 04:51:01PM +0200, Mark Wielaard wrote:
> This adds a get_DW_UT_name function to dwarfnames using dwarf2.def
> for use in binutils readelf to show the unit types in a DWARF5 header.
> 
> Also remove DW_CIE_VERSION which was already removed in binutils/gdb
> and is not used in gcc.
> 
> include/ChangeLog:
> 
>   * dwarf2.def: Add DWARF5 Unit type header encoding macros
>   DW_UT_FIRST, DW_UT and DW_UT_END.
>   * dwarf2.h (enum dwarf_unit_type): Removed and define using
>   DW_UT_FIRST, DW_UT and DW_UT_END macros.
>   (DW_CIE_VERSION): Removed.
>   (get_DW_UT_name): New function declaration.
> 
> libiberty/ChangeLog:
> 
>   * dwarfnames.c (get_DW_UT_name): Define using DW_UT_FIRST, DW_UT
>   and DW_UT_END.

LGTM, thanks.

Jakub



Re: [PATCH] PR libstdc++/71579 assert that type traits are not misused with an incomplete type

2020-09-24 Thread Jonathan Wakely via Gcc-patches

On 20/08/20 18:31 +0300, Antony Polukhin via Libstdc++ wrote:

Wed, 19 Aug 2020 at 14:29, Jonathan Wakely :
<...>

Do we also want to check
(std::__is_complete_or_unbounded(__type_identity<_ArgTypes>{}) && ...)
for invoke_result and the is_invocable traits?


Done.

Changelog:

2020-08-20  Antony Polukhin  

   PR libstdc++/71579
   * include/std/type_traits (invoke_result, is_invocable, is_invocable_r)
   (is_nothrow_invocable, is_nothrow_invocable_r): Add static_asserts
   to make sure that the arguments of the type traits are not misused
   with incomplete types.
   * testsuite/20_util/invoke_result/incomplete_args_neg.cc: New test.
   * testsuite/20_util/is_invocable/incomplete_args_neg.cc: New test.
   * testsuite/20_util/is_invocable/incomplete_neg.cc: New test.
   * testsuite/20_util/is_nothrow_invocable/incomplete_args_neg.cc: New test.
   * testsuite/20_util/is_nothrow_invocable/incomplete_neg.cc: Check for
   error on incomplete response type usage in trait.


Committed with some tweaks to the static assert messages to say:

"each argument type must be a complete class or an unbounded array"

Thanks!




Re: *PING* Re: [PATCH] Fortran : ICE in build_field PR95614

2020-09-24 Thread Thomas König

Hi Mark,


I haven't yet committed this.


I am unfamiliar with Andre; I've checked MAINTAINERS and I find Andre in 
the "Write after approval" section.


Is Andre's approval sufficient? If so MAINTAINERS needs to be updated.


The official list of people who can review is at

https://gcc.gnu.org/fortran/

but that is clearly not sufficient.  We need to update it to reflect
current realities: there are people who have approved patches (with
nobody objecting) for a long time, and many people who are on that list
are no longer active.

I'm not 100% sure what is needed to update that file, or whether we need
an OK from the steering committee.  I had already taken a straw poll, I
think; I will simply do it tomorrow morning.  If anybody strenuously
objects, we can always revert the patch :-)


If not: OK to commit and backport?


OK from my side, and thanks for the patch.

Best regards

Thomas


Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 1, adds the 128-bit sign extension instruction support and
> corresponding builtin support.
> 
> No changes from the previous version.
> 
> The patch has been tested on 
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> Fixed the issues in the ChangeLog noted by Will.
> 
>  Carl Love
> 
> ---
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
>   for new builtins.
>   * config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
>   overloaded builtin definitions.
>   (VSIGNEXTSB2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D): Add builtin
>   expansions.

+VSIGNEXTSH2W


>   * config/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
>   P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
>   * config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
>   visible.
>   * doc/extend.texi:  Add documentation for the vec_signexti and
>   vec_signextll builtins.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
> ---
>  gcc/config/rs6000/altivec.h   |   3 +
>  gcc/config/rs6000/rs6000-builtin.def  |   9 ++
>  gcc/config/rs6000/rs6000-call.c   |  13 ++
>  gcc/config/rs6000/vsx.md  |   2 +-
>  gcc/doc/extend.texi   |  15 ++
>  .../powerpc/p9-sign_extend-runnable.c | 128 ++
>  6 files changed, 169 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index 8a2dcda0144..acc365612be 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -494,6 +494,9 @@
> 
>  #define vec_xlx __builtin_vec_vextulx
>  #define vec_xrx __builtin_vec_vexturx
> +#define vec_signexti  __builtin_vec_vsignexti
> +#define vec_signextll __builtin_vec_vsignextll
> +
>  #endif
> 
>  /* Predicates.
> diff --git a/gcc/config/rs6000/rs6000-builtin.def 
> b/gcc/config/rs6000/rs6000-builtin.def
> index e91a48ddf5f..4c2e9460949 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD, "vprtybd")
>  BU_P9V_OVERLOAD_1 (VPRTYBQ,  "vprtybq")
>  BU_P9V_OVERLOAD_1 (VPRTYBW,  "vprtybw")
>  BU_P9V_OVERLOAD_1 (VPARITY_LSBB, "vparity_lsbb")
> +BU_P9V_OVERLOAD_1 (VSIGNEXTI,"vsignexti")
> +BU_P9V_OVERLOAD_1 (VSIGNEXTLL,   "vsignextll")
> 
>  /* 2 argument functions added in ISA 3.0 (power9).  */
>  BU_P9_2 (CMPRB,  "byte_in_range",CONST,  cmprb)
> @@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB,   "byte_in_range")
>  BU_P9_OVERLOAD_2 (CMPRB2,"byte_in_either_range")
>  BU_P9_OVERLOAD_2 (CMPEQB,"byte_in_set")
>  
> +/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1. 
>  */
> +BU_P9V_AV_1 (VSIGNEXTSB2W,   "vsignextsb2w", CONST,  
> vsx_sign_extend_qi_v4si)
> +BU_P9V_AV_1 (VSIGNEXTSH2W,   "vsignextsh2w", CONST,  
> vsx_sign_extend_hi_v4si)
> +BU_P9V_AV_1 (VSIGNEXTSB2D,   "vsignextsb2d", CONST,  
> vsx_sign_extend_qi_v2di)
> +BU_P9V_AV_1 (VSIGNEXTSH2D,   "vsignextsh2d", CONST,  
> vsx_sign_extend_hi_v2di)
> +BU_P9V_AV_1 (VSIGNEXTSW2D,   "vsignextsw2d", CONST,  
> vsx_sign_extend_si_v2di)
> +
>  /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
>  BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
>  BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index a8b520834c7..9e514a01012 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5527,6 +5527,19 @@ const struct altivec_builtin_types 
> altivec_overloaded_builtins[] = {
>  RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
>  RS6000_BTI_INTSI, RS6000_BTI_INTSI },
> 
> +  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 
> */
> +  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
> +RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
> +RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
> +
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
> +RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
> +RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
> +  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
> +RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
> +
>/* Overloaded built-in functions for ISA3.1 (power10). */
>{ P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
>  RS6000_BTI_V16QI, RS6000

Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Add support for converting to/from 128-bit integers and 128-bit 
> decimal floating point formats.

A more wordy blurb here clarifying what the patch does would be useful.

i.e. this adds support for the dcffixqq and dctfixqq instructions.
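
At the C source level these are conversions like the following (a sketch only,
mirroring the test case further down; the hardware forms need -mcpu=power10):

_Decimal128
int128_to_dfp (__int128 x)
{
  return (_Decimal128) x;   /* floattitd2 -> dcffixqq */
}

__int128
dfp_to_int128 (_Decimal128 d)
{
  return (__int128) d;      /* fixtdti2 -> dctfixqq */
}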

> 
> The updates from the previous version of the patch:
> 
> Removed stray ";; carll" comment.  
> 
> Removed #if 1 and #endif in the test case.
> 
> Replaced TARGET_TI_VECTOR_OPS with POWER10.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
>  Carl Love
> 
> 
> ---
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
> 

ok.


Need changelog blurb to reflect the rs6000-call changes.
(this may have leaked in from previous or subsequent patch?)


> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * gcc.target/powerpc/int_128bit-runnable.c:  Update test.


ok.


> ---


>  gcc/config/rs6000/dfp.md  | 14 +
>  gcc/config/rs6000/rs6000-call.c   |  4 ++
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 62 +++
>  3 files changed, 80 insertions(+)
> 
> diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> index 8f822732bac..0e82e315fee 100644
> --- a/gcc/config/rs6000/dfp.md
> +++ b/gcc/config/rs6000/dfp.md
> @@ -222,6 +222,13 @@
>"dcffixq %0,%1"
>[(set_attr "type" "dfp")])
> 
> +(define_insn "floattitd2"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> + (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> +  "TARGET_POWER10"
> +  "dcffixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> +
>  ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
>  ;; This is the first stage of converting it to an integer type.
> 
> @@ -241,6 +248,13 @@
>"TARGET_DFP"
>"dctfix %0,%1"
>[(set_attr "type" "dfp")])
> +
> +(define_insn "fixtdti2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> + (fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
> +  "TARGET_POWER10"
> +  "dctfixqq %0,%1"
> +  [(set_attr "type" "dfp")])
> 
>  ;; Decimal builtin support
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index e1d9c2e8729..9c50cd3c5a7 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -4967,6 +4967,8 @@ const struct altivec_builtin_types 
> altivec_overloaded_builtins[] = {
>  RS6000_BTI_bool_V2DI, 0 },
>{ P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
>  RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
> +RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 
> },
> 
>{ P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
>  RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
> @@ -5074,6 +5076,8 @@ const struct altivec_builtin_types 
> altivec_overloaded_builtins[] = {
>  RS6000_BTI_bool_V2DI, 0 },
>{ P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
>  RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
> +  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
> +RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 
> },
>{ P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
>  RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
>{ P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
> diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> index 85ad544e22b..ec3dcf3dff1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -38,6 +38,7 @@
>  #if DEBUG
>  #include 
>  #include 
> +#include 
> 
> 
>  void print_i128(__int128_t val)
> @@ -59,6 +60,13 @@ int main ()
>__int128_t arg1, result;
>__uint128_t uarg2;
> 
> +  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
> +
> +  struct conv_t {
> +__uint128_t u128;
> +_Decimal128 d128;
> +  } conv, conv2;
> +
>vector signed long long int vec_arg1_di, vec_arg2_di;
>vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
>vector unsigned long long int vec_uresult_di;
> @@ -2249,6 +2257,60 @@ int main ()
>  abort();
>  #endif
>}
> +  
> +  /* DFP to __int128 and __int128 to DFP conversions */
> +  /* Can't get printing of DFP values to work.  Print the DFP value as an
> + unsigned int so we can see the bit patterns.  */
> +  conv.u128 = 0x2208ULL;
> +  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
> +  expected_result_dfp128 = conv.d128;
> 
> +  arg1 = 4;
> +
> +  conv.d128 = (_Decimal128) arg1

Re: [PATCH 2/5] RS6000 add 128-bit Integer Operations

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Will, Segher:
> 
> Add support for divide, modulo, shift, compare of 128-bit
> integers instructions and builtin support.
> 
> The following are the changes from the previous version of the patch.
> 
> The TARGET_TI_VECTOR_OPS was removed per comments for patch 3.  Just
> using TARGET_POWER10.
> 
> Removed extra comment.
> 
> Note the change
> 
> -#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
> +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
> 
> is a bug fix. Added missing comment to ChangeLog.
> 
> Removed vector_eqv1ti, used eqvv1ti3 instead.
> 
> Test case, put ppc_native_128bit in the dg-require command.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
> Carl Love
> 
> 
> ---
> 
> gcc/ChangeLog
> 
>   2020-09-21  Carl Love  
>   * config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
>   for new builtins.
>   (vec_rlnm): Fix bug in argument generation.


If there is a delay, this bugfix could (and probably should) be broken out into 
its own patch.


>   * config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
>   UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
>   (altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
>   altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
>   altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
>   altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
>   altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
>   define_insn.

altivec_vrlqnm, altivec_vrlqmi should be in the define_expand list.


>   (vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
>   vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
>   altivec_vrlqnm): New define_expands.

Actually, they are here... so just need to be removed from the
define_insn list.  :-)


>   * config/rs6000/rs6000-builtin.def (BU_P10_P, BU_P10_128BIT_1,
>   BU_P10_128BIT_2, BU_P10_128BIT_3): New macro definitions.

Question below.

>   (VCMPEQUT_P, VCMPGTST_P, VCMPGTUT_P): Add macro expansions.
>   (VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
>   CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
>   VCMPAET_P): New macro expansions.
>   (VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ, VSLQ,
>   VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
>   MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
>   (VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
>   * config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
>   P10_BUILTIN_CMPGE_1TI, P10_BUILTIN_CMPGE_U1TI,
>   P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPGTST,
>   P10_BUILTIN_CMPLE_1TI, P10_BUILTIN_VCMPLE_U1TI,
>   P10_BUILTIN_128BIT_DIV_V1TI, P10_BUILTIN_128BIT_UDIV_V1TI,
>   P10_BUILTIN_128BIT_VMULESD, P10_BUILTIN_128BIT_VMULEUD,
>   P10_BUILTIN_128BIT_VMULOSD, P10_BUILTIN_128BIT_VMULOUD,
>   P10_BUILTIN_VNOR_V1TI, P10_BUILTIN_VNOR_V1TI_UNS,
>   P10_BUILTIN_128BIT_VRLQ, P10_BUILTIN_128BIT_VRLQMI,
>   P10_BUILTIN_128BIT_VRLQNM, P10_BUILTIN_128BIT_VSLQ,
>   P10_BUILTIN_128BIT_VSRQ, P10_BUILTIN_128BIT_VSRAQ,
>   P10_BUILTIN_VCMPGTUT_P, P10_BUILTIN_VCMPGTST_P,
>   P10_BUILTIN_VCMPEQUT_P, P10_BUILTIN_VCMPGTUT_P,
>   P10_BUILTIN_VCMPGTST_P, P10_BUILTIN_CMPNET,
>   P10_BUILTIN_VCMPNET_P, P10_BUILTIN_VCMPAET_P,
>   P10_BUILTIN_128BIT_VSIGNEXTSD2Q, P10_BUILTIN_128BIT_DIVES_V1TI,
>   P10_BUILTIN_128BIT_MODS_V1TI, P10_BUILTIN_128BIT_MODU_V1TI):
>   New overloaded definitions.

Looks like those should (all?) now be P10V_BUILTIN_.


>   (int_ftype_int_v1ti_v1ti) [P10_BUILTIN_VCMPEQUT,

?  That appears to be the (rs6000_gimple_fold_builtin) function.  

>   P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
>   P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
>   P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
>   P10_BUILTIN_CMPLE_U1TI, E_V1TImode]: New case statements.

Also should be P10V_BUILTIN_  ?

I see both P10_BUILTIN_CMPNET and P10V_BUILTIN_CMPNET references in the
patch.  

>   (int_ftype_int_v1ti_v1ti) [bool_V1TI_type_node, 
> int_ftype_int_v1ti_v1ti]:
>   New assignments.

Thats in the (rs6000_init_builtins) function.

>   (altivec_init_builtins): New E_V1TImode case statement.
>   (builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
>   P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
>   P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
>   P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.

May need a refresh with respect to the P10_BUILTIN_ vs
P10V_BUILTIN_ ... 


>   * config/rs6000/r6000.c (rs6000_option_override

Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:57 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 5 adds the 128-bit integer to/from 128-bit floating point
> conversions.  This patch has to invoke the routines to use the 128-
> bit
> hardware instructions if on Power 10 or use software routines if
> running on a pre Power 10 system via the resolve function.
> 
> Add ifunc resolvers for __fixkfti, __floatuntikf_sw, __fixkfti_swn,
> __fixunskfti_sw.
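
(For context, a sketch of the source-level conversions these resolvers back;
illustrative only, and __float128 may need -mfloat128 on some configurations.
On pre-Power10 hardware the ifunc resolves to the _sw software routines, on
Power10 to the _hw variants.)

__float128
ti_to_kf (__int128 x)
{
  return (__float128) x;    /* __floattikf */
}

__int128
kf_to_ti (__float128 x)
{
  return (__int128) x;      /* __fixkfti */
}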
> 
> The following changes were made to the previous version of the
> patch: 
> 
> Fixed typos in ChangeLog noted by Will.
> 
> Turned off debug in test case.
> 
> Removed extra blank lines, fixed spacing of #else in the test case.
> 
> Added comment to fixunskfti-sw.c about changes made from the original
> file fixunskfti.c.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 tests were run by hand on Mambo.
> 
> Carl Love
> -
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  
>   config/rs6000/rs6000.md (floatti2, floatunsti2,
>   fix_truncti2, fixuns_truncti2): Add
>   define_insn for mode IEEE 128.
ok

>   libgcc/config/rs6000/fixkfi-sw.c: New file.
>   libgcc/config/rs6000/fixkfi.c: Remove file.

Should that be fixkfti-sw.c (missing t).

Adjust to indicate this is a rename
libgcc/config/rs6000/fixkfti.c: Rename to
libgcc/config/rs6000/fixkfti-sw.c


>   libgcc/config/rs6000/fixunskfti-sw.c: New file.
>   libgcc/config/rs6000/fixunskfti.c: Remove file.
>   libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
>   __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
>   New functions.
>   libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
>   New macro.
>   (__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
>   __fixunskfti_resolve): Add resolve functions.
>   (__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
>   functions.
>   libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
>   __fixtfti, __fixunstfti): Add editor commands to change
>   names.
>   libgcc/config/rs6000/float128-sed-hw (__floattitf,
>   __floatuntitf, __fixtfti, __fixunstfti): Add editor commands
>   to change names.
>   libgcc/config/rs6000/floattikf-sw.c: New file.
>   libgcc/config/rs6000/floattikf.c: Remove file.
>   libgcc/config/rs6000/floatuntikf-sw.c: New file.
>   libgcc/config/rs6000/floatuntikf.c: Remove file.
>   libgcc/config/rs6000/floatuntikf-sw.c: New file.
>   libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
>   __floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw,
> __floattikf_hw,
>   __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
>   __floatuntikf, __fixkfti, __fixunskfti): New extern
> declarations.
>   libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
>   fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
>   (floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
>   file names to fp128_ppc_funcs.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  
>   gcc.target/powerpc/fl128_conversions.c: New file.
> ---
>  gcc/config/rs6000/rs6000.md   |  36 +++
>  .../gcc.target/powerpc/fp128_conversions.c| 286
> ++
>  .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
>  .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   7 +-
>  libgcc/config/rs6000/float128-hw.c|  24 ++
>  libgcc/config/rs6000/float128-ifunc.c |  44 ++-
>  libgcc/config/rs6000/float128-sed |   4 +
>  libgcc/config/rs6000/float128-sed-hw  |   4 +
>  .../rs6000/{floattikf.c => floattikf-sw.c}|   4 +-
>  .../{floatuntikf.c => floatuntikf-sw.c}   |   4 +-
>  libgcc/config/rs6000/quad-float128.h  |  17 +-
>  libgcc/config/rs6000/t-float128   |   3 +-
>  12 files changed, 417 insertions(+), 20 deletions(-)
>  create mode 100644
> gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
>  rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%)
>  rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
>  rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c}
> (96%)
> 
> diff --git a/gcc/config/rs6000/rs6000.md
> b/gcc/config/rs6000/rs6000.md
> index 694ff70635e..5db5d0b4505 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6390,6 +6390,42 @@
> xscvsxddp %x0,%x1"
>[(set_attr "type" "fp")])
> 
> +(define_insn "floatti2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +   (float:IEEE128 (match_operand:TI 1 "vsx_register_operand"
> "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvsqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "floatunsti2"
> +  [(set (match_operand:IEEE128 0 "

Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.

2020-09-24 Thread will schmidt via Gcc-patches
On Mon, 2020-09-21 at 16:56 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 4 adds the vector 128-bit integer shift instruction support for
> the V1TI type.
> 
> The following changes were made from the previous version.
> 
> Renamed VSX_TI to VEC_TI, put def in vector.md.  Didn't get it
> separated into a different patch.
> 
> Reworked the XXSWAPD_V1TI to not use UNSPEC.
> 
> Test suite program cleanups, removed "//" comments that were not
> needed.
> 
> The patch has been tested on
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regression errors.
> 
> The P10 test was run by hand on Mambo.
> 
> 
> Carl Love
> 
> --
> 
> 
> gcc/ChangeLog
> 
> 2020-09-21  Carl Love  
>   * config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
>   Rename altivec_vslq_, altivec_vsrq_, mode VEC_TI.

Nit "Rename to"

>   * config/rs6000/vector.md (VEC_TI): New mode iterator.
>   (vashlv1ti3): Change to vashl3, mode VEC_TI.
>   (vlshrv1ti3): Change to vlshr3, mode VEC_TI.
s/Change/Rename to/

'New' isn't quite right for the mode iterator, since it's renamed from
the VSX_TI iterator.
perhaps something like

* config/rs6000/vector.md (VEC_TI): New name for VSX_TI 
iterator from vsx.md.

>   * config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator.
>   (VSX_TI): Renamed VEC_TI.


Just the Remove.  VEC_TI doesn't exist in vsx.md. 



> 
> gcc/testsuite/ChangeLog
> 
> 2020-09-21  Carl Love  
>   gcc.target/powerpc/int_128bit-runnable.c: Add shift_right,
> shift_left
>   tests.
> ---
>  gcc/config/rs6000/altivec.md  | 16 -
>  gcc/config/rs6000/vector.md   | 27 ---
>  gcc/config/rs6000/vsx.md  | 33 +--
> 
>  .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++--
>  4 files changed, 52 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 34a4731342a..5db3de3cc9f 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2219,10 +2219,10 @@
>"vsl %0,%1,%2"
>[(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vslq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> - (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -  (match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_insn "altivec_vslq_"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> + (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand"
> "v")
> +  (match_operand:VEC_TI 2 "vsx_register_operand"
> "v")))]
>"TARGET_POWER10"
>/* Shift amount in needs to be in bits[57:63] of 128-bit operand.
> */
>"vslq %0,%1,%2"
> @@ -2236,10 +2236,10 @@
>"vsr %0,%1,%2"
>[(set_attr "type" "vecsimple")])
> 
> -(define_insn "altivec_vsrq"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> - (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand"
> "v")
> -(match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_insn "altivec_vsrq_"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> + (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand"
> "v")
> +(match_operand:VEC_TI 2
> "vsx_register_operand" "v")))]
>"TARGET_POWER10"
>/* Shift amount in needs to be in bits[57:63] of 128-bit operand.
> */
>"vsrq %0,%1,%2"
> diff --git a/gcc/config/rs6000/vector.md
> b/gcc/config/rs6000/vector.md
> index 0cca4232619..3ea3a91845a 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; 128-bit int modes
> +(define_mode_iterator VEC_TI [V1TI TI])
> +
>  ;; Vector int modes for parity
>  (define_mode_iterator VEC_IP [V8HI
> V4SI
> @@ -1627,17 +1630,17 @@
>"")
> 
>  ;; No immediate version of this 128-bit instruction
> -(define_expand "vashlv1ti3"
> -  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> - (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
> -  (match_operand:V1TI 2 "vsx_register_operand"
> "v")))]
> +(define_expand "vashl3"
> +  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
> + (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
> +  (match_operand:VEC_TI 2
> "vsx_register_operand")))]
>"TARGET_POWER10"
>  {
>/* Shift amount in needs to be put in bits[57:63] of 128-bit
> operand2. */
> -  rtx tmp = gen_reg_rtx (V1TImode);
> +  rtx tmp = gen_reg_rtx (mode);
> 
>emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
> -  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
> +  emit_insn(gen_altivec_vslq_ (operands[0], operands[1],
> tmp));
>DONE;
>  })
> 
> @@ -1

libgo patch committed: Don't build __go_ptrace on AIX

2020-09-24 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Clément Chigot removes __go_ptrace on AIX.  AIX
ptrace syscalls do not have the same semantics as the glibc one.
The syscall package already handles this correctly, so disable the
new __go_ptrace C function for AIX.  Bootstrapped and ran Go testsuite
on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
763460e4776ce2d1ca2fe87678fc233f27f70e64
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index f51dac55365..daa0d2d6177 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-6a7648c97c3e0cdbecbec7e760b30246521a6d90
+2357468ae9b071de0e2ebe6574d78572967b7183
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/runtime/go-varargs.c b/libgo/runtime/go-varargs.c
index f9270a97bfd..9cb4a7e79bd 100644
--- a/libgo/runtime/go-varargs.c
+++ b/libgo/runtime/go-varargs.c
@@ -114,7 +114,9 @@ __go_syscall6(uintptr_t flag, uintptr_t a1, uintptr_t a2, 
uintptr_t a3,
 
 #endif
 
-#ifdef HAVE_SYS_PTRACE_H
+// AIX ptrace is really different from Linux ptrace. Let syscall
+// package handles it.
+#if defined(HAVE_SYS_PTRACE_H) && !defined(_AIX)
 
 // Despite documented appearances, this is actually implemented as
 // a variadic function within glibc.


Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 03:27:48PM +0200, Richard Biener wrote:
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo  wrote:
> I'll just comment that
> 
> xxperm 34,34,33
> xxinsertw 34,0,12
> xxperm 34,34,32
> 
> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate.  I'm not fluent in
> ppc assembly but
> 
> rlwinm 6,6,2,28,29
> mtvsrwz 0,5
> lvsr 1,0,6
> lvsl 0,0,6
> 
> possibly computes the shift masks for r33/r32?  though
> I do not see those registers mentioned...

v0/v1 (what the lvs[lr] write to) are the same as vs32/vs33.

The low half of the VSRs (vector-scalar registers) are the FP registers
(expanded to 16B each), and the high half are the original VRs (vector
registers).  AltiVec insns (like lvsl, lvsr) naturally only work on VRs,
as do some newer insns for which there wasn't enough budget in the
opcode space to have for VSRs (which take 6 bits each, while VRs take
only 5, just like FPRs and GPRs).

> This might be a generic viable expansion strathegy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ...

ISA 3.1 (Power10) supports variable position inserts.  Power9 supports
fixed position inserts.  Older CPUs can of course construct it some
other way.

> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).

vins[bhwd][v][lr]x does this.  Those are Power10 instructions.


Segher


Re: [PATCH] tree-optimization/97151 - improve PTA for C++ operator delete

2020-09-24 Thread Jason Merrill via Gcc-patches

On 9/24/20 3:43 AM, Richard Biener wrote:

On Wed, 23 Sep 2020, Jason Merrill wrote:


On 9/23/20 2:42 PM, Richard Biener wrote:

On September 23, 2020 7:53:18 PM GMT+02:00, Jason Merrill 
wrote:

On 9/23/20 4:14 AM, Richard Biener wrote:

C++ operator delete, when DECL_IS_REPLACEABLE_OPERATOR_DELETE_P,
does not cause the deleted object to be escaped.  It also has no
other interesting side-effects for PTA so skip it like we do
for BUILT_IN_FREE.


Hmm, this is true of the default implementation, but since the function

is replaceable, we don't know what a user definition might do with the
pointer.


But can the object still be 'used' after delete? Can delete fail / throw?

What guarantee does the predicate give us?


The deallocation function is called as part of a delete expression in order to
release the storage for an object, ending its lifetime (if it was not ended by
a destructor), so no, the object can't be used afterward.


OK, but the delete operator can access the object contents if there
wasn't a destructor ...



A deallocation function that throws has undefined behavior.


OK, so it seems the 'replaceable' operators are the global ones
(for user-defined/class-specific placement variants I see arbitrary
extra arguments that we'd possibly need to handle).

I'm happy to revert but I'd like to have a testcase that FAILs
with the patch ;)

Now, the following aborts:

struct X {
   static struct X saved;
   int *p;
   X() { __builtin_memcpy (this, &saved, sizeof (X)); }
};
void operator delete (void *p)
{
   __builtin_memcpy (&X::saved, p, sizeof (X));
}
int main()
{
   int y = 1;
   X *p = new X;
   p->p = &y;
   delete p;
   X *q = new X;
   *(q->p) = 2;
   if (y != 2)
 __builtin_abort ();
}

and I could fix this by not making *p but what *p points to escape.
The testcase is of course maximally awkward, but hey ... ;)

Now this would all be moot if operator delete may not access
the object (or if the object contents are undefined at that point).

Oh, and the testcase segfaults when compiled with GCC 10 because
there we elide the new X / delete p pair ... which is invalid then?
Hmm, we emit

   MEM[(struct X *)_8] ={v} {CLOBBER};
   operator delete (_8, 8);

so the object contents are undefined _before_ calling delete
even when I do not have a DTOR?  That is, the above,
w/o -fno-lifetime-dse, makes the PTA patch OK for the testcase.


Yes, all classes have a destructor, even if it's trivial, so the 
object's lifetime definitely ends before the call to operator delete. 
This is less clear for scalar objects, but treating them similarly would 
be consistent with other recent changes, so I think it's fine for us to 
assume that scalar objects are also invalidated before the call to 
operator delete.  But of course this doesn't apply to explicit calls to 
operator delete outside of a delete expression.
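
To make the distinction concrete, here is a tiny sketch (not from the patch
or the testsuite):

struct A { int i; };

void f (A *p)
{
  delete p;                // delete-expression: the lifetime of *p has
                           // already ended when operator delete is entered
}

void g (void *p)
{
  ::operator delete (p);   // explicit call outside a delete-expression;
                           // the lifetime guarantee above does not apply
}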



Richard.


Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2020-09-23  Richard Biener  

  PR tree-optimization/97151
  * tree-ssa-structalias.c (find_func_aliases_for_call):
  DECL_IS_REPLACEABLE_OPERATOR_DELETE_P has no effect on
  arguments.

* g++.dg/cpp1y/new1.C: Adjust for two more handled transforms.
---
gcc/testsuite/g++.dg/cpp1y/new1.C | 4 ++--
gcc/tree-ssa-structalias.c| 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/cpp1y/new1.C

b/gcc/testsuite/g++.dg/cpp1y/new1.C

index aa5f647d535..fec0088cb40 100644
--- a/gcc/testsuite/g++.dg/cpp1y/new1.C
+++ b/gcc/testsuite/g++.dg/cpp1y/new1.C
@@ -69,5 +69,5 @@ test_unused() {
  delete p;
}

-/* { dg-final { scan-tree-dump-times "Deleting : operator delete" 5

"cddce1"} } */

-/* { dg-final { scan-tree-dump-times "Deleting : _\\d+ = operator

new" 7 "cddce1"} } */

+/* { dg-final { scan-tree-dump-times "Deleting : operator delete" 6

"cddce1"} } */

+/* { dg-final { scan-tree-dump-times "Deleting : _\\d+ = operator

new" 8 "cddce1"} } */

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 44fe52e0f65..f676bf91e95 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4857,6 +4857,8 @@ find_func_aliases_for_call (struct function

*fn, gcall *t)

  point for reachable memory of their arguments.  */
   else if (flags & (ECF_PURE|ECF_LOOPING_CONST_OR_PURE))
handle_pure_call (t, &rhsc);
+  else if (fndecl && DECL_IS_REPLACEABLE_OPERATOR_DELETE_P

(fndecl))

+   ;
   else
 handle_rhs_call (t, &rhsc);
  if (gimple_call_lhs (t))


Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-24 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 04:55:21PM +0200, Richard Biener wrote:
> Btw, on x86_64 the following produces sth reasonable:
> 
> #define N 32
> typedef int T;
> typedef T V __attribute__((vector_size(N)));
> V setg (V v, int idx, T val)
> {
>   V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
>   v = (v & ~mask) | (valv & mask);
>   return v;
> }
> 
> vmovd   %edi, %xmm1
> vpbroadcastd%xmm1, %ymm1
> vpcmpeqd.LC0(%rip), %ymm1, %ymm2
> vpblendvb   %ymm2, %ymm1, %ymm0, %ymm0
> ret
> 
> I'm quite sure you could do sth similar on power?

This only allows inserting aligned elements.  Which is probably fine
of course (we don't allow elements that straddle vector boundaries
either, anyway).

And yes, we can do that :-)

That should be
  #define N 32
  typedef int T;
  typedef T V __attribute__((vector_size(N)));
  V setg (V v, int idx, T val)
  {
V valv = (V){val, val, val, val, val, val, val, val};
V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
v = (v & ~mask) | (valv & mask);
return v;
  }

after which I get (-march=znver2)

setg:
vmovd   %edi, %xmm1
vmovd   %esi, %xmm2
vpbroadcastd%xmm1, %ymm1
vpbroadcastd%xmm2, %ymm2
vpcmpeqd.LC0(%rip), %ymm1, %ymm1
vpandn  %ymm0, %ymm1, %ymm0
vpand   %ymm2, %ymm1, %ymm1
vpor%ymm0, %ymm1, %ymm0
ret

.LC0:
.long   0
.long   1
.long   2
.long   3
.long   4
.long   5
.long   6
.long   7

and for powerpc (changing it to 16B vectors, -mcpu=power9) it is

setg:
addis 9,2,.LC0@toc@ha
mtvsrws 32,5
mtvsrws 33,6
addi 9,9,.LC0@toc@l
lxv 45,0(9)
vcmpequw 0,0,13
xxsel 34,34,33,32
blr

.LC0:
.long   0
.long   1
.long   2
.long   3

(We can generate that 0..3 vector without doing loads; I guess x86 can
do that as well?  But it takes more than one insn to do (of course we
have to set up the memory address first *with* the load, heh).)

For power8 it becomes (we need to splat in separate insns):

setg:
addis 9,2,.LC0@toc@ha
mtvsrwz 32,5
mtvsrwz 33,6
addi 9,9,.LC0@toc@l
lxvw4x 45,0,9
xxspltw 32,32,1
xxspltw 33,33,1
vcmpequw 0,0,13
xxsel 34,34,33,32
blr


Segher


c++: Cleanup some decl pushing apis

2020-09-24 Thread Nathan Sidwell


In cleaning up local decl handling, here's an initial patch that takes
advantage of C++'s default args for the is_friend parm of pushdecl,
duplicate_decls and push_template_decl_real and the scope & tpl_header
parms of xref_tag.  Then many of the calls simply don't mention these.
I also rename push_template_decl_real to push_template_decl, deleting
the original forwarding function.  This'll make my later patches
changing their types less intrusive.  There are 2 functional changes:

1) push_template_decl requires is_friend to be correct; it doesn't go
checking for a friend function (an assert is added).

2) debug_overload prints out Hidden and Using markers for the overload set.

gcc/cp/
* cp-tree.h (duplicate_decls): Default is_friend to false.
(xref_tag): Default tag_scope & tpl_header_p to ts_current & false.
(push_template_decl_real): Default is_friend to false.  Rename to
...
(push_template_decl): ... here.  Delete original decl.
* name-lookup.h (pushdecl_namespace_level): Default is_friend to
false.
(pushtag): Default tag_scope to ts_current.
* coroutine.cc (morph_fn_to_coro): Drop default args to xref_tag.
* decl.c (start_decl): Drop default args to duplicate_decls.
(start_enum): Drop default arg to pushtag & xref_tag.
(start_preparsed_function): Pass DECL_FRIEND_P to
push_template_decl.
(grokmethod): Likewise.
* friend.c (do_friend): Rename push_template_decl_real calls.
* lambda.c (begin_lambda_type): Drop default args to xref_tag.
(vla_capture_type): Likewise.
* name-lookup.c (maybe_process_template_type_declaration): Rename
push_template_decl_real call.
(pushdecl_top_level_and_finish): Drop default arg to
pushdecl_namespace_level.
* pt.c (push_template_decl_real): Assert no surprising friend
functions.  Rename to ...
(push_template_decl): ... here.  Delete original function.
(lookup_template_class_1): Drop default args from pushtag.
(instantiate_class_template_1): Likewise.
* ptree.c (debug_overload): Print hidden and using markers.
* rtti.c (init_rtti_processing): Drop default args from xref_tag.
* semantics.c (begin_class_definition): Drop default args to
pushtag.
gcc/objcp/
* objcp-decl.c (objcp_start_struct): Drop default args to
xref_tag.
(objcp_xref_tag): Likewise.
libcc1/
* libcp1plugin.cc (supplement_binding): Drop default args to
duplicate_decls.
(safe_pushtag): Drop scope parm.  Drop default args to pushtag.
(safe_pushdecl_maybe_friend): Rename to ...
(safe_pushdecl): ... here. Drop is_friend parm.  Drop default args
to pushdecl.
(plugin_build_decl): Adjust safe_pushdecl & safe_pushtag calls.
(plugin_build_constant): Adjust safe_pushdecl call.


pushing to trunk

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/coroutines.cc w/gcc/cp/coroutines.cc
index 898b88b7075..ba813454a0b 100644
--- i/gcc/cp/coroutines.cc
+++ w/gcc/cp/coroutines.cc
@@ -4011,7 +4011,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
   /* 2. Types we need to define or look up.  */
 
   tree fr_name = get_fn_local_identifier (orig, "frame");
-  tree coro_frame_type = xref_tag (record_type, fr_name, ts_current, false);
+  tree coro_frame_type = xref_tag (record_type, fr_name);
   DECL_CONTEXT (TYPE_NAME (coro_frame_type)) = current_scope ();
   tree coro_frame_ptr = build_pointer_type (coro_frame_type);
   tree act_des_fn_type
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 029a165a3e8..3ae48749b3d 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6461,7 +6461,8 @@ extern void note_iteration_stmt_body_end	(bool);
 extern void determine_local_discriminator	(tree);
 extern int decls_match(tree, tree, bool = true);
 extern bool maybe_version_functions		(tree, tree, bool);
-extern tree duplicate_decls			(tree, tree, bool);
+extern tree duplicate_decls			(tree, tree,
+		 bool is_friend = false);
 extern tree declare_local_label			(tree);
 extern tree define_label			(location_t, tree);
 extern void check_goto(tree);
@@ -6501,7 +6502,9 @@ extern tree get_scope_of_declarator		(const cp_declarator *);
 extern void grok_special_member_properties	(tree);
 extern bool grok_ctor_properties		(const_tree, const_tree);
 extern bool grok_op_properties			(tree, bool);
-extern tree xref_tag(enum tag_types, tree, tag_scope, bool);
+extern tree xref_tag(tag_types, tree,
+		 tag_scope = ts_current,
+		 bool tpl_header_p = false);
 extern void xref_basetypes			(tree, tree);
 extern tree start_enum(tree, tree, tree, tree, bool, bool *);
 extern void finish_enum_value_list		(tree);
@@ -6849,8 +6852,7 @@ extern void end_template_parm_list		(void);
 extern void end_template_decl			(void);
 extern tree maybe_update_decl_type		(tree, tree);
 e

[PATCH 0/9] PowerPC: Patches to enable changing the long double default to IEEE 128-bit on little endian PowerPC 64-bit Linux systems

2020-09-24 Thread Michael Meissner via Gcc-patches
This series of 9 patches is an attempt to gather together all of the patches
that are needed to be able to configure and build a little endian 64-bit
PowerPC Linux GCC compiler where the default long double format uses the IEEE
128-bit representation.

I have created an IBM vendor branch that includes these patches (along with
the other outstanding patches that I have for IEEE 128-bit min/max/cmove on
power10, and power10 PCREL_OPT support):

vendors/ibm/ieee-longdouble-001

You will need a new enough GLIBC in order to do this configuration.  The
Advance Toolchain AT14.0 from IBM includes the changes in the library that are
needed to build a compiler with this default.

Note, with these patches, we need the libstdc++ work that was begun last year
to be finished and committed.  This shows up in trying to build the Spec 2017
511.parest_r (rate) benchmark when long double uses the IEEE representation.

Using the steps outlined below, I have built and bootstrapped current GCC
sources, comparing builds where the default long double is the current IBM
extended double to builds where long double uses the IEEE 128-bit
representation.  The only difference in the C, C++, LTO, and Fortran tests is
that 3 Fortran tests that either were marked as XFAIL or simply failed now pass.

The patches that will be posted include:

#1  Map built-in function names for long double;
#2  Update error messages intermixing the 2 128-bit types;
#3  Fixes libgcc conversions between the 2 128-bit types;
#4  Add support for converting IEEE 128-bit <-> Decimal;
#5  Update tests to run with IEEE 128-bit long double;
#6  Map nanq, nansq, etc. to long double if long double is IEEE;
#7  Update power10 __float128 tests to work with IEEE long double;
#8  Use __float128 in some of the tests instead of __ieee128; (and)
#9  Use __builtin_pack_ieee128 in libgcc if IEEE long double.

I put the following file in the branch:

gcc/config/rs6000/gcc-with-ieee-128bit-longdouble.txt

This is a short memo of how to build a GCC 11 compiler where the long double
type is IEEE 128-bit instead of using the IBM extended double format on the
PowerPC 64-bit little endian Linux environment.

You will likely need the Advance Toolchain AT14.0 library, as it has all of the
changes to support switching the long double default to IEEE 128-bit.

*   https://www.ibm.com/support/pages/advance-toolchain-linux-power

You will need a recent version of binutils.  I've used the binutils that I
downloaded via git on September 14th, 2020:

*   git clone git://sourceware.org/git/binutils-gdb.git

You will need appropriate versions of the gmp, mpfr, and mpc libraries:

*   http://gcc.gnu.org/pub/gcc/infrastructure/gmp-6.1.0.tar.bz2
*   http://gcc.gnu.org/pub/gcc/infrastructure/mpfr-3.1.4.tar.bz2
*   http://gcc.gnu.org/pub/gcc/infrastructure/mpc-1.0.3.tar.gz

Currently, I use --without-ppl --without-cloog --without-isl so I haven't used
those libraries.

I currently disable plug-in support.  If you want plug-in support, you will
likely need to build a binutils with the first compiler, to use with the second
and third compilers.  If you use a binutils compiled with a compiler where the
long double format is IBM extended double, it may not work.

I found I needed the configuration option --with-system-zlib to avoid some
issues when doing a bootstrap build.

Build the first PowerPC GCC compiler (non-bootstrap) using at least the
following options:

--prefix=
--enable-stage1-languages=c,c++,fortran
--disable-bootstrap
--disable-plugin
--with-long-double-format=ieee
--with-advance-toolchain=at14.0
--with-system-zlib
--with-native-system-header-dir=/opt/at14.0/include
--without-ppl
--without-cloog
--without-isl

Other configuration options that I use but may not affect switching the long
double default include:

--enable-checking
--enable-languages=c,c++,fortran
--enable-stage1-checking
--enable-gnu-indirect-function
--enable-decimal-float
--with-long-double-128
--enable-secureplt
--enable-threads=posix
--enable-__cxa_atexit
--with-as=
--with-ld=
--with-gnu-as=
--with-gnu-ld=
--with-cpu=power9   (or --with-cpu=power8)

Build and install the first compiler.

Configure, build, and install gmp 6.1.0 using the first compiler built above
with following configuration options:

--prefix=
--enable-static
--disable-shared
--enable-cxx
CPPFLAGS=-fexceptions

Configure, build, and install mpfr 3.1.4 using the first compiler built above
with the following configuration options:

--prefix=
--enable-static
--disable-shared
--with-gmp=

Configure, build, and install mpc 1.0.3 using the first compiler built above
with the following configuration options:

--prefix=

[PATCH 1/9] PowerPC: Map long double built-in functions if IEEE 128-bit long double.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Map long double built-in functions if IEEE 128-bit long double.

This patch goes through the built-in functions and changes the name that is
used to the name used for __float128 and _Float128 support in glibc if the
PowerPC long double type is IEEE 128-bit instead of IBM extended double.

Normally the mapping is done in the math.h and stdio.h files.  However, not
everybody uses these files, which means we also need to change the external
name for the built-in function within the compiler.

In addition, changing the name in GCC allows the Fortran compiler to
automatically use the correct name.

To map the math functions, typically this patch changes <name>l to <name>f128.
However, there are some exceptions, which are handled in this patch.

To map the printf functions, <name> is mapped to __<name>ieee128.

To map the scanf functions, <name> is mapped to __isoc99_<name>ieee128.
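
As a rough sketch of what this means for users (not part of the patch, and
assuming the sinl -> sinf128 and printf -> __printfieee128 mappings described
above), the built-in calls below would get the remapped assembler names even
though no glibc header performs any renaming here:

long double
demo (long double x)
{
  __builtin_printf ("%Lg\n", x);   /* assembler name __printfieee128 */
  return __builtin_sinl (x);       /* assembler name sinf128 */
}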

gcc/
2020-09-23  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_mangle_decl_assembler_name): Add
support for mapping built-in function names for long double
built-in functions if long double is IEEE 128-bit.

gcc/testsuite/
2020-09-23  Michael Meissner  

* gcc.target/powerpc/float128-longdouble-math.c: New test.
* gcc.target/powerpc/float128-longdouble-stdio.c: New test.
---
 gcc/config/rs6000/rs6000.c| 153 -
 .../powerpc/float128-longdouble-math.c| 559 ++
 .../powerpc/float128-longdouble-stdio.c   |  37 ++
 3 files changed, 718 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-longdouble-math.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-longdouble-stdio.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b589f4566c2..0ff0f31d552 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26909,56 +26909,147 @@ rs6000_globalize_decl_name (FILE * stream, tree decl)
library before you can switch the real*16 type at compile time.
 
We use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change this name.  We
-   only do this if the default is that long double is IBM extended double, and
-   the user asked for IEEE 128-bit.  */
+   only do this transformation if the __float128 type is enabled.  This
+   prevents us from doing the transformation on older 32-bit ports that might
+   have enabled using IEEE 128-bit floating point as the default long double
+   type.  */
 
 static tree
 rs6000_mangle_decl_assembler_name (tree decl, tree id)
 {
-  if (!TARGET_IEEEQUAD_DEFAULT && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
-  && TREE_CODE (decl) == FUNCTION_DECL && DECL_IS_BUILTIN (decl) )
+  if (TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+  && TREE_CODE (decl) == FUNCTION_DECL
+  && fndecl_built_in_p (decl, BUILT_IN_NORMAL))
 {
   size_t len = IDENTIFIER_LENGTH (id);
   const char *name = IDENTIFIER_POINTER (id);
+  const char *newname = NULL;
 
-  if (name[len - 1] == 'l')
+  /* See if it is one of the built-in functions with an unusual name.  */
+  switch (DECL_FUNCTION_CODE (decl))
{
- bool uses_ieee128_p = false;
- tree type = TREE_TYPE (decl);
- machine_mode ret_mode = TYPE_MODE (type);
+   default:
+ break;
 
- /* See if the function returns a IEEE 128-bit floating point type or
-complex type.  */
- if (ret_mode == TFmode || ret_mode == TCmode)
-   uses_ieee128_p = true;
- else
+   case BUILT_IN_DREML:
+ newname = "remainderf128";
+ break;
+
+   case BUILT_IN_GAMMAL:
+ newname = "lgammaf128";
+ break;
+
+   case BUILT_IN_GAMMAL_R:
+   case BUILT_IN_LGAMMAL_R:
+ newname = "__lgammaieee128_r";
+ break;
+
+   case BUILT_IN_NEXTTOWARD:
+ newname = "__nexttoward_to_ieee128";
+ break;
+
+   case BUILT_IN_NEXTTOWARDF:
+ newname = "__nexttowardf_to_ieee128";
+ break;
+
+   case BUILT_IN_NEXTTOWARDL:
+ newname = "__nexttowardieee128";
+ break;
+
+   case BUILT_IN_POW10L:
+ newname = "exp10f128";
+ break;
+
+   case BUILT_IN_SCALBL:
+ newname = "__scalbnieee128";
+ break;
+
+   case BUILT_IN_SIGNIFICANDL:
+ newname = "__significandieee128";
+ break;
+
+   case BUILT_IN_SINCOSL:
+ newname = "__sincosieee128";
+ break;
+   }
+
+  /* Update the __builtin_*printf && __builtin_*scanf functions.  */
+  if (!newname)
+   {
+ const size_t printf_len = sizeof ("printf") - 1;
+ const size_t scanf_len = sizeof ("scanf") - 1;
+ const size_t printf_extra
+   = sizeof ("__") - 1 + sizeof ("ieee128") - 1;
+ const size_t scanf_extra
+   = sizeof ("__isoc99_") - 1 + sizeof ("ieee128") - 1;
+
+ if (len >= printf_len
+ && strcmp (name + len - printf_len, "prin

[PATCH 2/9] PowerPC: Update __float128 and __ibm128 error messages.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Update __float128 and __ibm128 error messages.

This patch attempts to make the error messages for intermixing the IEEE
128-bit floating point and IBM 128-bit extended double types clearer if the
long double type uses the IEEE 128-bit format.

gcc/
2020-09-23  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_invalid_binary_op): Update error
messages about mixing IBM long double and IEEE 128-bit.

gcc/testsuite/
2020-09-23  Michael Meissner  

* gcc.target/powerpc/bfp/scalar-extract-exp-4.c: Update failure
messages.
* gcc.target/powerpc/bfp/scalar-extract-sig-4.c: Update failure
messages.
* gcc.target/powerpc/bfp/scalar-test-data-class-11.c: Update
failure messages.
* gcc.target/powerpc/bfp/scalar-test-neg-5.c: Update failure
messages.
* gcc.target/powerpc/float128-mix-2.c: New test.
* gcc.target/powerpc/float128-mix-3.c: New test.
* gcc.target/powerpc/float128-mix.c: Update failure messages.
---
 gcc/config/rs6000/rs6000.c| 20 ---
 .../powerpc/bfp/scalar-extract-exp-4.c|  4 +---
 .../powerpc/bfp/scalar-extract-sig-4.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-11.c   |  2 +-
 .../powerpc/bfp/scalar-test-neg-5.c   |  2 +-
 .../gcc.target/powerpc/float128-mix-2.c   | 17 
 .../gcc.target/powerpc/float128-mix-3.c   | 17 
 .../gcc.target/powerpc/float128-mix.c | 19 ++
 8 files changed, 53 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-mix-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-mix-3.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0ff0f31d552..97f535f0018 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -14352,22 +14352,10 @@ rs6000_invalid_binary_op (int op ATTRIBUTE_UNUSED,
 
   if (!TARGET_FLOAT128_CVT)
 {
-  if ((mode1 == KFmode && mode2 == IFmode)
- || (mode1 == IFmode && mode2 == KFmode))
-   return N_("__float128 and __ibm128 cannot be used in the same "
- "expression");
-
-  if (TARGET_IEEEQUAD
- && ((mode1 == IFmode && mode2 == TFmode)
- || (mode1 == TFmode && mode2 == IFmode)))
-   return N_("__ibm128 and long double cannot be used in the same "
- "expression");
-
-  if (!TARGET_IEEEQUAD
- && ((mode1 == KFmode && mode2 == TFmode)
- || (mode1 == TFmode && mode2 == KFmode)))
-   return N_("__float128 and long double cannot be used in the same "
- "expression");
+  if ((FLOAT128_IEEE_P (mode1) && FLOAT128_IBM_P (mode2))
+ || (FLOAT128_IBM_P (mode1) && FLOAT128_IEEE_P (mode2)))
+   return N_("Invalid mixing of IEEE 128-bit and IBM 128-bit floating "
+ "point types");
 }
 
   return NULL;
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-4.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-4.c
index 850ff620490..2065a287bb3 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-4.c
@@ -11,7 +11,5 @@ get_exponent (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_extract_exp (source); /* { dg-error 
"'__builtin_vsx_scalar_extract_expq' requires" } */
+  return __builtin_vec_scalar_extract_exp (source); /* { dg-error 
"'__builtin_vsx_scalar_extract_exp.*' requires" } */
 }
-
-
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-4.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-4.c
index 32a53c6fffd..37bc8332961 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-4.c
@@ -11,5 +11,5 @@ get_significand (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_extract_sig (source);/* { dg-error 
"'__builtin_vsx_scalar_extract_sigq' requires" } */
+  return __builtin_vec_scalar_extract_sig (source);/* { dg-error 
"'__builtin_vsx_scalar_extract_sig.*' requires" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-11.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-11.c
index 7c6fca2b729..ec3118792c4 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-11.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-11.c
@@ -10,5 +10,5 @@ test_data_class (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_test_data_class (source, 3); /* { dg-error 
"'__builtin_vsx_scalar_test_data_class_qp' requires" } */
+  return __builtin_vec_scalar_test_data_class (source, 3); /* { dg-error 
"'__builtin_vsx_scalar_test_data_class_.*' requires" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-5.c 
b/gcc/testsuite/g

[PATCH 3/9] PowerPC: Update IEEE <-> IBM 128-bit floating point conversions.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Update IEEE <-> IBM 128-bit floating point conversions.

This patch changes the code for doing conversions between IEEE 128-bit floating
point and IBM 128-bit extended double floating point.  It moves the conversion
functions to a separate file.  It uses explicit __ibm128 instead of long
double to allow the long double type to be set to IEEE 128-bit.

libgcc/
2020-09-23  Michael Meissner  

* config/rs6000/extendkftf2-sw.c: Move __float128 to __ibm128
conversion into float128-convert.h.
* config/rs6000/float128-convert.h: New file.
* config/rs6000/float128-hw.c: Move conversions between __float128
and __ibm128 into float128-convert.h.
* config/rs6000/quad-float128.h: Move conversions between
__float128 and __ibm128 into float128-convert.h.
* config/rs6000/trunctfkf2-sw.c: Move __ibm128 to __float128
conversion to float128-convert.h.
---
 libgcc/config/rs6000/extendkftf2-sw.c   |  6 +-
 libgcc/config/rs6000/float128-convert.h | 77 +
 libgcc/config/rs6000/float128-hw.c  | 11 +---
 libgcc/config/rs6000/quad-float128.h| 58 ---
 libgcc/config/rs6000/trunctfkf2-sw.c|  6 +-
 5 files changed, 84 insertions(+), 74 deletions(-)
 create mode 100644 libgcc/config/rs6000/float128-convert.h

diff --git a/libgcc/config/rs6000/extendkftf2-sw.c 
b/libgcc/config/rs6000/extendkftf2-sw.c
index f0de1784c43..80b48c20d9c 100644
--- a/libgcc/config/rs6000/extendkftf2-sw.c
+++ b/libgcc/config/rs6000/extendkftf2-sw.c
@@ -38,6 +38,7 @@
 
 #include "soft-fp.h"
 #include "quad-float128.h"
+#include "float128-convert.h"
 
 #ifndef FLOAT128_HW_INSNS
 #define __extendkftf2_sw __extendkftf2
@@ -46,8 +47,5 @@
 IBM128_TYPE
 __extendkftf2_sw (TFtype value)
 {
-  IBM128_TYPE ret;
-
-  CVT_FLOAT128_TO_IBM128 (ret, value);
-  return ret;
+  return convert_float128_to_ibm128 (value);
 }
diff --git a/libgcc/config/rs6000/float128-convert.h 
b/libgcc/config/rs6000/float128-convert.h
new file mode 100644
index 000..bb6b3d71889
--- /dev/null
+++ b/libgcc/config/rs6000/float128-convert.h
@@ -0,0 +1,77 @@
+/* Convert between IEEE 128-bit and IBM 128-bit floating point types.
+   Copyright (C) 2016-2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Michael Meissner (meiss...@linux.ibm.com).
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implementation of conversions between __ibm128 and __float128, to allow the
+   same code to be used on systems with IEEE 128-bit emulation and with IEEE
+   128-bit hardware support.
+
+   These functions are called by the actual conversion functions called by the
+   compiler.  This code is here to allow building power8 (no hardware
+   float128) and power9 (hardware float128) variants that are selected by an
+   IFUNC function.  */
+
+static inline __ibm128 convert_float128_to_ibm128 (__float128);
+static inline __float128 convert_ibm128_to_float128 (__ibm128);
+
+static inline __ibm128
+convert_float128_to_ibm128 (__float128 value)
+{
+  double high, high_temp, low;
+
+  high = (double) value;
+  if (__builtin_isnan (high) || __builtin_isinf (high))
+low = 0.0;
+
+  else
+{
+  low = (double) (value - (__float128) high);
+  /* Renormalize low/high and move them into canonical IBM long
+double form.  */
+  high_temp = high + low;
+  low = (high - high_temp) + low;
+  high = high_temp;
+}
+
+  return __builtin_pack_ibm128 (high, low);
+}
+
+static inline __float128
+convert_ibm128_to_float128 (__ibm128 value)
+{
+  double high = __builtin_unpack_ibm128 (value, 0);
+  double low = __builtin_unpack_ibm128 (value, 1);
+
+  /* Handle the special cases of NAN and infinity.  Similarly, if low is 0.0,
+ there no need to do t

One issue with default implementation of zero_call_used_regs

2020-09-24 Thread Qing Zhao via Gcc-patches
Hi, Richard,

As you suggested, I added a default implementation of the target hook 
“zero_call_used_regs (HARD_REG_SET)” as follows in my latest patch:


/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */

void
default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
{
  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));

  /* This array holds the zero rtx with the corresponding machine mode.  */
  rtx zero_rtx[(int)MAX_MACHINE_MODE];
  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
zero_rtx[i] = NULL_RTX;

  expand_asm_memory_blockage ();

  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
  {
rtx reg, tmp;
machine_mode mode = reg_raw_mode[regno];

reg = gen_rtx_REG (mode, regno);

/* update the data flow information.  */
expand_asm_reg_clobber_blockage (reg);
df_update_zeroed_reg_set (regno);

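/* The first register zeroed in each mode gets an explicit move of zero
   and is cached in zero_rtx[], so later registers of the same mode can
   be zeroed by copying from that register instead.  */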
if (zero_rtx[(int)mode] == NULL_RTX)
  {
zero_rtx[(int)mode] = reg;
tmp = gen_rtx_SET (reg, const0_rtx);
emit_insn (tmp);
  }
else
  emit_move_insn (reg, zero_rtx[(int)mode]);
  }
  return;
}

I tested this default implementation on aarch64 with a small test case;
-fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well.
However,
-fzero-call-used-regs=all-arg and -fzero-call-used-regs=all hit an internal
compiler error, as follows:

t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
   15 | }
  | ^
0xcff58b gen_highpart(machine_mode, rtx_def*)
../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394

Studying this today, I found that the major issue behind this bug is the
following statement:

machine_mode mode = reg_raw_mode[regno];

“reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector
register on aarch64); as a result, the zeroing insn for this register is:

(insn 112 111 113 7 (set (reg:TI 32 v0)
(const_int 0 [0])) "t1.c":15:1 -1
 (nil))


However, it looks like the above RTL has to be split into two subregister
moves on aarch64, and the splitting has an issue.

So, I guess that on aarch64, zeroing vector registers might need modes other
than the one returned by “reg_raw_mode”.

My questions are:

1. Is there another available utility routine that returns the proper MODE
that can readily be used to zero a given hard register?
2. If not, should I add one more target hook for this purpose? i.e 

/* Return the proper machine mode that can be used to zero this hard register 
specified by REGNO.  */
machine_mode zero-call-used-regs-mode (unsigned int REGNO)
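
For reference, here is a minimal sketch of what a default for such a hook
could look like (the hook name is purely hypothetical; the default just keeps
the reg_raw_mode choice used above, and targets like aarch64 would override
it):

/* Hypothetical default for the suggested hook.  */
static machine_mode
default_zero_call_used_regs_mode (unsigned int regno)
{
  return reg_raw_mode[regno];
}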


Thanks.

Qing





[PATCH 4/9] PowerPC: Add IEEE 128-bit <-> Decimal conversions.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Add IEEE 128-bit <-> Decimal conversions.

This patch adds the basic support for converting between IEEE 128-bit floating
point and Decimal types.

libgcc/
2020-09-23  Michael Meissner  

* config/rs6000/_dd_to_kf.c: New file.
* config/rs6000/_kf_to_dd.c: New file.
* config/rs6000/_kf_to_sd.c: New file.
* config/rs6000/_kf_to_td.c: New file.
* config/rs6000/_sd_to_kf.c: New file.
* config/rs6000/_td_to_kf.c: New file.
* config/rs6000/t-float128: Build __floating conversions to/from
Decimal support functions.  By default compile with long double
being IBM extended double.
* dfp-bit.c: Add support for building the PowerPC _Float128
to/from Decimal conversion functions.
* dfp-bit.h: Likewise.
---
 libgcc/config/rs6000/_dd_to_kf.c | 30 ++
 libgcc/config/rs6000/_kf_to_dd.c | 30 ++
 libgcc/config/rs6000/_kf_to_sd.c | 30 ++
 libgcc/config/rs6000/_kf_to_td.c | 30 ++
 libgcc/config/rs6000/_sd_to_kf.c | 30 ++
 libgcc/config/rs6000/_td_to_kf.c | 30 ++
 libgcc/config/rs6000/t-float128  | 30 +-
 libgcc/dfp-bit.c | 10 +++--
 libgcc/dfp-bit.h | 37 +---
 9 files changed, 251 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/rs6000/_dd_to_kf.c
 create mode 100644 libgcc/config/rs6000/_kf_to_dd.c
 create mode 100644 libgcc/config/rs6000/_kf_to_sd.c
 create mode 100644 libgcc/config/rs6000/_kf_to_td.c
 create mode 100644 libgcc/config/rs6000/_sd_to_kf.c
 create mode 100644 libgcc/config/rs6000/_td_to_kf.c

diff --git a/libgcc/config/rs6000/_dd_to_kf.c b/libgcc/config/rs6000/_dd_to_kf.c
new file mode 100644
index 000..081415fd393
--- /dev/null
+++ b/libgcc/config/rs6000/_dd_to_kf.c
@@ -0,0 +1,30 @@
+/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* Decimal64 -> _Float128 conversion.  */
+#define FINE_GRAINED_LIBRARIES 1
+#define L_dd_to_kf 1
+#define WIDTH  64
+
+/* Use dfp-bit.c to do the real work.  */
+#include "dfp-bit.c"
diff --git a/libgcc/config/rs6000/_kf_to_dd.c b/libgcc/config/rs6000/_kf_to_dd.c
new file mode 100644
index 000..09a62cbe629
--- /dev/null
+++ b/libgcc/config/rs6000/_kf_to_dd.c
@@ -0,0 +1,30 @@
+/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* _Float128 -> Decimal64 conversion.  */
+#define FINE_GRAINED_LIBRARIES 1
+#define L_kf_to_dd 1
+#define WIDTH  64
+
+/* Use dfp-bit.c to do the real work.  */
+#include "dfp-bit.c"
diff --git a/libgcc/config/rs6000/_kf_to_sd.c b/libgcc/config/rs6000/_kf_to_sd.c
new file mode 100644
index 000..f35b68eb4d9
--- /dev/null
+++ b/libgcc/config/rs6000/_kf_to_sd.c
@@ -0,0 +1,30 @@
+/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foun

[PATCH, rs6000] correct an erroneous BTM value in the BU_P10_MISC define

2020-09-24 Thread will schmidt via Gcc-patches
[PATCH, rs6000] correct an erroneous blip in the BU_P10_MISC define

Hi, 
We have an extraneous BTM entry (RS6000_BTM_POWERPC64) in the define for
our P10 MISC 2 builtin definition.  This does not exist for the '0',
'1' or '3' definitions. It appears to me that this was erroneously
copied from the P7 version of the define which contains a version of the
BU macro both with and without that element.  Removing the
RS6000_BTM_POWERPC64 portion of the define does not introduce any obvious
failures; I believe this extra line can be safely removed.

OK for trunk?

Thanks
-Will

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5fe..62c9b77cb76d 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1112,12 +1112,11 @@
CODE_FOR_ ## ICODE) /* ICODE */
 
 #define BU_P10_MISC_2(ENUM, NAME, ATTR, ICODE) \
   RS6000_BUILTIN_2 (P10_BUILTIN_ ## ENUM,  /* ENUM */  \
"__builtin_" NAME,  /* NAME */  \
-   RS6000_BTM_P10  \
-   | RS6000_BTM_POWERPC64, /* MASK */  \
+   RS6000_BTM_P10, /* MASK */  \
(RS6000_BTC_ ## ATTR/* ATTR */  \
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
 #define BU_P10_MISC_3(ENUM, NAME, ATTR, ICODE) \



[PATCH 5/9] PowerPC: Update tests to run if long double is IEEE 128-bit.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: Update tests to run if long double is IEEE 128-bit.

gcc/testsuite/
2020-09-23  Michael Meissner  

* c-c++-common/dfp/convert-bfp-11.c: If long double is IEEE
128-bit, skip the test.
* gcc.dg/nextafter-2.c: On PowerPC, if long double is IEEE
128-bit, include math.h to get the built-in mapped correctly.
* gcc.target/powerpc/pr70117.c: Add support for long double being
IEEE 128-bit.
---
 gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c |  7 +++
 gcc/testsuite/gcc.dg/nextafter-2.c  | 10 ++
 gcc/testsuite/gcc.target/powerpc/pr70117.c  |  6 --
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c 
b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
index 95c433d2c24..6ee0c1c6ae9 100644
--- a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
+++ b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
@@ -5,6 +5,7 @@
Don't force 128-bit long doubles because runtime support depends
on glibc.  */
 
+#include 
 #include "convert.h"
 
 volatile _Decimal32 sd;
@@ -39,6 +40,12 @@ main ()
   if (sizeof (long double) != 16)
 return 0;
 
+  /* This test is written to test IBM extended double, which is a pair of
+ doubles.  If long double can hold a larger value than a double can, such
+ as when long double is IEEE 128-bit, just exit immediately.  */
+  if (LDBL_MAX_10_EXP > DBL_MAX_10_EXP)
+return 0;
+
   convert_101 ();
   convert_102 ();
 
diff --git a/gcc/testsuite/gcc.dg/nextafter-2.c 
b/gcc/testsuite/gcc.dg/nextafter-2.c
index e51ae94be0c..64e9e3c485f 100644
--- a/gcc/testsuite/gcc.dg/nextafter-2.c
+++ b/gcc/testsuite/gcc.dg/nextafter-2.c
@@ -13,4 +13,14 @@
 #  define NO_LONG_DOUBLE 1
 # endif
 #endif
+
+#if defined(_ARCH_PPC) && defined(__LONG_DOUBLE_IEEE128__)
+/* On PowerPC systems, long double uses either the IBM long double format, or
+   IEEE 128-bit format.  The compiler switches the long double built-in
+   function names and glibc switches the names when math.h is included.
+   Because this test is run with -fno-builtin, include math.h so that the
+   appropriate nextafter functions are called.  */
+#include 
+#endif
+
 #include "nextafter-1.c"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70117.c 
b/gcc/testsuite/gcc.target/powerpc/pr70117.c
index 3bbd2c595e0..928efe39c7b 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr70117.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr70117.c
@@ -9,9 +9,11 @@
128-bit floating point, because the type is not enabled on those
systems.  */
 #define LDOUBLE __ibm128
+#define IBM128_MAX ((__ibm128) 1.79769313486231580793728971405301199e+308L)
 
 #elif defined(__LONG_DOUBLE_IBM128__)
 #define LDOUBLE long double
+#define IBM128_MAX LDBL_MAX
 
 #else
 #error "long double must be either IBM 128-bit or IEEE 128-bit"
@@ -75,10 +77,10 @@ main (void)
   if (__builtin_isnormal (ld))
 __builtin_abort ();
 
-  ld = LDBL_MAX;
+  ld = IBM128_MAX;
   if (!__builtin_isnormal (ld))
 __builtin_abort ();
-  ld = -LDBL_MAX;
+  ld = -IBM128_MAX;
   if (!__builtin_isnormal (ld))
 __builtin_abort ();
 
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH 6/9] PowerPC: If long double is IEEE 128-bit, map q built-ins to *l instead of *f128.

2020-09-24 Thread Michael Meissner via Gcc-patches
PowerPC: If long double is IEEE 128-bit, map q built-ins to *l instead of *f128.

If we map nanq to nanf128 when long double is IEEE, it seems to lose the
special signaling vs. non-signaling NAN support.  This patch maps the functions
to the long double version if long double is IEEE 128-bit.

gcc/
2020-09-23  Michael Meissner  

* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): If long
double is IEEE-128 map the nanq built-in functions to the long
double function, not the f128 function.
---
 gcc/config/rs6000/rs6000-c.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f5982907e90..8f7a8eec740 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -681,15 +681,32 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__builtin_vsx_xvnmsubmsp=__builtin_vsx_xvnmsubsp");
 }
 
-  /* Map the old _Float128 'q' builtins into the new 'f128' builtins.  */
+  /* Map the old _Float128 'q' builtins into the new 'f128' builtins if long
+ double is IBM or 64-bit.
+
+ However, if long double is IEEE 128-bit, map both sets of built-in
+ functions to the normal long double version.  This shows up in nansf128
+ vs. nanf128.  */
   if (TARGET_FLOAT128_TYPE)
 {
-  builtin_define ("__builtin_fabsq=__builtin_fabsf128");
-  builtin_define ("__builtin_copysignq=__builtin_copysignf128");
-  builtin_define ("__builtin_nanq=__builtin_nanf128");
-  builtin_define ("__builtin_nansq=__builtin_nansf128");
-  builtin_define ("__builtin_infq=__builtin_inff128");
-  builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
+  if (FLOAT128_IEEE_P (TFmode))
+   {
+ builtin_define ("__builtin_fabsq=__builtin_fabsl");
+ builtin_define ("__builtin_copysignq=__builtin_copysignl");
+ builtin_define ("__builtin_nanq=__builtin_nanl");
+ builtin_define ("__builtin_nansq=__builtin_nansl");
+ builtin_define ("__builtin_infq=__builtin_infl");
+ builtin_define ("__builtin_huge_valq=__builtin_huge_vall");
+   }
+  else
+   {
+ builtin_define ("__builtin_fabsq=__builtin_fabsf128");
+ builtin_define ("__builtin_copysignq=__builtin_copysignf128");
+ builtin_define ("__builtin_nanq=__builtin_nanf128");
+ builtin_define ("__builtin_nansq=__builtin_nansf128");
+ builtin_define ("__builtin_infq=__builtin_inff128");
+ builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
+   }
 }
 
   /* Tell users they can use __builtin_bswap{16,64}.  */
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


  1   2   >