Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-13 Thread Richard Biener via Gcc-patches
On Wed, 12 Oct 2022, Andrew MacLeod wrote:

> 
> On 10/12/22 10:39, Jakub Jelinek wrote:
> > On Wed, Oct 12, 2022 at 10:31:00AM -0400, Andrew MacLeod wrote:
> >> I presume you are looking to get this working for this release, making the
> >> priority high? :-)
> > Yes.  So that we can claim we actually support C++23 Portable Assumptions
> > and OpenMP assume directive's hold clauses for something non-trivial so
> > people won't be afraid to actually use it.
> > Of course, first the posted patch needs to be reviewed and only once it gets
> > in, the ranger/GORI part can follow.  As the latter is only an optimization,
> > it can be done incrementally.
> 
> I will start poking at something to find ranges for parameters from the return
> backwards.

If the return were

  if (return_val)
    return return_val;

you could use path-ranger with the parameter SSA default defs as
"interesting".  So you "only" need to somehow interpret the return
statement as such and do path ranger's compute_ranges ()

> 
> >> Intersection I believe...?  I think the value from the assume's should add
> >> restrictions to the range..
> > Sure, sorry.
> >
> >> I figured as much, I was just wondering if there might be some way to
> >> "simplify" certain things by processing it and turning each parameter query
> >> into a smaller function returning the range we determined from the main
> >> one...   but perhaps that is more complicated.
> > We don't really know what the condition is, it can be pretty arbitrary
> > expression (well, e.g. for C++ conditional expression, so say
> > [[assume (var = foo ())]];
> > is not valid but
> > [[assume ((var = foo ()))]];
> > is.  And with GNU statement expressions it can do a lot of stuff and until
> > we e.g. inline into it and optimize it a little, we don't really know what
> > it will be like.
> >
> >  
> 
> No, I just meant that once we finally process the complicated function, and
> decide the final range we are storing is for x_1 is say [20,30], we could
> replace the assume call site with something like
> 
>   int assume03_x (x) { if (x >= 20 && x <= 30) return x; gcc_unreachable(); }
> 
> then at call sites:
> 
>    x_5 = assume03_x(x_3);
> 
> For that matter, once all the assume functions have been processed, we could
> textually replace the assume call with an expression which represents the
> determined range...  Kind of our own mini inlining?  Maybe thats even better
> than adding any kind of support in fold_using_range..   just let things
> naturally fall into place?
> 
> .ASSUME_blah ( , , x_4);
> 
> where if x is determined to be [20, 30][50,60] could be textually "expanded"
> in the IL with
> 
>   if (x<20 || x>60 || (x>30 && x < 50)) gcc_unreachable();
> 
> for each of the parameters?   If we processed this like early inlining, we
> could maybe expose the entire thing to optimization that way?
> 
> Andrew
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Richard Biener via Gcc-patches
On Wed, 12 Oct 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> The bitposition calculation for the bitfield lowering in loop if conversion
> was not taking DECL_FIELD_OFFSET into account, which meant that it would
> result in wrong bitpositions for bitfields that did not end up having
> representations starting at the beginning of the struct.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu.

+{
+  tree bf_pos = fold_build2 (MULT_EXPR, bitsizetype,
+                             DECL_FIELD_OFFSET (field_decl),
+                             build_int_cst (bitsizetype, 8));
+  bf_pos = fold_build2 (PLUS_EXPR, bitsizetype, bf_pos,
+                        DECL_FIELD_BIT_OFFSET (field_decl));
+  tree rep_pos = fold_build2 (MULT_EXPR, bitsizetype,
+                              DECL_FIELD_OFFSET (rep_decl),
+                              build_int_cst (bitsizetype, 8));
+  rep_pos = fold_build2 (PLUS_EXPR, bitsizetype, rep_pos,
+                         DECL_FIELD_BIT_OFFSET (rep_decl));

you can use the invariant that DECL_FIELD_OFFSET of rep_decl
and field_decl are always the same.  Also please use BITS_PER_UNIT
instead of '8'.

Richard.


Re: vect: Don't pattern match BITFIELD_REF's of non-integrals [PR107226]

2022-10-13 Thread Richard Biener via Gcc-patches
On Wed, 12 Oct 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> The original patch supported matching the vect_recog_bitfield_ref_pattern for
> BITFIELD_REF's where the first operand didn't have an INTEGRAL_TYPE_P type.
> That means it would also match vectors, leading to regressions in targets that
> supported vectorization of those.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu.

OK.

Richard.

> gcc/ChangeLog:
> 
>     PR tree-optimization/107226
>     * tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Reject
>     BITFIELD_REF's with non integral typed first operands.


Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

2022-10-13 Thread Lulu Cheng



On 2022/10/13 2:44 PM, Xi Ruoyao wrote:

On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:

Hi RuoYao

It’s probably because loongarch64 doesn’t support
can_vec_perm_const_p(result_mode, op_mode, sel2, false)

I’m not sure whether loongarch will support it, or whether I should just
limit the test target for pr54346.c?

I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
actually support vector.  (LoongArch has SIMD instructions but the
support in GCC won't be added in a very recent future.)


If what I understand is correct, I think this might be a better solution.

 /* { dg-do compile } */

+/* { dg-require-effective-target vect_perm } */
 /* { dg-options "-O -fdump-tree-dse1" } */



Re: [PATCH] LoongArch: implement count_{leading,trailing}_zeros

2022-10-13 Thread Lulu Cheng

Looks good to me!

Thanks!

On 2022/10/12 10:23 PM, Xi Ruoyao wrote:

LoongArch always supports clz and ctz instructions, so we can always use
__builtin_{clz,ctz} for count_{leading,trailing}_zeros.  This improves
the code of libgcc, and also benefits Glibc once we merge longlong.h
there.

Bootstrapped and regtested on loongarch64-linux-gnu.
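
For readers unfamiliar with these macros, here is a small reference model of what they are expected to compute. It is only a sketch in Python, not the libgcc code: the function names and the explicit width parameter (standing in for W_TYPE_SIZE) are assumptions made for illustration. The zero case mirrors the COUNT_LEADING_ZEROS_0 value that the patch sets to the word size.

```python
def count_leading_zeros(x, width):
    """Model of count_leading_zeros for a width-bit unsigned word."""
    if not 0 <= x < (1 << width):
        raise ValueError("x does not fit in a word of this width")
    # int.bit_length() is 0 for x == 0, so the result is `width` there,
    # which matches COUNT_LEADING_ZEROS_0 in the patch (32 or 64).
    return width - x.bit_length()


def count_trailing_zeros(x, width):
    """Model of count_trailing_zeros; x must be nonzero in this model."""
    if not 0 < x < (1 << width):
        raise ValueError("x must be a nonzero value of this width")
    # x & -x isolates the lowest set bit; its index is the answer.
    return (x & -x).bit_length() - 1
```

For example, count_leading_zeros(1, 32) gives 31 and count_trailing_zeros(8, 32) gives 3.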

include/ChangeLog:

* longlong.h [__loongarch__] (count_leading_zeros): Define.
[__loongarch__] (count_trailing_zeros): Likewise.
[__loongarch__] (COUNT_LEADING_ZEROS_0): Likewise.
---
  include/longlong.h | 12 
  1 file changed, 12 insertions(+)

diff --git a/include/longlong.h b/include/longlong.h
index 64a7b10f9b2..c3a6f1e7eaa 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -593,6 +593,18 @@ extern UDItype __umulsidi3 (USItype, USItype);
  #define UMUL_TIME 14
  #endif
  
+#ifdef __loongarch__
+# if W_TYPE_SIZE == 32
+#  define count_leading_zeros(count, x)  ((count) = __builtin_clz (x))
+#  define count_trailing_zeros(count, x) ((count) = __builtin_ctz (x))
+#  define COUNT_LEADING_ZEROS_0 32
+# elif W_TYPE_SIZE == 64
+#  define count_leading_zeros(count, x)  ((count) = __builtin_clzll (x))
+#  define count_trailing_zeros(count, x) ((count) = __builtin_ctzll (x))
+#  define COUNT_LEADING_ZEROS_0 64
+# endif
+#endif
+
  #if defined (__M32R__) && W_TYPE_SIZE == 32
  #define add_ss(sh, sl, ah, al, bh, bl) \
/* The cmp clears the condition bit.  */ \




Re: [PATCH] 16/19 modula2 front end: bootstrap and documentation tools

2022-10-13 Thread Martin Liška
On 10/10/22 17:31, Gaius Mulley via Gcc-patches wrote:
>  
> 

Hi!

> This patch set contains the bootstrap linking tool as well as python3
> scripts to automatically generate texi libraries section of the gm2
> documentation.  In the fullness of time this will be changed to emit
> sphinx.

Yep, looking forward to it. I'm going to write an email with the Sphinx transition
schedule once Sphinx 5.3 gets released (should happen during the upcoming
weekend).

I have general comments about the Python scripts:

1) please follow the Python coding style and not the GCC one (I'm going to
document it in https://gcc.gnu.org/codingconventions.html under a new Python
section).
The easiest approach is using flake8 and the following plugins:

python3-flake8, python3-flake8-builtins, python3-flake8-bugbear, 
python3-flake8-import-order, python3-flake8-quotes

plus, you might want to come up with a setup.cfg like we have in:
./maintainer-scripts/setup.cfg

> 
>  
> --8<--8<--8<--8<--8<--8< 
> diff -ruw /dev/null gcc-git-devel-modula2/gcc/m2/tools-src/tidydates.py
> --- /dev/null 2022-08-24 16:22:16.88870 +0100
> +++ gcc-git-devel-modula2/gcc/m2/tools-src/tidydates.py   2022-10-07 
> 20:21:18.682097332 +0100
> @@ -0,0 +1,184 @@
> +#!/usr/bin/env python3
> +
> +# utility to tidy dates and detect lack of copyright.
> +
> +# Copyright (C) 2016-2022 Free Software Foundation, Inc.
> +#
> +# This file is part of GNU Modula-2.
> +#
> +# GNU Modula-2 is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GNU Modula-2 is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GNU Modula-2; see the file COPYING.  If not, write to the
> +# Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
> +# 02110-1301, USA.
> +
> +import os, sys
> +
> +maxLineLength = 60
> +
> +
> +#
> +#  visitDir - call func for each file below, dir, matching extension, ext.
> +#
> +
> +def visitDir (dir, ext, func):
> +    listOfFiles = os.listdir(dir)
> +    listOfFiles.sort()
> +    for file in listOfFiles:
> +        if os.path.isfile(os.path.join(dir, file)):
> +            l = len(ext)
> +            if (len(file)>l) and (file[-l:] == ext):
> +                func(os.path.join(dir, file))

please use pathlib.Path(...).stem
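
A minimal sketch of what a pathlib-based version could look like, assuming the goal is only to match on the file extension. Note that Path.stem is the file name without its extension, while Path.suffix is the extension itself, so the sketch compares Path.suffix. The name visit_dir is hypothetical, and rglob plus sorted replaces the explicit recursion, so the traversal order may differ from the original.

```python
from pathlib import Path


def visit_dir(directory, ext, func):
    """Call func on every regular file below directory whose suffix is ext.

    ext is expected to include the leading dot, for example '.py'.
    """
    for path in sorted(Path(directory).rglob('*')):
        if path.is_file() and path.suffix == ext:
            func(path)
```

A call such as visit_dir('.', '.py', print) would print every Python file below the current directory.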

> +        elif os.path.isdir(os.path.join(dir, file)):
> +            visitDir(os.path.join(dir, file), ext, func)
> +
> +#
> +#  isYear - returns True if, year, is legal.
> +#
> +
> +def isYear (year):
> +    if len(year)==5:
> +        year = year[:-1]
> +    for c in year:
> +        if not c.isdigit():
> +            return False
> +    return True
> +
> +
> +#
> +#  handleCopyright -
> +#
> +
> +def handleCopyright (outfile, lines, n, leader1, leader2):
> +    global maxLineLength
> +    i = lines[n]
> +    c = i.find('Copyright (C) ')+len('Copyright (C)')
> +    outfile.write(i[:c])
> +    d = i[c:].split()
> +    start = c
> +    seenDate = True
> +    years = []
> +    while seenDate:
> +        if d == []:
> +            n += 1
> +            i = lines[n]
> +            d = i[2:].split()
> +        else:
> +            e = d[0]
> +            punctuation = ""

Please unify "" and '', use only apostrophes.

> +            if len(d)==1:
> +                d = []
> +            else:
> +                d = d[1:]
> +
> +            if c>maxLineLength:
> +                outfile.write('\n')
> +                outfile.write(leader1)
> +                outfile.write(leader2)
> +                outfile.write(' '*(start-2))
> +                c = start
> +
> +            if isYear(e):
> +                if (e[-1]=='.') or (e[-1]==','):
> +                    punctuation = e[-1]
> +                    e = e[:-1]
> +                else:
> +                    punctuation = ""
> +            else:
> +                seenDate = False
> +            if seenDate:
> +                if not (e in years):
> +                    c += len(e) + len(punctuation)
> +                    outfile.write(' ')
> +                    outfile.write(e)
> +                    outfile.write(punctuation)
> +                    years += [e]
> +            else:
> +                if start < c:
> +                    outfile.write('\n')
> +                    outfile.write(leader1)
> +                    outfile.write(leader2)
> +                    outfile.write(' '*(start-2))
> +
> +                outfile.write(' ')
> +                outfile.write(e)
> +                outfile.write(punctuation)
> +                for w in d:
> +   

[DOCS] Python Language Conventions

2022-10-13 Thread Martin Liška
I think we should add how Python scripts should be formatted. I noticed
that while reading the Modula-2 patchset where it follows the C/C++ style
when it comes to Python files.

Ready to be installed?
Thanks,
Martin

---
 htdocs/codingconventions.html | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
index e4d30510..180ef35a 100644
--- a/htdocs/codingconventions.html
+++ b/htdocs/codingconventions.html
@@ -80,6 +80,7 @@ the conventions separately from any other changes to the 
code.
 
 
 
+Python Language Conventions
 
 
 
@@ -1483,6 +1484,19 @@ with a right brace, optional closing comment, and a new 
line.
 Definitions within the body of a namespace are not indented.
 
 
+Python Language Conventions
+
+
+Python scripts should follow <a href="https://peps.python.org/pep-0008/">PEP 8 – Style Guide for Python Code</a>
+which can be verified by flake8 tool.
+We do also recommend using the following flake8 plug-ins:
+
+
+flake8-builtins
+flake8-import-order
+flake8-quotes
+
+
 
 
 
-- 
2.37.3



Re: [DOCS] Python Language Conventions

2022-10-13 Thread Gerald Pfeifer
Hi Martin,

On Thu, 13 Oct 2022, Martin Liška wrote:
> I think we should add how Python scripts should be formatted. I noticed
> that while reading the Modula-2 patchset where it follows the C/C++ style
> when it comes to Python files.

good initiative, thank you! This makes sense to me, alas I'm not a Python 
hacker, so best wait to see what David and Gaius think, too?


Some suggestions on the web side of things:

> +Python Language Conventions

Since the name of the page already is codingconventions.html, I suggest
making this simply "#python" - shorter and simpler. :-)

> +Python scripts should follow <a href="https://peps.python.org/pep-0008/">PEP 8 – Style Guide for Python Code</a>
> +which can be verified by flake8 tool.

...by the...tool.

> +We do also recommend using the following flake8 plug-ins:

Here maybe simply say "We recommend using"?

Hope this helps,
Gerald


Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 13, 2022 at 08:11:53AM +, Richard Biener wrote:
> On Wed, 12 Oct 2022, Andrew MacLeod wrote:
> 
> > 
> > On 10/12/22 10:39, Jakub Jelinek wrote:
> > > On Wed, Oct 12, 2022 at 10:31:00AM -0400, Andrew MacLeod wrote:
> > >> I presume you are looking to get this working for this release, making 
> > >> the
> > >> priority high? :-)
> > > Yes.  So that we can claim we actually support C++23 Portable Assumptions
> > > and OpenMP assume directive's hold clauses for something non-trivial so
> > > people won't be afraid to actually use it.
> > > Of course, first the posted patch needs to be reviewed and only once it 
> > > gets
> > > in, the ranger/GORI part can follow.  As the latter is only an 
> > > optimization,
> > > it can be done incrementally.
> > 
> > I will start poking at something to find ranges for parameters from the 
> > return
> > backwards.
> 
> If the return were
> 
>   if (return_val)
>     return return_val;
> 
> you could use path-ranger with the parameter SSA default defs as
> "interesting".  So you "only" need to somehow interpret the return
> statement as such and do path ranger's compute_ranges ()

If it was easier for handling, another possible representation of the
assume_function could be not that it returns a bool where [1,1] returned
means defined behavior, otherwise UB, but that the function returns void
and the assumption is that it returns, the other paths would be
__builtin_unreachable ().  But still in both cases it needs a specialized
backwards walk from the assumption that either it returns [1,1] or that it
returns through GIMPLE_RETURN to be defined behavior.  In either case,
external exceptions, or infinite loops or other reasons why the function
might not return normally (exit/abort/longjmp/non-local goto etc.) are still
UB for assumptions.
Say normally, if we have:
extern void foo (int);

bool
assume1 (int x)
{
  foo (x);
  if (x != 42)
    __builtin_unreachable ();
  return true;
}
we can't determine through a backwards ranger walk that x_1(D) has the
[42,42] range at the start of the function; we can only say that it holds
at the end of the function, because foo could do if (x != 42) exit (0); or
if (x != 42) throw 1; or if (x != 42) longjmp (buf, 1); or
while (x != 42) ; or if (x != 42) abort ();
But with assumption functions we actually can and stick [42, 42] on the
parameters even when we know nothing about foo function.

Of course, perhaps initially, we can choose to ignore those extra
guarantees.
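
To make the difference concrete, here is a toy model in Python. It is a brute-force sketch over a small domain, not anything ranger does; the exception classes, the odd/even behaviour of foo and the helper assumed_values are all invented for illustration. The point it shows is that for an assumption function every outcome other than a normal "true" return counts as UB, so only the values that reach that return need to be considered for the parameter.

```python
class Unreachable(Exception):
    """Stands in for __builtin_unreachable () in this model."""


class DoesNotReturn(Exception):
    """Stands in for exit/abort/longjmp/throw or an infinite loop."""


def foo(x):
    # Hypothetical opaque callee: refuses to return for odd inputs.
    if x % 2:
        raise DoesNotReturn


def assume1(x):
    foo(x)
    if x != 42:
        raise Unreachable
    return True


def assumed_values(assume_fn, domain):
    """Values of the parameter for which assume_fn returns True normally.

    Any other outcome is treated as undefined behaviour, so those values
    are simply excluded.
    """
    result = []
    for x in domain:
        try:
            if assume_fn(x) is True:
                result.append(x)
        except (Unreachable, DoesNotReturn):
            pass
    return result
```

Here assumed_values(assume1, range(100)) yields [42], whatever foo does for the other inputs.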

Jakub



Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-13 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 12, 2022 at 12:12:38PM -0400, Andrew MacLeod wrote:
> No, I just meant that once we finally process the complicated function, and
> decide the final range we are storing is for x_1 is say [20,30], we could
> replace the assume call site with something like
> 
>   int assume03_x (x) { if (x >= 20 && x <= 30) return x; gcc_unreachable(); }
> 
> then at call sites:
> 
>    x_5 = assume03_x(x_3);
> 
> For that matter, once all the assume functions have been processed, we could
> textually replace the assume call with an expression which represents the
> determined range...  Kind of our own mini inlining?  Maybe thats even better
> than adding any kind of support in fold_using_range..   just let things
> naturally fall into place?
> 
> .ASSUME_blah ( , , x_4);
> 
> where if x is determined to be [20, 30][50,60] could be textually "expanded"
> in the IL with
> 
>   if (x<20 || x>60 || (x>30 && x < 50)) gcc_unreachable();
> 
> for each of the parameters?   If we processed this like early inlining, we
> could maybe expose the entire thing to optimization that way?

That could work for integral parameters, but doesn't work for floating point
nor when builtins are involved.  We do not want to put floating point
comparisons into the IL as __builtin_unreachable (); guards because they
have observable side-effects (floating point exceptions/traps) and we
wouldn't DCE them for those reasons.  Similarly, if there are builtins
involved we don't want to call the corresponding library functions because
something wasn't DCEd.
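
For the integral case only, the textual expansion Andrew describes can be sketched as follows. This is a small Python illustration, not GCC code; unreachable_guard and in_range are hypothetical helpers, and the intervals are assumed to be sorted, disjoint and closed.

```python
def unreachable_guard(var, intervals):
    """Build a C-like condition that is true when var is outside all intervals.

    intervals is a non-empty list of sorted, disjoint, closed (lo, hi) pairs.
    """
    # Below the first interval or above the last one.
    parts = [f'{var} < {intervals[0][0]}', f'{var} > {intervals[-1][1]}']
    # In a gap between two neighbouring intervals.
    for (_, hi), (lo, _) in zip(intervals, intervals[1:]):
        parts.append(f'({var} > {hi} && {var} < {lo})')
    return ' || '.join(parts)


def in_range(x, intervals):
    """True when x lies inside one of the closed intervals."""
    return any(lo <= x <= hi for lo, hi in intervals)
```

For the range [20, 30][50, 60] this produces the same condition as in the quoted mail: x < 20 || x > 60 || (x > 30 && x < 50).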

Jakub



Re: [PATCH V4] rs6000: cannot_force_const_mem for HIGH code rtx[PR106460]

2022-10-13 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2022/10/12 14:48, Jiufu Guo via Gcc-patches wrote:
> Hi,
> 
> As shown in PR106460, an rtx 'high:DI (symbol_ref:DI ("var_48")' is forced
> into the constant pool and an ICE occurs.  But this rtx actually represents
> a partial, incomplete address and cannot be put into a .rodata section.
> 
> This patch updates rs6000_cannot_force_const_mem to return true for rtx(s)
> with HIGH code, because these rtx(s) represent part of an address and are
> not suitable for the constant pool.
> 
> Below are some examples:
> (high:DI (const:DI (plus:DI (symbol_ref:DI ("xx") (const_int 12 [0xc])
> (high:DI (symbol_ref:DI ("var_1")..)))
> 
> This patch updates patch V3 according to the previous comments.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602308.html
> 
> Bootstrap and regtest pass on ppc64 and ppc64le.
> Is this ok for trunk?

This patch is ok, thanks!

BR,
Kewen

> 
> BR,
> Jeff(Jiufu)
> 
>   PR target/106460
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem): Return true
>   for HIGH code rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr106460.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc |  7 +--
>  gcc/testsuite/gcc.target/powerpc/pr106460.c | 12 
>  2 files changed, 17 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106460.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 5f347e9574f..dab66f9213a 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9759,8 +9759,11 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* If GET_CODE (x) is HIGH, the 'X' represents the high part of a symbol_ref.
> + It can not be put into a constant pool.  e.g.
> + (high:DI (unspec:DI [(symbol_ref/u:DI ("*.LC0")..)
> + (high:DI (symbol_ref:DI ("var")..)).  */
> +  if (GET_CODE (x) == HIGH)
>  return true;
>  
>/* A TLS symbol in the TOC cannot contain a sum.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106460.c b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> new file mode 100644
> index 000..aae4b015bba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> @@ -0,0 +1,12 @@
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O1 -mdejagnu-cpu=power10" } */
> +
> +/* (high:DI (symbol_ref:DI ("var_48")..))) should not cause ICE. */
> +extern short var_48;
> +void
> +foo (double *r)
> +{
> +  if (var_48)
> +    *r = 1234.5678;
> +}
> +




Re: [DOCS] Python Language Conventions

2022-10-13 Thread Richard Sandiford via Gcc-patches
Martin Liška  writes:
> I think we should add how Python scripts should be formatted. I noticed
> that while reading the Modula-2 patchset where it follows the C/C++ style
> when it comes to Python files.
>
> Ready to be installed?
> Thanks,
> Martin

Did you consider requiring black formatting instead?  Maybe black -l79
to maintain the usual 80-character limit.

At least that way there's only one right answer.

Richard

>
> ---
>  htdocs/codingconventions.html | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
> index e4d30510..180ef35a 100644
> --- a/htdocs/codingconventions.html
> +++ b/htdocs/codingconventions.html
> @@ -80,6 +80,7 @@ the conventions separately from any other changes to the 
> code.
>  
>  
>  
> +Python Language Conventions
>  
>  
>  
> @@ -1483,6 +1484,19 @@ with a right brace, optional closing comment, and a 
> new line.
>  Definitions within the body of a namespace are not indented.
>  
>  
> +Python Language Conventions
> +
> +
> +Python scripts should follow <a href="https://peps.python.org/pep-0008/">PEP 8 – Style Guide for Python Code</a>
> +which can be verified by flake8 tool.
> +We do also recommend using the following flake8 plug-ins:
> +
> +
> +flake8-builtins
> +flake8-import-order
> +flake8-quotes
> +
> +
>  
>  
>  


pushed: [PATCH] LoongArch: implement count_{leading,trailing}_zeros

2022-10-13 Thread Xi Ruoyao via Gcc-patches
On Thu, 2022-10-13 at 16:43 +0800, Lulu Cheng wrote:
> Looks good to me!
> 
> Thanks!

Pushed r13-3269.

> 
> On 2022/10/12 10:23 PM, Xi Ruoyao wrote:
> > LoongArch always supports clz and ctz instructions, so we can always
> > use __builtin_{clz,ctz} for count_{leading,trailing}_zeros.  This
> > improves the code of libgcc, and also benefits Glibc once we merge
> > longlong.h there.
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.
> > 
> > include/ChangeLog:
> > 
> > * longlong.h [__loongarch__] (count_leading_zeros): Define.
> > [__loongarch__] (count_trailing_zeros): Likewise.
> > [__loongarch__] (COUNT_LEADING_ZEROS_0): Likewise.
> > ---
> >   include/longlong.h | 12 
> >   1 file changed, 12 insertions(+)
> > 
> > diff --git a/include/longlong.h b/include/longlong.h
> > index 64a7b10f9b2..c3a6f1e7eaa 100644
> > --- a/include/longlong.h
> > +++ b/include/longlong.h
> > @@ -593,6 +593,18 @@ extern UDItype __umulsidi3 (USItype, USItype);
> >   #define UMUL_TIME 14
> >   #endif
> >   
> > +#ifdef __loongarch__
> > +# if W_TYPE_SIZE == 32
> > +#  define count_leading_zeros(count, x)  ((count) = __builtin_clz (x))
> > +#  define count_trailing_zeros(count, x) ((count) = __builtin_ctz (x))
> > +#  define COUNT_LEADING_ZEROS_0 32
> > +# elif W_TYPE_SIZE == 64
> > +#  define count_leading_zeros(count, x)  ((count) = __builtin_clzll (x))
> > +#  define count_trailing_zeros(count, x) ((count) = __builtin_ctzll (x))
> > +#  define COUNT_LEADING_ZEROS_0 64
> > +# endif
> > +#endif
> > +
> >   #if defined (__M32R__) && W_TYPE_SIZE == 32
> >   #define add_ss(sh, sl, ah, al, bh, bl) \
> >     /* The cmp clears the condition bit.  */ \
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-10-13 Thread Iain Sandoe



> On 12 Oct 2022, at 09:57, Iain Sandoe  wrote:
>> On 12 Oct 2022, at 09:12, Kewen.Lin  wrote:
> 
>> PR106680 shows that -m32 -mpowerpc64 is different from
>> -mpowerpc64 -m32; this is caused by the way we handle
>> option powerpc64 in rs6000_handle_option.
>> 
>> Segher pointed out that this difference should be treated as
>> a bug and that we should ensure option powerpc64 is
>> independent of -m32/-m64.  So this patch removes the
>> handling in rs6000_handle_option and adds the necessary
>> support in rs6000_option_override_internal instead.
>> 
>> With this patch, if users specify -m{no-,}powerpc64, the
>> specified value is honoured, otherwise, for 64bit it
>> always enables OPTION_MASK_POWERPC64; while for 32bit
>> and TARGET_POWERPC64 and OS_MISSING_POWERPC64, it disables
>> OPTION_MASK_POWERPC64.
>> 
>> btw, following Segher's suggestion, I tried warning
>> when OPTION_MASK_POWERPC64 is set for OS_MISSING_POWERPC64.
>> If we warn when powerpc64 is specified explicitly,
>> some test cases using -m32 -mpowerpc64 on ppc64-linux
>> need updates, and an artificial run
>> with "--target_board=unix'{-m32/-mpowerpc64}'" produces
>> noisy warnings on ppc64-linux.  If we warn when
>> it's set implicitly, the flag can just be initialized by
>> TARGET_DEFAULT (like -m32 on ppc64-linux) or set from the
>> given cpu mask, so we would have to special-case those and not warn.
>> Following Segher's latest comment, I decided not to warn and
>> to keep the behaviour consistent with before.
>> 
>> Bootstrapped and regress-tested on:
>> - powerpc64-linux-gnu P7 and P8 {-m64,-m32}
>> - powerpc64le-linux-gnu P9 and P10
>> - powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}
>> 
>> Hi Iain, could you help to test this new patch on darwin
>> again?  Thanks in advance!
> 
> I kicked off a bootstrap - and 'check-gcc-c' .. if all goes well, there will
> be an answer in ≈ 7 hours.  If something fails, the answer will be sooner ;)

bootstrapped and tested on powerpc-darwin9, with default CPU configuration.
I have not yet tried tuning or cpu configure options.

testresults compare "nominal" against a recent set (another day elapsed time
would be needed for a proper regtest).
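
For reference, the option-resolution rule described in the quoted patch summary can be sketched as below. This is only a Python model of the prose, not the code in rs6000_option_override_internal; the function name and its parameters are hypothetical stand-ins for the explicit option, the 32/64-bit mode, the default mask and OS_MISSING_POWERPC64. Because the result depends only on these values, the order of -m32 and -mpowerpc64 on the command line cannot matter in the model.

```python
def resolve_powerpc64(explicit, is_64bit, default_on, os_missing_powerpc64):
    """Return whether OPTION_MASK_POWERPC64 ends up set in this model.

    explicit is True for -mpowerpc64, False for -mno-powerpc64 and
    None when the user gave neither option.
    """
    if explicit is not None:
        # An explicitly specified value is always honoured.
        return explicit
    if is_64bit:
        # 64-bit always enables the mask.
        return True
    if default_on and os_missing_powerpc64:
        # 32-bit on an OS without powerpc64 support: drop the default.
        return False
    return default_on
```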

thanks
Iain



Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Andre Vieira (lists) via Gcc-patches

Added some extra comments to describe what is going on there.

On 13/10/2022 09:14, Richard Biener wrote:

On Wed, 12 Oct 2022, Andre Vieira (lists) wrote:


Hi,

The bitposition calculation for the bitfield lowering in loop if conversion
was not taking DECL_FIELD_OFFSET into account, which meant that it would
result in wrong bitpositions for bitfields that did not end up having
representations starting at the beginning of the struct.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu.

+{
+  tree bf_pos = fold_build2 (MULT_EXPR, bitsizetype,
+                             DECL_FIELD_OFFSET (field_decl),
+                             build_int_cst (bitsizetype, 8));
+  bf_pos = fold_build2 (PLUS_EXPR, bitsizetype, bf_pos,
+                        DECL_FIELD_BIT_OFFSET (field_decl));
+  tree rep_pos = fold_build2 (MULT_EXPR, bitsizetype,
+                              DECL_FIELD_OFFSET (rep_decl),
+                              build_int_cst (bitsizetype, 8));
+  rep_pos = fold_build2 (PLUS_EXPR, bitsizetype, rep_pos,
+                         DECL_FIELD_BIT_OFFSET (rep_decl));

you can use the invariant that DECL_FIELD_OFFSET of rep_decl
and field_decl are always the same.  Also please use BITS_PER_UNIT
instead of '8'.

Richard.

diff --git a/gcc/testsuite/gcc.dg/vect/pr107229-1.c b/gcc/testsuite/gcc.dg/vect/pr107229-1.c
new file mode 100644
index ..67b432383d057a630746aa00af50c25fcb527d8e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr107229-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* PR tree-optimization/107229.  */
+
+int a, c;
+struct {
+  long d;
+  int : 8;
+  int : 27;
+  int e : 21;
+} f;
+void g(int b) { a = a & 1; }
+int main() {
+  while (c)
+    g(f.e);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr107229-2.c b/gcc/testsuite/gcc.dg/vect/pr107229-2.c
new file mode 100644
index ..88bffb63d5e8b2d7bcdeae223f4ec6ea4f611bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr107229-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* PR tree-optimization/107229.  */
+
+int a, c;
+struct {
+  long f;
+  long g;
+  long d;
+  int : 8;
+  int : 27;
+  int e : 21;
+} f;
+void g(int b) { a = a & 1; }
+int main() {
+  while (c)
+    g(f.e);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr107229-3.c b/gcc/testsuite/gcc.dg/vect/pr107229-3.c
new file mode 100644
index ..4abd8c14531b40e9dbe9802a8f9a0eabba673c9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr107229-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* PR tree-optimization/107229.  */
+
+int a, c;
+struct {
+  long f;
+  long g;
+  long d;
+  int : 8;
+  int : 32;
+  int : 2;
+  int e : 21;
+} f;
+void g(int b) { a = a & 1; }
+int main() {
+  while (c)
+    g(f.e);
+  return 0;
+}
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index e468a4659fa28a3a31c3390cf19bee65f4590b80..01637c5da08d5a2a00a495522fc9a6436a804398 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3298,10 +3298,34 @@ get_bitfield_rep (gassign *stmt, bool write, tree *bitpos,
     *struct_expr = TREE_OPERAND (comp_ref, 0);
 
   if (bitpos)
-    *bitpos
-      = fold_build2 (MINUS_EXPR, bitsizetype,
-                     DECL_FIELD_BIT_OFFSET (field_decl),
-                     DECL_FIELD_BIT_OFFSET (rep_decl));
+    {
+      /* To calculate the bitposition of the BITFIELD_REF we have to determine
+         where our bitfield starts in relation to the container REP_DECL. The
+         DECL_FIELD_OFFSET of the original bitfield's member FIELD_DECL tells
+         us how many bytes from the start of the structure there are until the
+         start of the group of bitfield members the FIELD_DECL belongs to,
+         whereas DECL_FIELD_BIT_OFFSET will tell us how many bits from that
+         position our actual bitfield member starts.  For the container
+         REP_DECL adding DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET will tell
+         us the distance between the start of the structure and the start of
+         the container, though the first is in bytes and the latter in
+         bits.  With this in mind we calculate the bit position of our new
+         BITFIELD_REF by subtracting the number of bits between the start of
+         the structure and the container from the number of bits from the start
+         of the structure and the actual bitfield member. */
+      tree bf_pos = fold_build2 (MULT_EXPR, bitsizetype,
+                                 DECL_FIELD_OFFSET (field_decl),
+                                 build_int_cst (bitsizetype, BITS_PER_UNIT));
+      bf_pos = fold_build2 (PLUS_EXPR, bitsizetype, bf_pos,
+                            DECL_FIELD_BIT_OFFSET (field_decl));
+      tree rep_pos = fold_build2 (MULT_EXPR, bitsizetype,
+                                  DECL_FIELD_OFFSET (rep_decl),
+                                  build_int_cst

Re: [DOCS] Python Language Conventions

2022-10-13 Thread Martin Liška
On 10/13/22 12:03, Richard Sandiford wrote:
> Martin Liška  writes:
>> I think we should add how Python scripts should be formatted. I noticed
>> that while reading the Modula-2 patchset where it follows the C/C++ style
>> when it comes to Python files.
>>
>> Ready to be installed?
>> Thanks,
>> Martin
> 
> Did you consider requiring black formatting instead?  Maybe black -l79
> to maintain the usual 80-character limit.

No, the automatic formatting might be a next step. About 80 chars, can we relax
that for Python scripts? I think it's a hairy restriction these days.

> 
> At least that way there's only one right answer.

Yep. We can definitely recommend using black as an optional approach,
what do you think?

Martin

> 
> Richard
> 
>>
>> ---
>>  htdocs/codingconventions.html | 14 ++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
>> index e4d30510..180ef35a 100644
>> --- a/htdocs/codingconventions.html
>> +++ b/htdocs/codingconventions.html
>> @@ -80,6 +80,7 @@ the conventions separately from any other changes to the 
>> code.
>>  
>>  
>>  
>> +Python Language Conventions
>>  
>>  
>>  
>> @@ -1483,6 +1484,19 @@ with a right brace, optional closing comment, and a 
>> new line.
>>  Definitions within the body of a namespace are not indented.
>>  
>>  
>> +Python Language Conventions
>> +
>> +
>> +Python scripts should follow <a href="https://peps.python.org/pep-0008/">PEP 8 – Style Guide for Python Code</a>
>> +which can be verified by flake8 tool.
>> +We do also recommend using the following flake8 plug-ins:
>> +
>> +
>> +flake8-builtins
>> +flake8-import-order
>> +flake8-quotes
>> +
>> +
>>  
>>  
>>  



Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Oct 2022, Andre Vieira (lists) wrote:

> Added some extra comments to describe what is going on there.

Just to note: I was confused, and DECL_FIELD_OFFSET can indeed be
different (but then they are guaranteed to be constant), so the patch
looks correct.
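
As a minimal sketch of the arithmetic under discussion (plain C integers
standing in for GCC trees; the helper names and the value 8 for
BITS_PER_UNIT are assumptions for illustration, not part of the patch),
the bit position inside the representative is the difference of two
absolute bit positions:

```c
#include <assert.h>

/* Bits per addressable unit.  GCC calls this BITS_PER_UNIT; the value 8
   is an assumption made for this illustration.  */
#define BITS_PER_UNIT 8

/* Hypothetical helper: absolute bit position of a field from the start
   of the structure, given its byte offset and residual bit offset.  */
static unsigned long
abs_bit_pos (unsigned long byte_offset, unsigned long bit_offset)
{
  return byte_offset * BITS_PER_UNIT + bit_offset;
}

/* Hypothetical helper: bit position of the bitfield relative to the
   start of its representative container.  */
static unsigned long
bitpos_in_rep (unsigned long field_byte_off, unsigned long field_bit_off,
               unsigned long rep_byte_off, unsigned long rep_bit_off)
{
  unsigned long bf_pos = abs_bit_pos (field_byte_off, field_bit_off);
  unsigned long rep_pos = abs_bit_pos (rep_byte_off, rep_bit_off);
  /* The bitfield is expected to live inside its container.  */
  assert (bf_pos >= rep_pos);
  return bf_pos - rep_pos;
}
```

When both byte offsets are equal the result reduces to the difference of
the two bit offsets, which is the case the earlier version of the code
silently assumed.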

> On 13/10/2022 09:14, Richard Biener wrote:
> > On Wed, 12 Oct 2022, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >>
> >> The bitposition calculation for the bitfield lowering in loop if conversion
> >> was not
> >> taking DECL_FIELD_OFFSET into account, which meant that it would result in
> >> wrong bitpositions for bitfields that did not end up having representations
> >> starting at the beginning of the struct.
> >>
> >> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >> x86_64-pc-linux-gnu.
> > +{
> > +  tree bf_pos = fold_build2 (MULT_EXPR, bitsizetype,
> > +DECL_FIELD_OFFSET (field_decl),
> > +build_int_cst (bitsizetype, 8));
> > +  bf_pos = fold_build2 (PLUS_EXPR, bitsizetype, bf_pos,
> > +   DECL_FIELD_BIT_OFFSET (field_decl));
> > +  tree rep_pos = fold_build2 (MULT_EXPR, bitsizetype,
> > + DECL_FIELD_OFFSET (rep_decl),
> > + build_int_cst (bitsizetype, 8));
> > +  rep_pos = fold_build2 (PLUS_EXPR, bitsizetype, rep_pos,
> > +DECL_FIELD_BIT_OFFSET (rep_decl));
> >
> > you can use the invariant that DECL_FIELD_OFFSET of rep_decl
> > and field_decl are always the same.  Also please use BITS_PER_UNIT
> > instead of '8'.
> >
> > Richard.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] Optimize identical permutation in my last r13-3212-gb88adba751da63

2022-10-13 Thread Richard Biener via Gcc-patches
On Thu, Oct 13, 2022 at 5:15 AM Liwei Xu  wrote:
>
> Add an extra index check when merging VEC_CST; this handles the case where
> exactly op1 needs to be returned.
>
> This fixes:
> FAIL: gcc.dg/tree-ssa/forwprop-19.c scan-tree-dump-not forwprop1 
> "VEC_PERM_EXPR"
>
> gcc/ChangeLog:
>
> PR target/107220
> * match.pd: Check the index of VEC_CST and return the op1 if needed.
> ---
>  gcc/match.pd | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3550c16aaa6..1efdc3abb5d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8106,6 +8106,7 @@ and,
>  vec_perm_builder builder0;
>  vec_perm_builder builder1;
>  vec_perm_builder builder2 (nelts, nelts, 1);
> +bool ident_to_1 = true;
>
>  if (!tree_to_vec_perm_builder (&builder0, @3)
> || !tree_to_vec_perm_builder (&builder1, @4))
> @@ -8115,7 +8116,15 @@ and,
>  vec_perm_indices sel1 (builder1, 1, nelts);
>
>  for (int i = 0; i < nelts; i++)
> -  builder2.quick_push (sel0[sel1[i].to_constant ()]);
> +  {
> +int tmp_index = sel0[sel1[i].to_constant ()].to_constant ();
> +builder2.quick_push (sel0[sel1[i].to_constant ()]);
> +if ( i != tmp_index)
> + ident_to_1 = false;
> +  }
> +
> +if (ident_to_1)
> +  return @1;

You can't "return" in match.pd code.  I think the code was fine and the testcase
can be adjusted to scan the subsequent DSE or DCE pass instead.

The "correct" match.pd approach would be to do the if (ident_to_1) check here:

@@ -8124,7 +8124,9 @@ and,

 op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
   }
-  (vec_perm @1 @2 { op0; })))
+  (if (ident_to_1)
+   @1
+   (vec_perm @1 @2 { op0; }))))

I'll see to reject 'return' in c-exprs ;)
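
To make the check concrete, here is a rough sketch in plain C of what the
loop computes (arrays of unsigned indices stand in for vec_perm_indices;
compose_is_identity is a hypothetical name, not a GCC function).  It
composes the two selectors and reports whether the result is the identity,
which is the condition under which the first operand can be used directly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code: compose two single-input
   permutation selectors so that out[i] = sel0[sel1[i]], and report
   whether the composed selector is the identity (out[i] == i).  */
static bool
compose_is_identity (const unsigned *sel0, const unsigned *sel1,
                     unsigned *out, size_t nelts)
{
  bool ident = true;
  for (size_t i = 0; i < nelts; i++)
    {
      /* Every index must select an element of the single input.  */
      assert (sel1[i] < nelts);
      out[i] = sel0[sel1[i]];
      if (out[i] != i)
        ident = false;
    }
  return ident;
}
```

Applying a reversal twice, for example, yields the identity, so the nested
permutation can be dropped in that case.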

Richard.

>
>  vec_perm_indices sel2 (builder2, 2, nelts);
>
> --
> 2.18.2
>


[PATCH] Diagnose return statement in match.pd (with { ... } expressions

2022-10-13 Thread Richard Biener via Gcc-patches
The expression in (with { ... } is used like a statement expression
which means control flow that leaves it is not allowed.  The following
explicitly diagnoses 'return' and fixes up the few cases that crept
into match.pd (oops).  Any such return will prematurely end matching
the current expression.
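
The check itself is a plain string comparison on identifier tokens.  A
simplified stand-alone sketch, assuming the identifiers have already been
extracted into an array of strings (the real parser walks cpplib tokens,
and has_return_token is a hypothetical name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for scanning the identifier tokens of a
   c-expr.  Returns true if any identifier is exactly "return".  */
static bool
has_return_token (const char **idents, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (idents[i], "return") == 0)
      return true;
  return false;
}
```

Because the comparison is exact, identifiers that merely contain the word,
such as return_val, are not diagnosed.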

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* genmatch.cc (parser::parse_c_expr): Diagnose 'return'.
* match.pd: Replace 'return' statements in with expressions
with appropriate variants.
---
 gcc/genmatch.cc |   7 +-
 gcc/match.pd| 293 
 2 files changed, 150 insertions(+), 150 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index a0b22c50ae3..4a8802469cd 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -4447,8 +4447,11 @@ parser::parse_c_expr (cpp_ttype start)
   /* If this is possibly a user-defined identifier mark it used.  */
   if (token->type == CPP_NAME)
{
- id_base *idb = get_operator ((const char *)CPP_HASHNODE
- (token->val.node.node)->ident.str);
+ const char *str
+   = (const char *)CPP_HASHNODE (token->val.node.node)->ident.str;
+ if (strcmp (str, "return") == 0)
+   fatal_at (token, "return statement not allowed in C expression");
+ id_base *idb = get_operator (str);
  user_id *p;
   if (idb && (p = dyn_cast<user_id *> (idb)) && p->is_oper_list)
record_operlist (token->src_loc, p);
diff --git a/gcc/match.pd b/gcc/match.pd
index 3550c16aaa6..fd64ad740b6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7930,131 +7930,131 @@ and,
 
 /* Build a vector of integers from the tree mask.  */
 vec_perm_builder builder;
-if (!tree_to_vec_perm_builder (&builder, op2))
-  return NULL_TREE;
-
-/* Create a vec_perm_indices for the integer vector.  */
-poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
-bool single_arg = (op0 == op1);
-vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
   }
-  (if (sel.series_p (0, 1, 0, 1))
-   { op0; }
-   (if (sel.series_p (0, 1, nelts, 1))
-{ op1; }
-(with
- {
-   if (!single_arg)
- {
-  if (sel.all_from_input_p (0))
-op1 = op0;
-  else if (sel.all_from_input_p (1))
+  (if (tree_to_vec_perm_builder (&builder, op2))
+   (with
+{
+  /* Create a vec_perm_indices for the integer vector.  */
+  poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
+  bool single_arg = (op0 == op1);
+  vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts);
+}
+(if (sel.series_p (0, 1, 0, 1))
+ { op0; }
+ (if (sel.series_p (0, 1, nelts, 1))
+  { op1; }
+  (with
+   {
+ if (!single_arg)
+  {
+if (sel.all_from_input_p (0))
+  op1 = op0;
+else if (sel.all_from_input_p (1))
+  {
+op0 = op1;
+sel.rotate_inputs (1);
+  }
+else if (known_ge (poly_uint64 (sel[0]), nelts))
+  {
+std::swap (op0, op1);
+sel.rotate_inputs (1);
+  }
+  }
+gassign *def;
+tree cop0 = op0, cop1 = op1;
+if (TREE_CODE (op0) == SSA_NAME
+&& (def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op0)))
+&& gimple_assign_rhs_code (def) == CONSTRUCTOR)
+  cop0 = gimple_assign_rhs1 (def);
+if (TREE_CODE (op1) == SSA_NAME
+&& (def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op1)))
+&& gimple_assign_rhs_code (def) == CONSTRUCTOR)
+  cop1 = gimple_assign_rhs1 (def);
+tree t;
+   }
+   (if ((TREE_CODE (cop0) == VECTOR_CST
+|| TREE_CODE (cop0) == CONSTRUCTOR)
+   && (TREE_CODE (cop1) == VECTOR_CST
+   || TREE_CODE (cop1) == CONSTRUCTOR)
+   && (t = fold_vec_perm (type, cop0, cop1, sel)))
+   { t; }
+   (with
+{
+  bool changed = (op0 == op1 && !single_arg);
+  tree ins = NULL_TREE;
+  unsigned at = 0;
+
+  /* See if the permutation is performing a single element
+ insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
+ in that case.  But only if the vector mode is supported,
+ otherwise this is invalid GIMPLE.  */
+  if (op_mode != BLKmode
+  && (TREE_CODE (cop0) == VECTOR_CST
+  || TREE_CODE (cop0) == CONSTRUCTOR
+  || TREE_CODE (cop1) == VECTOR_CST
+  || TREE_CODE (cop1) == CONSTRUCTOR))
 {
-  op0 = op1;
-  sel.rotate_inputs (1);
+  bool insert_first_p = sel.series_p (1, 1, nelts + 1, 1);
+  if (insert_first_p)
+{
+  /* After canonicalizing the first elt to come from the
+ first vector we only can insert the first el

Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

2022-10-13 Thread Richard Biener via Gcc-patches
On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng  wrote:
>
>
> On 2022/10/13 at 2:44 PM, Xi Ruoyao wrote:
> > On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
> >> Hi RuoYao
> >>
> >> It’s probably because loongarch64 doesn’t support
> >> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
> >>
> >> I’m not sure whether LoongArch will support it, or whether I should just
> >> limit the test target for pr54346.c?
> > I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
> > actually support vectors.  (LoongArch has SIMD instructions, but support
> > for them won't be added to GCC in the near future.)
> >
> If what I understand is correct, I think this might be a better solution.
>
>   /* { dg-do compile } */
>
> +/* { dg-require-effective-target vect_perm } */
>   /* { dg-options "-O -fdump-tree-dse1" } */

Btw, what forwprop does is check whether any of the original permutations
is not supported, and if so elide the supportability check for the result.
The reasoning is that the original permute(s) would be lowered during
vectlower anyway, so we may as well do that for the result.  We should just
never turn a supported permutation sequence into an unsupported one.
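
Written out as a decision rule, that reads roughly as follows.  This is
only a sketch: the three booleans stand in for target queries such as
can_vec_perm_const_p, and may_merge_permutes is a hypothetical name rather
than anything in forwprop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate modelling the rule described above.  */
static bool
may_merge_permutes (bool first_supported, bool second_supported,
                    bool result_supported)
{
  /* If any original permute is unsupported it would be lowered by
     vectlower anyway, so lowering the merged result is no worse.  */
  if (!first_supported || !second_supported)
    return true;
  /* Both originals are supported: never produce an unsupported one.  */
  return result_supported;
}
```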

Richard.

>


Re: [PATCH] Fix emit_group_store regression on big-endian

2022-10-13 Thread Richard Biener via Gcc-patches
On Wed, Oct 12, 2022 at 1:01 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> the recent optimization implemented for complex modes in:
>   https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595865.html
> contains an oversight for big-endian platforms in the "interesting corner
> case" mentioned in the message: it uses a lowpart SUBREG when the integer
> modes have different sizes, but this does not match the semantics of the
> PARALLELs which have a bundled byte offset; this offset is always zero in the
> code path and the lowpart is not at offset zero on big-endian platforms.
>
> Calling validate_subreg with this zero offset would fix the regression by
> disabling the optimization on big-endian platforms, so instead the attached
> fix adds the appropriate right shift for them.
>
> This fixes the following regressions in the C testsuite on SPARC64/Linux:
> FAIL: gcc.c-torture/execute/20041124-1.c   -O0  execution test
> FAIL: gcc.c-torture/execute/20041124-1.c   -O1  execution test
> FAIL: gcc.c-torture/execute/20041124-1.c   -O2  execution test
> FAIL: gcc.c-torture/execute/20041124-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL: gcc.c-torture/execute/20041124-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> FAIL: gcc.c-torture/execute/20041124-1.c   -O3 -g  execution test
> FAIL: gcc.c-torture/execute/20041124-1.c   -Os  execution test
> FAIL: gcc.dg/compat/struct-by-value-11 c_compat_x_tst.o-c_compat_y_tst.o execute
> FAIL: gcc.dg/compat/struct-by-value-12 c_compat_x_tst.o-c_compat_y_tst.o execute
> FAIL: tmpdir-gcc.dg-struct-layout-1/t027 c_compat_x_tst.o-c_compat_y_tst.o execute
>
> Tested on SPARC64/Linux, OK for the mainline?

OK.

Thanks,
Richard.
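
A small illustration of the endianness point in the quoted message, using
plain integers rather than RTL (the function name and the 64-bit/32-bit
sizes are assumptions chosen for the example): the value stored at byte
offset zero of a wider register is the low part on little-endian targets,
but the high part on big-endian ones, where it has to be shifted down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: REG is the numeric value of a 64-bit register
   whose in-memory image holds a 32-bit value at byte offset zero.
   Return that 32-bit value.  */
static uint32_t
value_at_offset_zero (uint64_t reg, bool big_endian)
{
  if (big_endian)
    /* Offset zero is the most significant half, so shift it down.  */
    return (uint32_t) (reg >> 32);
  /* Offset zero coincides with the low part.  */
  return (uint32_t) reg;
}
```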

>
> 2022-10-11  Eric Botcazou  
>
> * expr.cc (emit_group_store): Fix handling of modes of different
> sizes for big-endian targets in latest change and add commentary.
>
> --
> Eric Botcazou


Re: [PATCH v2 00/10] Introduce strub: machine-independent stack scrubbing

2022-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 3:33 PM Alexandre Oliva  wrote:
>
> On Oct 11, 2022, Richard Biener  wrote:
>
> > On Tue, Oct 11, 2022 at 1:57 PM Alexandre Oliva  wrote:
> >>
> >> On Oct 10, 2022, Richard Biener  wrote:
> >>
> >> > As noted in the Cauldron Discussion I think you should do all
> >> > instrumentation post-IPA only to simplify your life not needing to
> >> > handle inlining of instrumentation
> >>
> >> I looked a bit into that after the Cauldron, and recalled why I wanted
> >> to instrument before inlining: in the case of internal strub, that
> >> introduces a wrapper, it's desirable to be able to inline the wrapper.
>
> > I think if the wrapper is created at IPA time it is also available for
> > IPA inlining.
>
> Yeah, but now I'm not sure what you're suggesting.  The wrapper is
> instrumentation, and requires instrumentation of the wrapped
> counterpart, so that can't be post-IPA.

IPA folks can probably explain this better, but there's IPA (local)
analysis, IPA (global) propagation and IPA (local) code generation.
You'd instrument (and actually create the wrappers) at IPA code
generation time, but virtually the wrapper would come into existence at
some point during IPA propagation by means of a clone to be materialized
(I understand the wrapper is something like a thunk).  The IPA
propagation phase would decide which calls should go to the wrapper (and
instrumented function) and which can use the original uninstrumented
function (maybe from local already strubbed functions).

Richard.

>
> --
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Rainer Orth
Hi Andre,

> The bitposition calculation for the bitfield lowering in loop if conversion
> was not
> taking DECL_FIELD_OFFSET into account, which meant that it would result in
> wrong bitpositions for bitfields that did not end up having representations
> starting at the beginning of the struct.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu.

I tried this patch together with the one for PR tree-optimization/107226
on sparc-sun-solaris2.11 to check if it cures the bootstrap failure
reported in PR tree-optimization/107232.  While this restores bootstrap,
several of the new tests FAIL:

+FAIL: gcc.dg/vect/vect-bitfield-read-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-3.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 2 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-3.c scan-tree-dump-times vect "vectorized 2 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-6.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-5.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-5.c scan-tree-dump-times vect "vectorized 1 loops" 1

For vect-bitfield-read-1.c, the dump has

gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   ==> examining pattern def statement: patt_31 = patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   ==> examining statement: patt_31 = patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use: operand _ifc__27 & 4294967294, type of def: internal
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use: vectype vector(2) unsigned int
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use: operand 1, type of def: constant
gcc.dg/vect/vect-bitfield-read-1.c:25:23: missed:   op not supported by target.
gcc.dg/vect/vect-bitfield-read-1.c:23:1: missed:   not vectorized: relevant stmt not supported: patt_31 = patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: missed:  bad operation or unsupported loop bound.
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:  * Analysis  failed with vector mode V2SI

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 1/2] gcov: test switch/break line counts

2022-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 2:43 PM Jørgen Kvalsvik
 wrote:
>
> The coverage support will under some conditions decide to split edges to
> accurately report coverage. By running the test suite with/without this
> edge splitting a small diff shows up, addressed by this patch, which
> should catch future regressions.
>
> Removing the edge splitting:
>
> $ diff --git a/gcc/profile.cc b/gcc/profile.cc
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
> Don't do that when the locuses match, so
> if (blah) goto something;
> is not computed twice.  */
> - if (last
> - && gimple_has_location (last)
> - && !RESERVED_LOCATION_P (e->goto_locus)
> - && !single_succ_p (bb)
> - && (LOCATION_FILE (e->goto_locus)
> - != LOCATION_FILE (gimple_location (last))
> - || (LOCATION_LINE (e->goto_locus)
> - != LOCATION_LINE (gimple_location (last)
> -   {
> - basic_block new_bb = split_edge (e);
> - edge ne = single_succ_edge (new_bb);
> - ne->goto_locus = e->goto_locus;
> -   }
> +
> if ((e->flags & (EDGE_ABNORMAL | EDGE_ABNORMAL_CALL))
> && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
> need_exit_edge = 1;
>
> Assuming the .gcov files from make check-gcc RUNTESTFLAGS=gcov.exp are
> kept:
>
> $ diff -r no-split-edge with-split-edge | grep -C 2 -E "^[<>]\s\s"
> diff -r sans-split-edge/gcc/gcov-4.c.gcov with-split-edge/gcc/gcov-4.c.gcov
> 228c228
> < -:  224:break;
> ---
> > 1:  224:break;
> 231c231
> < -:  227:break;
> ---
> > #:  227:break;
> 237c237
> < -:  233:break;
> ---
> > 2:  233:break;
>
> gcc/testsuite/ChangeLog:

OK.

Thanks,
Richard.

>
> * g++.dg/gcov/gcov-1.C: Add line count check.
> * gcc.misc-tests/gcov-4.c: Likewise.
> ---
>  gcc/testsuite/g++.dg/gcov/gcov-1.C| 8 
>  gcc/testsuite/gcc.misc-tests/gcov-4.c | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/testsuite/g++.dg/gcov/gcov-1.C 
> b/gcc/testsuite/g++.dg/gcov/gcov-1.C
> index 9018b9a3a73..ee383b480a8 100644
> --- a/gcc/testsuite/g++.dg/gcov/gcov-1.C
> +++ b/gcc/testsuite/g++.dg/gcov/gcov-1.C
> @@ -257,20 +257,20 @@ test_switch (int i, int j)
>switch (i)   /* count(5) */
> /* branch(end) */
>  {
> -  case 1:
> +  case 1:  /* count(1) */
>  result = do_something (2); /* count(1) */
> -break;
> +break; /* count(1) */
>case 2:
>  result = do_something (1024);
>  break;
> -  case 3:
> +  case 3:  /* count(3) */
>case 4:
> /* branch(67) */
>  if (j == 2)/* count(3) */
> /* branch(end) */
>return do_something (4); /* count(1) */
>  result = do_something (8); /* count(2) */
> -break;
> +break; /* count(2) */
>default:
> result = do_something (32); /* count(1) */
> switch_m++; /* count(1) */
> diff --git a/gcc/testsuite/gcc.misc-tests/gcov-4.c 
> b/gcc/testsuite/gcc.misc-tests/gcov-4.c
> index 9d8ab1c1097..498d299b66b 100644
> --- a/gcc/testsuite/gcc.misc-tests/gcov-4.c
> +++ b/gcc/testsuite/gcc.misc-tests/gcov-4.c
> @@ -221,7 +221,7 @@ test_switch (int i, int j)
>  {
>case 1:
>  result = do_something (2); /* count(1) */
> -break;
> +break; /* count(1) */
>case 2:
>  result = do_something (1024);
>  break;
> @@ -230,7 +230,7 @@ test_switch (int i, int j)
>  if (j == 2)/* count(3) */
>return do_something (4); /* count(1) */
>  result = do_something (8); /* count(2) */
> -break;
> +break; /* count(2) */
>default:
> result = do_something (32); /* count(1) */
> switch_m++; /* count(1) */
> --
> 2.34.0
>


Re: [PATCH 2/2] gcov: test line count for label in then/else block

2022-10-13 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 2:43 PM Jørgen Kvalsvik
 wrote:
>
> Add a test to catch regression in line counts for labels on top of
> then/else blocks. Only the 'goto ' should contribute to the line
> counter for the label, not the if.

OK.

> gcc/testsuite/ChangeLog:
>
> * gcc.misc-tests/gcov-4.c:
> ---
>  gcc/testsuite/gcc.misc-tests/gcov-4.c | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.misc-tests/gcov-4.c 
> b/gcc/testsuite/gcc.misc-tests/gcov-4.c
> index 498d299b66b..da7929ef7fc 100644
> --- a/gcc/testsuite/gcc.misc-tests/gcov-4.c
> +++ b/gcc/testsuite/gcc.misc-tests/gcov-4.c
> @@ -110,6 +110,29 @@ lab2:
>return 8;/* count(1) */
>  }
>
> +int
> +test_goto3 (int i, int j)
> +{
> +if (j) goto else_; /* count(1) */
> +
> +top:
> +if (i) /* count(1) */
> +  {
> +   i = do_something (i);
> +  }
> +else
> +  {
> +else_: /* count(1) */
> +   j = do_something (j);   /* count(2) */
> +   if (j)  /* count(2) */
> + {
> +   j = 0;  /* count(1) */
> +   goto top;   /* count(1) */
> + }
> +  }
> +return 16;
> +}
> +
>  void
>  call_goto ()
>  {
> @@ -117,6 +140,7 @@ call_goto ()
>goto_val += test_goto1 (1);
>goto_val += test_goto2 (3);
>goto_val += test_goto2 (30);
> +  goto_val += test_goto3 (0, 1);
>  }
>
>  /* Check nested if-then-else statements. */
> @@ -260,7 +284,7 @@ main()
>call_unref ();
>if ((for_val1 != 12)
>|| (for_val2 != 87)
> -  || (goto_val != 15)
> +  || (goto_val != 31)
>|| (ifelse_val1 != 31)
>|| (ifelse_val2 != 23)
>|| (ifelse_val3 != 246)
> --
> 2.34.0
>


[PATCH] Fix bogus -Wstringop-overflow warning

2022-10-13 Thread Eric Botcazou via Gcc-patches
Hi,

if you compile the attached testcase with -O2 -fno-inline -Wall, you get:

In function 'process_array3':
cc1: warning: 'process_array4' accessing 4 bytes in a region of size 3 [-Wstringop-overflow=]
cc1: note: referencing argument 1 of type 'char[4]'
t.c:6:6: note: in a call to function 'process_array4'
6 | void process_array4 (char a[4], int n)
  |  ^~
cc1: warning: 'process_array4' accessing 4 bytes in a region of size 3 [-Wstringop-overflow=]
cc1: note: referencing argument 1 of type 'char[4]'
t.c:6:6: note: in a call to function 'process_array4'

That's because the ICF IPA pass has identified the two functions and turned 
process_array3 into a wrapper of process_array4.  This looks sensible to me 
given that the only difference between them is an "access" attribute on their 
type describing the access size of the parameter and the "access" attribute 
does not affect type identity (struct attribute_spec.affects_type_identity).

Hence the proposed fix, tested on x86-64/Linux, OK for the mainline?


2022-10-13  Eric Botcazou  

* gimple-ssa-warn-access.cc (pass_waccess::check_call): Return
early for calls made from thunks.


2022-10-13  Eric Botcazou  

* gcc.dg/Wstringop-overflow-89.c: New test.

-- 
Eric Botcazou
diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 04aa849a4b1..59a70530600 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -4291,14 +4291,18 @@ pass_waccess::check_pointer_uses (gimple *stmt, tree ptr,
 void
 pass_waccess::check_call (gcall *stmt)
 {
-  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
-check_builtin (stmt);
+  /* Skip special calls generated by the compiler.  */
+  if (gimple_call_from_thunk_p (stmt))
+return;
 
   /* .ASAN_MARK doesn't access any vars, only modifies shadow memory.  */
   if (gimple_call_internal_p (stmt)
   && gimple_call_internal_fn (stmt) == IFN_ASAN_MARK)
 return;
 
+  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+check_builtin (stmt);
+
   if (!m_early_checks_p)
 if (tree callee = gimple_call_fndecl (stmt))
   {
/* { dg-do compile } */
/* { dg-options "-O2 -fno-inline -Wall" } */

extern void process (char);

void process_array4 (char a[4], int n)
{
  for (int i = 0; i < n; i++)
process (a[i]);
}

void process_array3 (char a[3], int n)
{
  for (int i = 0; i < n; i++)
process (a[i]);
}


[COMMITTED] Add op1_op2_relation for float operands.

2022-10-13 Thread Aldy Hernandez via Gcc-patches
op1_op2_relation can be called for relops (bool = a < b) as well as
regular binary operators (z = a + b).  This patch adds the overloaded
method for floating point results.

gcc/ChangeLog:

* range-op-float.cc (range_operator_float::op1_op2_relation): New.
(class foperator_equal): Add using.
(class foperator_not_equal): Same.
(class foperator_lt): Same.
(class foperator_le): Same.
(class foperator_gt): Same.
(class foperator_ge): Same.
* range-op.cc (range_op_handler::op1_op2_relation): New.
* range-op.h (range_operator_float::op1_op2_relation): New.
---
 gcc/range-op-float.cc | 12 
 gcc/range-op.cc   |  4 +++-
 gcc/range-op.h|  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 229b9d23351..23e0f5ef4e2 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -160,6 +160,12 @@ range_operator_float::op1_op2_relation (const irange &lhs 
ATTRIBUTE_UNUSED) cons
   return VREL_VARYING;
 }
 
+relation_kind
+range_operator_float::op1_op2_relation (const frange &lhs ATTRIBUTE_UNUSED) const
+{
+  return VREL_VARYING;
+}
+
 // Return TRUE if OP1 is known to be free of NANs.
 
 static inline bool
@@ -338,6 +344,7 @@ class foperator_equal : public range_operator_float
   using range_operator_float::fold_range;
   using range_operator_float::op1_range;
   using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
@@ -444,6 +451,7 @@ class foperator_not_equal : public range_operator_float
 {
   using range_operator_float::fold_range;
   using range_operator_float::op1_range;
+  using range_operator_float::op1_op2_relation;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
@@ -545,6 +553,7 @@ class foperator_lt : public range_operator_float
   using range_operator_float::fold_range;
   using range_operator_float::op1_range;
   using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
@@ -660,6 +669,7 @@ class foperator_le : public range_operator_float
   using range_operator_float::fold_range;
   using range_operator_float::op1_range;
   using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
@@ -767,6 +777,7 @@ class foperator_gt : public range_operator_float
   using range_operator_float::fold_range;
   using range_operator_float::op1_range;
   using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
@@ -882,6 +893,7 @@ class foperator_ge : public range_operator_float
   using range_operator_float::fold_range;
   using range_operator_float::op1_range;
   using range_operator_float::op2_range;
+  using range_operator_float::op1_op2_relation;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 16fa1f4f46d..f8255dd10a1 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -4476,7 +4476,9 @@ range_op_handler::op1_op2_relation (const vrange &lhs) 
const
   gcc_checking_assert (m_valid);
   if (m_int)
     return m_int->op1_op2_relation (as_a <irange> (lhs));
-  return m_float->op1_op2_relation (as_a <irange> (lhs));
+  if (is_a <irange> (lhs))
+return m_float->op1_op2_relation (as_a <irange> (lhs));
+  return m_float->op1_op2_relation (as_a <frange> (lhs));
 }
 
 // Cast the range in R to TYPE.
diff --git a/gcc/range-op.h b/gcc/range-op.h
index b2f063afb07..48adcecc7c6 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -160,6 +160,7 @@ public:
  const frange &op2,
  relation_kind = VREL_VARYING) const;
   virtual relation_kind op1_op2_relation (const irange &lhs) const;
+  virtual relation_kind op1_op2_relation (const frange &lhs) const;
 };
 
 class range_op_handler
-- 
2.37.3



[PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-13 Thread Aldy Hernandez via Gcc-patches
[Jakub, this is a cleaned up version of what we iterated on earlier
this summer.  It contains additional smarts to propagate NAN signs on
entry.  I'd like a nod before committing.]

This is the range-op entry for floating point PLUS_EXPR.  It's the
most intricate range entry we have so far, because we need to keep
track of rounding and target FP formats.  This will be the last FP
entry I commit, mostly to avoid disturbing the tree any further, and
also because what we have so far is enough for a solid VRP.

So far we track NANs and signs correctly.  We also handle relationals
(symbolics and numeric), both ordered and unordered, ABS_EXPR and
NEGATE_EXPR which are used to fold __builtin_isinf, and __builtin_sign
(__builtin_copysign is coming up).  All in all, I think this provides
more than enough for basic VRP on floats, as well as a basis
to flesh out the rest if there's interest.

My goal with this entry is to provide a template for additional binary
operators, as they tend to follow a similar pattern: handle NANs, do
the arithmetic while keeping track of rounding, and adjust for NAN.  I
may abstract the general parts as we do for irange's fold_range and
wi_fold.
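
The first step of that pattern, deciding what is known about the sign of
a possible NAN in the result, can be sketched roughly as follows.  This is
a simplified stand-alone model in plain C: struct nan_info and
merge_nan_sign are hypothetical names that only mirror the shape of
update_nan_sign in the patch, not the frange API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the NAN part of a float range.  */
struct nan_info
{
  bool maybe_nan;   /* The range may contain a NAN.  */
  bool sign_known;  /* The NAN's sign bit is known.  */
  bool sign;        /* The sign bit, meaningful only if sign_known.  */
};

/* Record a known sign bit in R.  */
static void
set_nan_sign (struct nan_info *r, bool sign)
{
  r->sign_known = true;
  r->sign = sign;
}

/* If R may contain a NAN, derive what is known about its sign from the
   two operands; leave R untouched when nothing can be concluded.  */
static void
merge_nan_sign (struct nan_info *r, struct nan_info op1, struct nan_info op2)
{
  if (!r->maybe_nan)
    return;

  if (op1.maybe_nan && op2.maybe_nan)
    {
      /* Both operands may be NAN: only usable if both signs are known.  */
      if (op1.sign_known && op2.sign_known)
        set_nan_sign (r, op1.sign || op2.sign);
    }
  else if (op1.maybe_nan)
    {
      if (op1.sign_known)
        set_nan_sign (r, op1.sign);
    }
  else if (op2.maybe_nan)
    {
      if (op2.sign_known)
        set_nan_sign (r, op2.sign);
    }
}
```

The arithmetic and rounding steps would follow the same outline, with the
operand ranges taking the place of the sign flags.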

Oh yeah... and I'd like to finally close this PR ;-).

How does this look?

PR tree-optimization/24021

gcc/ChangeLog:

* range-op-float.cc (update_nan_sign): New.
(propagate_nans): New.
(frange_nextafter): New.
(frange_arithmetic): New.
(class foperator_plus): New.
(floating_op_table::floating_op_table): Add PLUS_EXPR entry.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp-float-plus.c: New test.
---
 gcc/range-op-float.cc | 171 ++
 .../gcc.dg/tree-ssa/vrp-float-plus.c  |  21 +++
 2 files changed, 192 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-plus.c

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 23e0f5ef4e2..a967c4da393 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -200,6 +200,124 @@ frelop_early_resolve (irange &r, tree type,
  && relop_early_resolve (r, type, op1, op2, rel, my_rel));
 }
 
+// If R contains a NAN of unknown sign, update the NAN's signbit
+// depending on two operands.
+
+inline void
+update_nan_sign (frange &r, const frange &op1, const frange &op2)
+{
+  if (!r.maybe_isnan ())
+return;
+
+  bool op1_nan = op1.maybe_isnan ();
+  bool op2_nan = op2.maybe_isnan ();
+  bool sign1, sign2;
+
+  gcc_checking_assert (!r.nan_signbit_p (sign1));
+  if (op1_nan && op2_nan)
+{
+  if (op1.nan_signbit_p (sign1) && op2.nan_signbit_p (sign2))
+   r.update_nan (sign1 | sign2);
+}
+  else if (op1_nan)
+{
+  if (op1.nan_signbit_p (sign1))
+   r.update_nan (sign1);
+}
+  else if (op2_nan)
+{
+  if (op2.nan_signbit_p (sign2))
+   r.update_nan (sign2);
+}
+}
+
+// If either operand is a NAN, set R to the combination of both NANs
+// signwise and return TRUE.
+
+inline bool
+propagate_nans (frange &r, const frange &op1, const frange &op2)
+{
+  if (op1.known_isnan () || op2.known_isnan ())
+{
+  r.set_nan (op1.type ());
+  update_nan_sign (r, op1, op2);
+  return true;
+}
+  return false;
+}
+
+// Set VALUE to its next real value, or INF if the operation overflows.
+
+inline void
+frange_nextafter (enum machine_mode mode,
+ REAL_VALUE_TYPE &value,
+ const REAL_VALUE_TYPE &inf)
+{
+  const real_format *fmt = REAL_MODE_FORMAT (mode);
+  REAL_VALUE_TYPE tmp;
+  bool overflow = real_nextafter (&tmp, fmt, &value, &inf);
+  if (overflow)
+value = inf;
+  else
+value = tmp;
+}
+
+// Like real_arithmetic, but round the result to INF if the operation
+// produced inexact results.
+//
+// ?? There is still one problematic case, i387.  With
+// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
+// XFmode (long_double_type_node), so that case is OK.  But without
+// -mfpmath=sse, all the SF/DFmode computations are in XFmode
+// precision (64-bit mantissa) and only occasionally rounded to
+// SF/DFmode (when storing into memory from the 387 stack).  Maybe
+// this is ok as well though it is just occasionally more precise. ??
+
+static void
+frange_arithmetic (enum tree_code code, tree type,
+  REAL_VALUE_TYPE &result,
+  const REAL_VALUE_TYPE &op1,
+  const REAL_VALUE_TYPE &op2,
+  const REAL_VALUE_TYPE &inf)
+{
+  REAL_VALUE_TYPE value;
+  enum machine_mode mode = TYPE_MODE (type);
+  bool mode_composite = MODE_COMPOSITE_P (mode);
+
+  bool inexact = real_arithmetic (&value, code, &op1, &op2);
+  real_convert (&result, mode, &value);
+
+  // If real_convert above has rounded an inexact value towards
+  // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp
+  // later (real_nextafter).
+  bool rounding = (flag_rounding_math
+  && (real_isneg

Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-13 Thread Toon Moene

On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote:


PR tree-optimization/24021


Ah - Verboten in Fortran:

$ cat d.f
  DOUBLE PRECISION A, X
  A = 0.0
  DO X = 0.1, 1.0
 A = A + X
  ENDDO
  END
$ gfortran d.f
d.f:3:9:

3 |   DO X = 0.1, 1.0
  | 1
Warning: Deleted feature: Loop variable at (1) must be integer
d.f:3:12:

3 |   DO X = 0.1, 1.0
  |1
Warning: Deleted feature: Start expression in DO loop at (1) must be integer
d.f:3:17:

3 |   DO X = 0.1, 1.0
  | 1
Warning: Deleted feature: End expression in DO loop at (1) must be integer

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands



Re: [PATCH v2 00/10] Introduce strub: machine-independent stack scrubbing

2022-10-13 Thread Alexandre Oliva via Gcc-patches
On Oct 13, 2022, Richard Biener  wrote:

> On Tue, Oct 11, 2022 at 3:33 PM Alexandre Oliva  wrote:
>> 
>> On Oct 11, 2022, Richard Biener  wrote:
>> 
>> > On Tue, Oct 11, 2022 at 1:57 PM Alexandre Oliva  wrote:
>> >>
>> >> On Oct 10, 2022, Richard Biener  wrote:
>> >>
>> >> > As noted in the Cauldron Discussion I think you should do all
>> >> > instrumentation post-IPA only to simplify your life not needing to
>> >> > handle inlining of instrumentation
>> >>
>> >> I looked a bit into that after the Cauldron, and recalled why I wanted
>> >> to instrument before inlining: in the case of internal strub, that
>> >> introduces a wrapper, it's desirable to be able to inline the wrapper.
>> 
>> > I think if the wrapper is created at IPA time it is also available for
>> > IPA inlining.
>> 
>> Yeah, but now I'm not sure what you're suggesting.  The wrapper is
>> instrumentation, and requires instrumentation of the wrapped
>> counterpart, so that can't be post-IPA.

> IPA folks can probably explain better but there's IPA (local)
> analysis, IPA (global) propagation
> and IPA (local) code generation.

I think we're miscommunicating.

None of these are post-IPA, they're all part of IPA.

At first, you'd suggested instrumentation to be made post-IPA, to avoid
the trouble of inlining instrumentation.  But we *do* want to inline the
instrumentation.

Now you seem to be suggesting a major revamp of the implementation to
integrate it into IPA, rather than post-IPA, while I keep on trying to
reconcile that with the initial recommendation of moving it post-IPA.

Have you dropped the initial recommendation, and moved on to an
unrelated recommendation?

If so, I can stop trying to reconcile the unrelated recommendations as
if they were related, and focus on the newer one alone.

My reasons to not want to integrate strub tightly into IPA
infrastructure was that there was no perceived benefit from a tighter
integration, and I wasn't sure the feature would be welcome, so I
designed something that could be added in a very standalone way, maybe
even as a plugin.

Maybe there is interest and this decoupling could be reduced, but
there'd have to be very compelling reasons to justify undergoing such
major reengineering.

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] tree-optimization/107160 - avoid reusing multiple accumulators

2022-10-13 Thread Richard Biener via Gcc-patches
Epilogue vectorization is not set up to re-use a vectorized
accumulator consisting of more than one vector.  For non-SLP
we always reduce to a single but for SLP that isn't happening.
In such a case we currently miscompile the epilog, so avoid this.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107160
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Do not register accumulator if we failed to reduce it
to a single vector.

* gcc.dg/vect/pr107160.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr107160.c | 41 
 gcc/tree-vect-loop.cc|  3 +-
 2 files changed, 43 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr107160.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr107160.c 
b/gcc/testsuite/gcc.dg/vect/pr107160.c
new file mode 100644
index 000..4f9f853cafb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr107160.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+
+#include <math.h>
+
+#define N 128
+float fl[N];
+
+__attribute__ ((noipa)) void
+init ()
+{
+  for (int i = 0; i < N; i++)
+fl[i] = i;
+}
+
+__attribute__ ((noipa)) float
+foo (int n1)
+{
+  float sum0, sum1, sum2, sum3;
+  sum0 = sum1 = sum2 = sum3 = 0.0f;
+
+  int n = (n1 / 4) * 4;
+  for (int i = 0; i < n; i += 4)
+{
+  sum0 += fabs (fl[i]);
+  sum1 += fabs (fl[i + 1]);
+  sum2 += fabs (fl[i + 2]);
+  sum3 += fabs (fl[i + 3]);
+}
+
+  return sum0 + sum1 + sum2 + sum3;
+}
+
+int
+main ()
+{
+  init ();
+  float res = foo (80);
+  if (res != 3160)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 1996ecfee7a..b1442a93581 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6232,7 +6232,8 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 }
 
   /* Record this operation if it could be reused by the epilogue loop.  */
-  if (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION)
+  if (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION
+  && vec_num == 1)
 loop_vinfo->reusable_accumulators.put (scalar_results[0],
   { orig_reduc_input, reduc_info });
 
-- 
2.35.3


Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-13 Thread Andrew MacLeod via Gcc-patches



On 10/13/22 05:53, Jakub Jelinek wrote:

On Thu, Oct 13, 2022 at 08:11:53AM +, Richard Biener wrote:

On Wed, 12 Oct 2022, Andrew MacLeod wrote:


On 10/12/22 10:39, Jakub Jelinek wrote:

On Wed, Oct 12, 2022 at 10:31:00AM -0400, Andrew MacLeod wrote:

I presume you are looking to get this working for this release, making the
priority high? :-)

Yes.  So that we can claim we actually support C++23 Portable Assumptions
and OpenMP assume directive's hold clauses for something non-trivial so
people won't be afraid to actually use it.
Of course, first the posted patch needs to be reviewed and only once it gets
in, the ranger/GORI part can follow.  As the latter is only an optimization,
it can be done incrementally.

I will start poking at something to find ranges for parameters from the return
backwards.

If the return were

   if (return_val)
 return return_val;

you could use path-ranger with the parameter SSA default defs as
"interesting".  So you "only" need to somehow interpret the return
statement as such and do path rangers compute_ranges ()

If it was easier for handling, another possible representation of the
assume_function could be not that it returns a bool where [1,1] returned
means defined behavior, otherwise UB, but that the function returns void
and the assumption is that it returns, the other paths would be
__builtin_unreachable ().  But still in both cases it needs a specialized
backwards walk from the assumption that either it returns [1,1] or that it
returns through GIMPLE_RETURN to be defined behavior.  In either case,
external exceptions, or infinite loops or other reasons why the function
might not return normally (exit/abort/longjmp/non-local goto etc.) are still
UB for assumptions.
Say normally, if we have:
extern void foo (int);

bool
assume1 (int x)
{
   foo (x);
   if (x != 42)
 __builtin_unreachable ();
   return true;
}
we can't through backwards ranger walk determine that x_1(D) at the start of
the function has [42,42] range, we can just say it is true at the end of the
function, because foo could do if (x != 42) exit (0); or if (x != 42) throw
1; or if (x != 42) longjmp (buf, 1); or while (x != 42) ; or if (x != 42)
abort ();
But with assumption functions we actually can and stick [42, 42] on the
parameters even when we know nothing about foo function.

Of course, perhaps initially, we can choose to ignore those extra
guarantees.

I don't think we need to change anything.  All I intend to do is provide
something that looks for the returns, wires GORI in, and reuses a global
range table in a reverse dependency walk, starting with a range of
[1,1] for whatever is on the return stmt.


From GORI's point of view, that's all outgoing_edge_range_p () does,
except it picks up the initial value of [0,0] or [1,1] from the
specified edge instead.


Initially it'll stop at the top of the block, but I don't think it's too
much work beyond that to provide "simple" processing of PHIs and edges
coming into the block.  In the absence of loops it should be pretty
straightforward.  "All" you do is feed the value of the phi argument to
the previous block.  Of course it'll probably be a little more
complicated than that, but the basic premise seems pretty straightforward.


The result produced would be a vector over the ssa-names in the function
with any ranges that were determined.  You could use that to more
efficiently store just the values of the parameters somewhere and
somehow associate them with the assume function decl.
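
To make the idea concrete, here is a toy sketch of the simplest case only: an assume function whose return value is a conjunction of comparisons on one parameter.  Requiring the result to be [1,1] means every conjunct is true, so the parameter's range is the intersection of the per-conjunct ranges.  Everything here (toy_irange, toy_cond, toy_assume_range) is hypothetical and has nothing to do with the real ranger/GORI interfaces.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical closed integer range; it is empty when lo > hi.
struct toy_irange { int lo; int hi; };

enum toy_cmp { TOY_EQ, TOY_LT, TOY_GT };

// One conjunct "x CMP rhs" of the value the assume function returns.
struct toy_cond { toy_cmp cmp; int rhs; };

inline toy_irange
toy_intersect (toy_irange a, toy_irange b)
{
  return { a.lo > b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi };
}

// Range of x for which "x CMP rhs" evaluates to true.
inline toy_irange
toy_true_range (const toy_cond &c)
{
  const toy_irange empty = { 1, 0 };
  switch (c.cmp)
    {
    case TOY_EQ:
      return { c.rhs, c.rhs };
    case TOY_LT:
      // Nothing is smaller than INT_MIN; also avoids overflow in rhs - 1.
      return c.rhs == INT_MIN ? empty : toy_irange { INT_MIN, c.rhs - 1 };
    case TOY_GT:
      // Nothing is larger than INT_MAX; also avoids overflow in rhs + 1.
      return c.rhs == INT_MAX ? empty : toy_irange { c.rhs + 1, INT_MAX };
    }
  assert (!"unhandled comparison");
  return empty;
}

// Walk backwards from "the function returned true": start from the
// full range and narrow it by every conjunct that must have held.
inline toy_irange
toy_assume_range (const std::vector<toy_cond> &conds)
{
  toy_irange r = { INT_MIN, INT_MAX };
  for (const toy_cond &c : conds)
    r = toy_intersect (r, toy_true_range (c));
  return r;
}
```

With the conjuncts x > 5 and x < 10 this yields [6, 9]; with x == 42 alone it yields [42, 42], which matches the assume1 example quoted above.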


I'll try to get to this shortly.

Andrew






[PATCH] tree-optimization/107247 - reduce SLP reduction accumulator

2022-10-13 Thread Richard Biener via Gcc-patches
The following makes sure to reduce a multi-vector SLP reduction
accumulator to a single vector using vector operations if
easily possible (if the number of lanes in the vector type is
a multiple of the number of scalar accumulators).
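
A toy model of why the multiple-of condition matters, under the assumption that lane L of accumulator vector V holds the partial sum for scalar accumulator (V * nunits + L) % group_size: when nunits is a multiple of group_size, lane L belongs to the same scalar accumulator in every vector, so the vectors can simply be added lane by lane.  The names below are hypothetical and this is not the vectorizer code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<float> toy_vec;

// Try to combine several accumulator vectors into a single vector
// using only lane-wise additions.  Returns false when the number of
// lanes is not a multiple of GROUP_SIZE, because then the same lane
// would hold different scalar accumulators in different vectors.
inline bool
toy_reduce_accumulators (const std::vector<toy_vec> &accs,
                         std::size_t group_size, toy_vec &out)
{
  assert (!accs.empty () && group_size != 0);
  const std::size_t nunits = accs[0].size ();
  if (nunits % group_size != 0)
    return false;

  out = accs[0];
  for (std::size_t v = 1; v < accs.size (); ++v)
    {
      assert (accs[v].size () == nunits);
      for (std::size_t lane = 0; lane < nunits; ++lane)
        out[lane] += accs[v][lane];
    }
  return true;
}
```

With four lanes and a group size of 4 or 2 the reduction succeeds; with a group size of 3 it is rejected and the accumulators stay as multiple vectors.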

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

PR tree-optimization/107247
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Reduce multi vector SLP reduction accumulators.  Check
the adjusted number of accumulator vectors against
one for the re-use in the epilogue.
---
 gcc/tree-vect-loop.cc | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b1442a93581..98a943d8a4b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5642,9 +5642,21 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
  we may end up with more than one vector result.  Here we reduce them
  to one vector.
 
+ The same is true for a SLP reduction, e.g.,
+ # a1 = phi 
+ # b1 = phi 
+ a2 = operation (a1)
+ b2 = operation (a2),
+
+ where we can end up with more than one vector as well.  We can
+ easily accumulate vectors when the number of vector elements is
+ a multiple of the SLP group size.
+
  The same is true if we couldn't use a single defuse cycle.  */
   if (REDUC_GROUP_FIRST_ELEMENT (stmt_info)
   || direct_slp_reduc
+  || (slp_reduc
+ && constant_multiple_p (TYPE_VECTOR_SUBPARTS (vectype), group_size))
   || ncopies > 1)
 {
   gimple_seq stmts = NULL;
@@ -6233,7 +6245,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
 
   /* Record this operation if it could be reused by the epilogue loop.  */
   if (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION
-  && vec_num == 1)
+  && reduc_inputs.length () == 1)
 loop_vinfo->reusable_accumulators.put (scalar_results[0],
   { orig_reduc_input, reduc_info });
 
-- 
2.35.3


Re: [PATCH][AArch64] Improve bit tests [PR105773]

2022-10-13 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

> Maybe pre-existing, but are ordered comparisons safe for the
> ZERO_EXTRACT case?  If we extract the top 8 bits (say), zero extend,
> and compare with zero, the result should be >= 0, whereas TST would
> set N to the top bit.

Yes in principle zero extract should always be positive assuming we never
extract all bits (= nop). GCC never generates a zero extend of the top bits
(it becomes a shift), so I don't think it can be generated.

However I'll change it to use CC_Z in the commit since signed comparisons
of zero extend seem to be folded to equality (or true/false), so there is no
point in supporting anything but equality comparisons anyway.
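
A small standalone sketch of the difference being discussed, using plain 32-bit integers rather than RTL.  The helper names are made up for illustration; toy_tst_negative only models the N flag as the sign bit of x & mask.

```cpp
#include <cassert>
#include <cstdint>

// Extract WIDTH bits starting at bit POS and zero extend the result.
// Extracting all 32 bits is excluded, as that would be a nop.
inline std::uint32_t
toy_zero_extract (std::uint32_t x, unsigned pos, unsigned width)
{
  assert (width >= 1 && width < 32 && pos + width <= 32);
  return (x >> pos) & ((std::uint32_t (1) << width) - 1);
}

// Model of the N flag after a TST: the sign bit of x & mask.
inline bool
toy_tst_negative (std::uint32_t x, std::uint32_t mask)
{
  return ((x & mask) >> 31) != 0;
}
```

For x = 0x80000000, extracting the top 8 bits gives 0x80, which is non-negative as a signed value, while the TST model reports N set for mask 0xff000000.  The two views do agree on whether the result is zero, which is why restricting to equality comparisons is safe in this model.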

Cheers,
Wilco

Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-13 Thread Aldy Hernandez via Gcc-patches
I'm not following.  My patch doesn't affect this behavior.

What am I missing?

Aldy

On Thu, Oct 13, 2022 at 3:04 PM Toon Moene  wrote:
>
> On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote:
>
> >   PR tree-optimization/24021
>
> Ah - Verboten in Fortran:
>
> $ cat d.f
>DOUBLE PRECISION A, X
>A = 0.0
>DO X = 0.1, 1.0
>   A = A + X
>ENDDO
>END
> $ gfortran d.f
> d.f:3:9:
>
>  3 |   DO X = 0.1, 1.0
>| 1
> Warning: Deleted feature: Loop variable at (1) must be integer
> d.f:3:12:
>
>  3 |   DO X = 0.1, 1.0
>|1
> Warning: Deleted feature: Start expression in DO loop at (1) must be integer
> d.f:3:17:
>
>  3 |   DO X = 0.1, 1.0
>| 1
> Warning: Deleted feature: End expression in DO loop at (1) must be integer
>
> --
> Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
>



Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-13 Thread Toon Moene

It was just a comment on the code of the PR ...

Toon.

On 10/13/22 15:44, Aldy Hernandez wrote:


I'm not following.  My patch doesn't affect this behavior.

What am I missing?

Aldy

On Thu, Oct 13, 2022 at 3:04 PM Toon Moene  wrote:


On 10/13/22 14:36, Aldy Hernandez via Gcc-patches wrote:


   PR tree-optimization/24021


Ah - Verboten in Fortran:

$ cat d.f
DOUBLE PRECISION A, X
A = 0.0
DO X = 0.1, 1.0
   A = A + X
ENDDO
END
$ gfortran d.f
d.f:3:9:

  3 |   DO X = 0.1, 1.0
| 1
Warning: Deleted feature: Loop variable at (1) must be integer
d.f:3:12:

  3 |   DO X = 0.1, 1.0
|1
Warning: Deleted feature: Start expression in DO loop at (1) must be integer
d.f:3:17:

  3 |   DO X = 0.1, 1.0
| 1
Warning: Deleted feature: End expression in DO loop at (1) must be integer

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands





--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands



Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Andre Vieira (lists) via Gcc-patches

Hi Rainer,

Thanks for reporting, I was actually expecting these! I thought about 
pre-empting them by using a positive filter on the tests for aarch64 and 
x86_64 as I knew those would pass, but I thought it would be better to 
let other targets report failures since then you get a testsuite that 
covers more targets than just what I'm able to check.


Are there any sparc architectures that would support these or should I 
just xfail sparc*-*-* ?


For instance: I also saw PR107240 for which one of the write tests fails 
on Power 7 BE. I'm suggesting adding an xfail for that one


Kind regards,
Andre

On 13/10/2022 12:39, Rainer Orth wrote:

Hi Andre,


The bitposition calculation for the bitfield lowering in loop if conversion
was not
taking DECL_FIELD_OFFSET into account, which meant that it would result in
wrong bitpositions for bitfields that did not end up having representations
starting at the beginning of the struct.
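
The patch itself is not reproduced in this message, so the following is only a sketch of the kind of calculation described, under the assumption that a field's position is its byte offset scaled to bits plus its bit offset, taken relative to the representative.  toy_field and the helpers are hypothetical; the members merely play the roles of DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET.

```cpp
#include <cassert>
#include <cstddef>

const unsigned toy_bits_per_unit = 8;

// Hypothetical field description: a byte offset plus a bit offset.
struct toy_field
{
  std::size_t byte_offset;
  unsigned bit_offset;
};

// Absolute bit position of a field from the start of the struct.
inline std::size_t
toy_abs_bitpos (const toy_field &f)
{
  return f.byte_offset * toy_bits_per_unit + f.bit_offset;
}

// Bit position of FIELD relative to its representative REP.  Using
// only the bit offsets would be wrong whenever the byte offsets differ.
inline std::size_t
toy_rel_bitpos (const toy_field &field, const toy_field &rep)
{
  assert (toy_abs_bitpos (field) >= toy_abs_bitpos (rep));
  return toy_abs_bitpos (field) - toy_abs_bitpos (rep);
}
```

For a field at byte offset 16, bit offset 5, and a representative at byte offset 8, bit offset 0, the relative position is 69 bits, whereas ignoring the byte offsets would give 5.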

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu.

I tried this patch together with the one for PR tree-optimization/107226
on sparc-sun-solaris2.11 to check if it cures the bootstrap failure
reported in PR tree-optimization/107232.  While this restores bootstrap,
several of the new tests FAIL:

+FAIL: gcc.dg/vect/vect-bitfield-read-1.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-1.c scan-tree-dump-times vect "vectorized 1 
loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-times vect "vectorized 1 
loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-3.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 2 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-3.c scan-tree-dump-times vect "vectorized 2 
loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-times vect "vectorized 1 
loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-6.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-1.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-1.c scan-tree-dump-times vect "vectorized 1 
loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-times vect "vectorized 1 
loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-times vect "vectorized 1 
loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-5.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-5.c scan-tree-dump-times vect "vectorized 1 
loops" 1

For vect-bitfield-read-1.c, the dump has

gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   ==> examining pattern def 
statement: patt_31 = patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   ==> examining statement: patt_31 = 
patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use: operand 
_ifc__27 & 4294967294, type of def: internal
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use: vectype 
vector(2) unsigned int
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use: operand 
1, type of def: constant
gcc.dg/vect/vect-bitfield-read-1.c:25:23: missed:   op not supported by target.
gcc.dg/vect/vect-bitfield-read-1.c:23:1: missed:   not vectorized: relevant stmt not 
supported: patt_31 = patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: missed:  bad operation or unsupported 
loop bound.
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:  * Analysis  failed with 
vector mode V2SI

Rainer



Re: [PATCH v2] c++: ICE with VEC_INIT_EXPR and defarg [PR106925]

2022-10-13 Thread Marek Polacek via Gcc-patches
On Wed, Oct 12, 2022 at 02:23:40PM -0400, Marek Polacek wrote:
> On Wed, Oct 12, 2022 at 01:12:57PM -0400, Marek Polacek wrote:
> > On Wed, Oct 12, 2022 at 12:47:21PM -0400, Jason Merrill wrote:
> > > On 10/12/22 12:27, Marek Polacek wrote:
> > > > On Tue, Oct 11, 2022 at 04:28:11PM -0400, Jason Merrill wrote:
> > > > > On 10/11/22 16:00, Marek Polacek wrote:
> > > > > > Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr
> > > > > > while processing the default argument in this test.
> > > > > 
> > > > > Hmm, why are we calling cxx_eval_vec_init during parsing of the default
> > > > > argument?  In particular, any expansion that depends on the enclosing
> > > > > function context should be deferred until the default arg is used by
> > > > > a call.
> > > > 
> > > > I think this is part of the semantic constraints checking
> > > > [dcl.fct.default]/5 talks about, as in, this doesn't compile even
> > > > though the default argument is not executed:
> > > > 
> > > > struct S {
> > > >S() = delete;
> > > > };
> > > > void foo (S = S()) { }
> > > > In the test below we parse '= MyVector<1>()' and end up calling 
> > > > mark_used
> > > > on the implicit "constexpr MyVector<1>::MyVector() noexcept 
> > > > ()"
> > > > ctor.  mark_used calls maybe_instantiate_noexcept.  Since the ctor has
> > > > a DEFERRED_NOEXCEPT, we have to figure out if the ctor should be 
> > > > noexcept
> > > > or not using get_defaulted_eh_spec.  That means walking the members of
> > > > MyVector.  Thus we reach
> > > >/* Core 1351: If the field has an NSDMI that could throw, the
> > > >   default constructor is noexcept(false).  */
> > > 
> > > Maybe we need a cp_unevaluated here?  The operand of noexcept should be
> > > unevaluated.
> > 
> > That wouldn't help since get_nsdmi specifically does "cp_evaluated ev;",
> > so...
> >  
> > > > and call get_nsdmi on 'data'.  There we digest its initializer which is 
> > > > {}.
> > > > massage_init_elt calls digest_init_r on the {} and produces
> > > >TARGET_EXPR  > > >  D.2518
> > > >  {} 
> > > > and the subsequent fold_non_dependent_init leads to cxx_eval_vec_init
> > > > -> expand_vec_init_expr.
> > > > 
> > > > I think this is all correct except that the fold_non_dependent_init is
> > > > somewhat questionable to me; do we really have to fold in order to say
> > > > if the NSDMI init can throw?  Sure, we need to digest the {}, maybe
> > > > the field's ctors can throw, but I don't know about the folding.
> > > 
> > > And we can check cp_unevaluated_operand to avoid the
> > > fold_non_dependent_init?
> > 
> > ...we'd still fold.  I'm not sure if we want a LOOKUP_ flag that says
> > "we're just checking if we can throw, don't fold".
> 
> Eh, a new flag is overkill.  Maybe don't do cp_evaluated in get_nsdmi if
> we're called from walk_field_subobs would be worth a try?

FWIW, my experiments with cp_unevaluated_operand failed because then we'd
miss warnings as in g++.dg/ext/cond5.C which warns from the
get_defaulted_eh_spec context -- so I'd have no way to distinguish that
from the test in this PR.  Should we just go back to my patch?

Marek



[PATCH V1] RISC-V: Fix a redefinition bug for the fd-4.c

2022-10-13 Thread shiyulong
From: yulong 

This patch fixes a redefinition bug.
fd-4.c contains its own definition of mode_t, which duplicates the
definition in stdio.h.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/fd-4.c: Delete the definition of mode_t.

---
 gcc/testsuite/gcc.dg/analyzer/fd-4.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-4.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-4.c
index 842a26b4364..db342feb6ee 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-4.c
@@ -12,11 +12,6 @@ int read (int fd, void *buf, int nbytes);
 #define O_WRONLY 1
 #define O_RDWR 2
 
-typedef enum {
-  S_IRWXU
-  // etc
-} mode_t;
-
 int creat (const char *, mode_t mode);
 
 void
-- 
2.17.1



Re: [DOCS] Python Language Conventions

2022-10-13 Thread Richard Sandiford via Gcc-patches
Martin Liška  writes:
> On 10/13/22 12:03, Richard Sandiford wrote:
>> Martin Liška  writes:
>>> I think we should add how Python scripts should be formatted. I noticed
>>> that while reading the Modula-2 patchset where it follows the C/C++ style
>>> when it comes to Python files.
>>>
>>> Ready to be installed?
>>> Thanks,
>>> Martin
>> 
>> Did you consider requiring black formatting instead?  Maybe black -l79
>> to maintain the usual 80-character limit.
>
> No, the automatic formatting might be a next step. About 80 chars, can we relax
> that for Python scripts? I think it's hairy restriction these days.

In practice it seems to work well, even at 79 chars.  The default is 88
and I don't think the extra 8 or 9 columns are enough to make a different
rule for Python worth it.

FWIW, personally I use an 80-column editor for GCC stuff, and it would
be a pain to have to switch to something different to work on Python.

>> At least that way there's only one right answer.
>
> Yep. We can definitely recommend using black as an optional approach,
> what do you think?

IMO the real value is if it's the defined approach, rather than an
optional approach.  It's "format and forget", just like with clang-format.

Thanks,
Richard


Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Oct 2022, Andre Vieira (lists) wrote:

> Hi Rainer,
> 
> Thanks for reporting, I was actually expecting these! I thought about
> pre-empting them by using a positive filter on the tests for aarch64 and
> x86_64 as I knew those would pass, but I thought it would be better to let
> other targets report failures since then you get a testsuite that covers more
> targets than just what I'm able to check.
> 
> Are there any sparc architectures that would support these or should I just
> xfail sparc*-*-* ?
> 
> For instance: I also saw PR107240 for which one of the write tests fails on
> Power 7 BE. I'm suggesting adding an xfail for that one

for the failure below we seem to require vectorizing shifts for which I
think we have a vect_* target to check?

> Kind regards,
> Andre
> 
> On 13/10/2022 12:39, Rainer Orth wrote:
> > Hi Andre,
> >
> >> The bitposition calculation for the bitfield lowering in loop if conversion
> >> was not
> >> taking DECL_FIELD_OFFSET into account, which meant that it would result in
> >> wrong bitpositions for bitfields that did not end up having representations
> >> starting at the beginning of the struct.
> >>
> >> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >> x86_64-pc-linux-gnu.
> > I tried this patch together with the one for PR tree-optimization/107226
> > on sparc-sun-solaris2.11 to check if it cures the bootstrap failure
> > reported in PR tree-optimization/107232.  While this restores bootstrap,
> > several of the new tests FAIL:
> >
> > +FAIL: gcc.dg/vect/vect-bitfield-read-1.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-read-1.c scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-read-3.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 2 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-read-3.c scan-tree-dump-times vect
> > "vectorized 2 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-read-6.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-write-1.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-write-1.c scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-write-5.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vectorized 1 loops" 1
> > +FAIL: gcc.dg/vect/vect-bitfield-write-5.c scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> >
> > For vect-bitfield-read-1.c, the dump has
> >
> > gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   ==> examining pattern def
> > statement: patt_31 = patt_30 >> 1;
> > gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   ==> examining statement:
> > patt_31 = patt_30 >> 1;
> > gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use:
> > operand _ifc__27 & 4294967294, type of def: internal
> > gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use:
> > vectype vector(2) unsigned int
> > gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use:
> > operand 1, type of def: constant
> > gcc.dg/vect/vect-bitfield-read-1.c:25:23: missed:   op not supported by
> > target.
> > gcc.dg/vect/vect-bitfield-read-1.c:23:1: missed:   not vectorized: relevant
> > stmt not supported: patt_31 = patt_30 >> 1;
> > gcc.dg/vect/vect-bitfield-read-1.c:25:23: missed:  bad operation or
> > unsupported loop bound.
> > gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:  * Analysis  failed with
> > vector mode V2SI
> >
> >  Rainer
> >
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] use proper DECL_INITIAL for VTV

2022-10-13 Thread Martin Liška
Hi.

I am working on early debug info emission, which would benefit from a late
use of asm_put_file.  This is the last blocker: C++ emits a section directive
early in assemble_vtv_preinit_initializer.  We can use a proper DECL_INITIAL
for that instead.

The patch bootstraps on x86_64-linux-gnu and survives regression tests
with --enable-vtable-verify.

Ready to be installed?
Thanks,
Martin

gcc/cp/ChangeLog:

* vtable-class-hierarchy.cc (vtv_generate_init_routine): Emit
an artificial variable that would be put into .preinit_array
section.

gcc/ChangeLog:

* output.h (assemble_vtv_preinit_initializer): Remove.
* varasm.cc (assemble_vtv_preinit_initializer): Remove.
---
 gcc/cp/vtable-class-hierarchy.cc | 14 --
 gcc/output.h |  4 
 gcc/varasm.cc| 17 -
 3 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/gcc/cp/vtable-class-hierarchy.cc b/gcc/cp/vtable-class-hierarchy.cc
index 79cb5f8de02..cc1df1ebdb2 100644
--- a/gcc/cp/vtable-class-hierarchy.cc
+++ b/gcc/cp/vtable-class-hierarchy.cc
@@ -1192,8 +1192,18 @@ vtv_generate_init_routine (void)
   cgraph_node::add_new_function (vtv_fndecl, false);
 
   if (flag_vtable_verify == VTV_PREINIT_PRIORITY && !TARGET_PECOFF)
-assemble_vtv_preinit_initializer (vtv_fndecl);
-
+   {
+ tree vtv_var
+   = build_decl (BUILTINS_LOCATION, VAR_DECL,
+ get_identifier ("__vtv_preinit"),
+ build_pointer_type (TREE_TYPE (vtv_fndecl)));
+ TREE_STATIC (vtv_var) = 1;
+ DECL_ARTIFICIAL (vtv_var) = 1;
+ DECL_INITIAL (vtv_var) = build_fold_addr_expr (vtv_fndecl);
+ set_decl_section_name (vtv_var, ".preinit_array");
+
+ varpool_node::add (vtv_var);
+   }
 }
   pop_lang_context ();
 }
diff --git a/gcc/output.h b/gcc/output.h
index 6dea630913a..6936bdeeb6c 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -199,10 +199,6 @@ extern void assemble_end_function (tree, const char *);
initial value (that will be done by the caller).  */
 extern void assemble_variable (tree, int, int, int);
 
-/* Put the vtable verification constructor initialization function
-   into the preinit array.  */
-extern void assemble_vtv_preinit_initializer (tree);
-
 /* Assemble everything that is needed for a variable declaration that has
no definition in the current translation unit.  */
 extern void assemble_undefined_decl (tree);
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 423f3f91af8..a11184584a2 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -2419,23 +2419,6 @@ assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
 }
 }
 
-
-/* Given a function declaration (FN_DECL), this function assembles the
-   function into the .preinit_array section.  */
-
-void
-assemble_vtv_preinit_initializer (tree fn_decl)
-{
-  section *sect;
-  unsigned flags = SECTION_WRITE;
-  rtx symbol = XEXP (DECL_RTL (fn_decl), 0);
-
-  flags |= SECTION_NOTYPE;
-  sect = get_section (".preinit_array", flags, fn_decl);
-  switch_to_section (sect);
-  assemble_addr_to_section (symbol, sect);
-}
-
 /* Return 1 if type TYPE contains any pointers.  */
 
 static int
-- 
2.37.3



Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Andre Vieira (lists) via Gcc-patches



On 13/10/2022 15:15, Richard Biener wrote:

On Thu, 13 Oct 2022, Andre Vieira (lists) wrote:


Hi Rainer,

Thanks for reporting, I was actually expecting these! I thought about
pre-empting them by using a positive filter on the tests for aarch64 and
x86_64 as I knew those would pass, but I thought it would be better to let
other targets report failures since then you get a testsuite that covers more
targets than just what I'm able to check.

Are there any sparc architectures that would support these or should I just
xfail sparc*-*-* ?

For instance: I also saw PR107240 for which one of the write tests fails on
Power 7 BE. I'm suggesting adding an xfail for that one

for the failure below we seem to require vectorizing shifts for which I
think we have a vect_* target to check?
The 'vect_shift' check does not list sparc among its supported targets, so
that should do it.  I'll add it when I add my fix for powerpc too.



Kind regards,
Andre

On 13/10/2022 12:39, Rainer Orth wrote:

Hi Andre,


The bit position calculation for the bitfield lowering in loop if-conversion
was not taking DECL_FIELD_OFFSET into account, which meant that it would
produce wrong bit positions for bitfields whose representatives did not
start at the beginning of the struct.
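
The arithmetic involved can be sketched as follows.  This is a simplified
model, not the code from the patch: FieldPos, kBitsPerUnit and the helper
names are hypothetical stand-ins for DECL_FIELD_OFFSET,
DECL_FIELD_BIT_OFFSET and BITS_PER_UNIT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two offsets kept per field: a byte offset
// and an additional bit offset on top of it.
struct FieldPos {
  std::uint64_t byte_offset;
  std::uint64_t bit_offset;
};

constexpr std::uint64_t kBitsPerUnit = 8;  // assumed bits per byte

// Absolute bit position of a field within the struct.
constexpr std::uint64_t absolute_bitpos(FieldPos f) {
  return f.byte_offset * kBitsPerUnit + f.bit_offset;
}

// Bit position of a bitfield relative to its representative, taking both
// offsets into account.
constexpr std::uint64_t relative_bitpos(FieldPos field, FieldPos rep) {
  return absolute_bitpos(field) - absolute_bitpos(rep);
}

// The same computation with the byte offsets ignored, which models the
// omission described above.
constexpr std::uint64_t bit_offset_only(FieldPos field, FieldPos rep) {
  return field.bit_offset - rep.bit_offset;
}
```

The two helpers agree whenever field and representative share the same byte
offset, which would explain why the problem only shows up for some layouts.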

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu.

I tried this patch together with the one for PR tree-optimization/107226
on sparc-sun-solaris2.11 to check if it cures the bootstrap failure
reported in PR tree-optimization/107232.  While this restores bootstrap,
several of the new tests FAIL:

+FAIL: gcc.dg/vect/vect-bitfield-read-1.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-1.c scan-tree-dump-times vect
"vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-times vect
"vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-3.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 2 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-3.c scan-tree-dump-times vect
"vectorized 2 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-times vect
"vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-read-6.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-1.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-1.c scan-tree-dump-times vect
"vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-times vect
"vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-times vect
"vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-5.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-bitfield-write-5.c scan-tree-dump-times vect
"vectorized 1 loops" 1

For vect-bitfield-read-1.c, the dump has

gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   ==> examining pattern def
statement: patt_31 = patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   ==> examining statement:
patt_31 = patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use:
operand _ifc__27 & 4294967294, type of def: internal
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use:
vectype vector(2) unsigned int
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:   vect_is_simple_use:
operand 1, type of def: constant
gcc.dg/vect/vect-bitfield-read-1.c:25:23: missed:   op not supported by
target.
gcc.dg/vect/vect-bitfield-read-1.c:23:1: missed:   not vectorized: relevant
stmt not supported: patt_31 = patt_30 >> 1;
gcc.dg/vect/vect-bitfield-read-1.c:25:23: missed:  bad operation or
unsupported loop bound.
gcc.dg/vect/vect-bitfield-read-1.c:25:23: note:  * Analysis  failed with
vector mode V2SI

  Rainer



Re: [PATCH v2] c++: ICE with VEC_INIT_EXPR and defarg [PR106925]

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/12/22 14:23, Marek Polacek wrote:

On Wed, Oct 12, 2022 at 01:12:57PM -0400, Marek Polacek wrote:

On Wed, Oct 12, 2022 at 12:47:21PM -0400, Jason Merrill wrote:

On 10/12/22 12:27, Marek Polacek wrote:

On Tue, Oct 11, 2022 at 04:28:11PM -0400, Jason Merrill wrote:

On 10/11/22 16:00, Marek Polacek wrote:

Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr
while processing the default argument in this test.


Hmm, why are we calling cxx_eval_vec_init during parsing of the default
argument?  In particular, any expansion that depends on the enclosing
function context should be deferred until the default arg is used by a call.


I think this is part of the semantic constraints checking [dcl.fct.default]/5
talks about, as in, this doesn't compile even though the default argument is
not executed:

struct S {
S() = delete;
};
void foo (S = S()) { }

In the test below we parse '= MyVector<1>()' and end up calling mark_used
on the implicit "constexpr MyVector<1>::MyVector() noexcept ()"
ctor.  mark_used calls maybe_instantiate_noexcept.  Since the ctor has
a DEFERRED_NOEXCEPT, we have to figure out if the ctor should be noexcept
or not using get_defaulted_eh_spec.  That means walking the members of
MyVector.  Thus we reach
/* Core 1351: If the field has an NSDMI that could throw, the
   default constructor is noexcept(false).  */


Maybe we need a cp_unevaluated here?  The operand of noexcept should be
unevaluated.


That wouldn't help since get_nsdmi specifically does "cp_evaluated ev;",
so...
  

and call get_nsdmi on 'data'.  There we digest its initializer which is {}.
massage_init_elt calls digest_init_r on the {} and produces
TARGET_EXPR >>>
and the subsequent fold_non_dependent_init leads to cxx_eval_vec_init
-> expand_vec_init_expr.
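
The language rule behind the Core 1351 comment can be observed from user
code.  The following is a minimal sketch with made-up types (QuietMember,
ThrowyMember), unrelated to the PR's testcase: an implicit default
constructor is noexcept only if every default member initializer is
non-throwing.

```cpp
#include <cassert>
#include <type_traits>

int may_throw() { return 1; }              // not declared noexcept
int never_throws() noexcept { return 1; }  // declared noexcept

// Both structs have an implicit default constructor; its exception
// specification is deduced from the default member initializer.
struct QuietMember  { int data = never_throws(); };
struct ThrowyMember { int data = may_throw(); };

static_assert(std::is_nothrow_default_constructible<QuietMember>::value,
              "non-throwing NSDMI keeps the constructor noexcept");
static_assert(!std::is_nothrow_default_constructible<ThrowyMember>::value,
              "potentially-throwing NSDMI makes it noexcept(false)");
```

Answering that question requires the compiler to look at the initializer,
which is why get_nsdmi is reached from get_defaulted_eh_spec at all.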

I think this is all correct except that the fold_non_dependent_init is
somewhat questionable to me; do we really have to fold in order to say
if the NSDMI init can throw?  Sure, we need to digest the {}, maybe
the field's ctors can throw, but I don't know about the folding.


And we can check cp_unevaluated_operand to avoid the
fold_non_dependent_init?


...we'd still fold.  I'm not sure if we want a LOOKUP_ flag that says
"we're just checking if we can throw, don't fold".


Eh, a new flag is overkill.  Maybe don't do cp_evaluated in get_nsdmi if
we're called from walk_field_subobs would be worth a try?


It seems that we treat DMI instantiations as evaluated even if they're 
triggered from unevaluated context so sharing lambdas between different 
uses of the DMI works properly.  I don't think this is worth messing 
with at this point; thanks for satisfying my curiosity.


Jason



Re: [PATCH v2] c++: ICE with VEC_INIT_EXPR and defarg [PR106925]

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/13/22 09:58, Marek Polacek wrote:

On Wed, Oct 12, 2022 at 02:23:40PM -0400, Marek Polacek wrote:

On Wed, Oct 12, 2022 at 01:12:57PM -0400, Marek Polacek wrote:

On Wed, Oct 12, 2022 at 12:47:21PM -0400, Jason Merrill wrote:

On 10/12/22 12:27, Marek Polacek wrote:

On Tue, Oct 11, 2022 at 04:28:11PM -0400, Jason Merrill wrote:

On 10/11/22 16:00, Marek Polacek wrote:

Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr
while processing the default argument in this test.


Hmm, why are we calling cxx_eval_vec_init during parsing of the default
argument?  In particular, any expansion that depends on the enclosing
function context should be deferred until the default arg is used by a call.


I think this is part of the semantic constraints checking [dcl.fct.default]/5
talks about, as in, this doesn't compile even though the default argument is
not executed:

struct S {
S() = delete;
};
void foo (S = S()) { }

In the test below we parse '= MyVector<1>()' and end up calling mark_used
on the implicit "constexpr MyVector<1>::MyVector() noexcept ()"
ctor.  mark_used calls maybe_instantiate_noexcept.  Since the ctor has
a DEFERRED_NOEXCEPT, we have to figure out if the ctor should be noexcept
or not using get_defaulted_eh_spec.  That means walking the members of
MyVector.  Thus we reach
/* Core 1351: If the field has an NSDMI that could throw, the
   default constructor is noexcept(false).  */


Maybe we need a cp_unevaluated here?  The operand of noexcept should be
unevaluated.


That wouldn't help since get_nsdmi specifically does "cp_evaluated ev;",
so...
  

and call get_nsdmi on 'data'.  There we digest its initializer which is {}.
massage_init_elt calls digest_init_r on the {} and produces
TARGET_EXPR >>>
and the subsequent fold_non_dependent_init leads to cxx_eval_vec_init
-> expand_vec_init_expr.

I think this is all correct except that the fold_non_dependent_init is
somewhat questionable to me; do we really have to fold in order to say
if the NSDMI init can throw?  Sure, we need to digest the {}, maybe
the field's ctors can throw, but I don't know about the folding.


And we can check cp_unevaluated_operand to avoid the
fold_non_dependent_init?


...we'd still fold.  I'm not sure if we want a LOOKUP_ flag that says
"we're just checking if we can throw, don't fold".


Eh, a new flag is overkill.  Maybe don't do cp_evaluated in get_nsdmi if
we're called from walk_field_subobs would be worth a try?


FWIW, my experiments with cp_unevaluated_operand failed because then we'd
miss warnings as in g++.dg/ext/cond5.C which warns from the
get_defaulted_eh_spec context -- so I'd have no way to distinguish that
from the test in this PR.  Should we just go back to my patch?


Your patch is still OK.

Jason



Re: [PATCH] use proper DECL_INITIAL for VTV

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/13/22 10:25, Martin Liška wrote:

Hi.

I am working on the early debug info emission that would benefit from a late
use of asm_put_file. This is last blocker where C++ emits early a section 
directive
in assemble_vtv_preinit_initializer. We can use a proper DECL_INITIAL for that.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
with --enable-vtable-verify.

Ready to be installed?


OK.


gcc/cp/ChangeLog:

* vtable-class-hierarchy.cc (vtv_generate_init_routine): Emit
an artificial variable that would be put into .preinit_array
section.

gcc/ChangeLog:

* output.h (assemble_vtv_preinit_initializer): Remove.
* varasm.cc (assemble_vtv_preinit_initializer): Remove.
---
  gcc/cp/vtable-class-hierarchy.cc | 14 --
  gcc/output.h |  4 
  gcc/varasm.cc| 17 -
  3 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/gcc/cp/vtable-class-hierarchy.cc b/gcc/cp/vtable-class-hierarchy.cc
index 79cb5f8de02..cc1df1ebdb2 100644
--- a/gcc/cp/vtable-class-hierarchy.cc
+++ b/gcc/cp/vtable-class-hierarchy.cc
@@ -1192,8 +1192,18 @@ vtv_generate_init_routine (void)
cgraph_node::add_new_function (vtv_fndecl, false);
  
if (flag_vtable_verify == VTV_PREINIT_PRIORITY && !TARGET_PECOFF)

-assemble_vtv_preinit_initializer (vtv_fndecl);
-
+   {
+ tree vtv_var
+   = build_decl (BUILTINS_LOCATION, VAR_DECL,
+ get_identifier ("__vtv_preinit"),
+ build_pointer_type (TREE_TYPE (vtv_fndecl)));
+ TREE_STATIC (vtv_var) = 1;
+ DECL_ARTIFICIAL (vtv_var) = 1;
+ DECL_INITIAL (vtv_var) = build_fold_addr_expr (vtv_fndecl);
+ set_decl_section_name (vtv_var, ".preinit_array");
+
+ varpool_node::add (vtv_var);
+   }
  }
pop_lang_context ();
  }
diff --git a/gcc/output.h b/gcc/output.h
index 6dea630913a..6936bdeeb6c 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -199,10 +199,6 @@ extern void assemble_end_function (tree, const char *);
 initial value (that will be done by the caller).  */
  extern void assemble_variable (tree, int, int, int);
  
-/* Put the vtable verification constructor initialization function

-   into the preinit array.  */
-extern void assemble_vtv_preinit_initializer (tree);
-
  /* Assemble everything that is needed for a variable declaration that has
 no definition in the current translation unit.  */
  extern void assemble_undefined_decl (tree);
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 423f3f91af8..a11184584a2 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -2419,23 +2419,6 @@ assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED,
  }
  }
  
-

-/* Given a function declaration (FN_DECL), this function assembles the
-   function into the .preinit_array section.  */
-
-void
-assemble_vtv_preinit_initializer (tree fn_decl)
-{
-  section *sect;
-  unsigned flags = SECTION_WRITE;
-  rtx symbol = XEXP (DECL_RTL (fn_decl), 0);
-
-  flags |= SECTION_NOTYPE;
-  sect = get_section (".preinit_array", flags, fn_decl);
-  switch_to_section (sect);
-  assemble_addr_to_section (symbol, sect);
-}
-
  /* Return 1 if type TYPE contains any pointers.  */
  
  static int




Re: [PATCH v2] c++: parser - Support for target address spaces in C++

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/12/22 20:52, Paul Iannetta wrote:

On Tue, Oct 11, 2022 at 09:49:43PM -0400, Jason Merrill wrote:


It surprises me that this is the only place we complain about an object with an
address-space qualifier.  Shouldn't we also complain about e.g. automatic
variables/parameters or non-static data members with address-space qualified
type?



Indeed, I was missing quite a few things here.  Thanks.
I used the draft as basis this time and imported from the C
implementation the relevant parts.  This time, the errors get properly
emitted when an address space is unduly specified, and assignments and
comparisons are taken care of.

There are quite a few things I would like to clarify concerning some
implementation details.
   - A variable with automatic storage (which is neither a pointer nor
 a reference) cannot be qualified with an address space.  I detect
 this by the combination of `sc_none' and `! toplevel_bindings_p ()',
 but I've also seen the use of `at_function_scope' at other places.
 And I'm unsure which one is appropriate here.
 This detection happens at the very end of grokdeclarator because I
 need to know that the type is a pointer, which is not known until
 very late in the function.


At that point you have the decl, and you can ask directly what its 
storage duration is, perhaps using decl_storage_duration.


But why do you need to know whether the type is a pointer?  The 
attribute applies to the target type of the pointer, not the pointer 
type.  I think the problem is that you're looking at declspecs when you 
ought to be looking at type_quals.



   - I'm having some trouble deciding whether I include those three
 stub programs as tests, they all compile fine and clang accepts
 them as well.


Why not?


Ex1:
```
int __seg_fs * fs1;
int __seg_gs * gs1;

template <typename T> struct strip;
template <typename T> struct strip<__seg_fs T *> { typedef T type; };
template <typename T> struct strip<__seg_gs T *> { typedef T type; };

int
main ()
{
 *(strip::type *) fs1 == *(strip::type *) gs1;
 return 0;
}
```

Ex2:
```
int __seg_fs * fs1;
int __seg_fs * fs2;

template <typename T, typename U>
auto f (T __seg_fs * a, U __seg_gs * b) { return a; }
template <typename T, typename U>
auto f (T __seg_gs * a, U __seg_fs * b) { return a; }

int
main ()
{
 f (fs1, gs1);
 f (gs1, fs1);
 return 0;
}
```

Ex3:
```
int __seg_fs * fs1;
int __seg_gs * gs1;

template <typename T, typename U>
auto f (T __seg_fs * a, U __seg_gs * b)
{
 return *(T *) a == *(U *) b;
}

int
main ()
{
 return f (fs1, gs1);
}
```


Add support for custom address spaces in C++

gcc/
 * tree.h (ENCODE_QUAL_ADDR_SPACE): Missing parentheses.

gcc/c/
 * c-decl.cc: Remove c_register_addr_space.

gcc/c-family/
 * c-common.cc (c_register_addr_space): Imported from c-decl.cc
 (addr_space_superset): Imported from gcc/c/c-typeck.cc
 * c-common.h: Remove the FIXME.
 (addr_space_superset): New declaration.

gcc/cp/
 * cp-tree.h (enum cp_decl_spec): Add addr_space support.
 (struct cp_decl_specifier_seq): Likewise.
 * decl.cc (get_type_quals): Likewise.
 (check_tag_decl): Likewise.
(grokdeclarator): Likewise.
 * parser.cc (cp_parser_type_specifier): Likewise.
 (cp_parser_cv_qualifier_seq_opt): Likewise.
 (cp_parser_postfix_expression): Likewise.
 (cp_parser_type_specifier): Likewise.
 (set_and_check_decl_spec_loc): Likewise.
 * typeck.cc (composite_pointer_type): Likewise
 (comp_ptr_ttypes_real): Likewise.
(same_type_ignoring_top_level_qualifiers_p): Likewise.
 * pt.cc (check_cv_quals_for_unify): Likewise.
 (unify): Likewise.
 * tree.cc: Remove c_register_addr_space stub.
 * mangle.cc (write_CV_qualifiers_for_type): Mangle address spaces
   using the extended qualifier notation.

gcc/doc
 * extend.texi (Named Address Spaces): add a mention about C++
   support.

gcc/testsuite/
 * g++.dg/abi/mangle-addr-space1.C: New test.
 * g++.dg/abi/mangle-addr-space2.C: New test.
 * g++.dg/parse/addr-space.C: New test.
 * g++.dg/parse/addr-space1.C: New test.
 * g++.dg/parse/addr-space2.C: New test.
 * g++.dg/parse/template/spec-addr-space.C: New test.
 * g++.dg/ext/addr-space-decl.C: New test.
 * g++.dg/ext/addr-space-ref.C: New test.
 * g++.dg/ext/addr-space-ops.C: New test.

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Date:  Sun Oct 9 16:02:22 2022 +0200
#
# On branch releases/gcc-12
# Your branch is ahead of 'origin/releases/gcc-12' by 2 commits.
#   (use "git push" to publish your local commits)
#
# Changes to be committed:
#   modified:   gcc/c-family/c-common.cc
#   modified:   gcc/c-family/c-common.h
#   modified:   gcc/c/c-decl.cc
#   modified:   gcc/c/c-typeck.cc
#   modified:   gcc/cp/cp-tree

Re: [PATCH v2] c++: parser - Support for target address spaces in C++

2022-10-13 Thread Paul Iannetta via Gcc-patches
On Thu, Oct 13, 2022 at 07:46:46AM +0200, Jakub Jelinek wrote:
> On Thu, Oct 13, 2022 at 02:52:59AM +0200, Paul Iannetta via Gcc-patches wrote:
> > +   if (type != error_mark_node
> > +   && !ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (type))
> > +   && current_function_decl)
> > + {
> > +   error
> > + ("compound literal qualified by address-space qualifier");
> > +   type = error_mark_node;
> 
> Can you please write this as:
>   error ("compound literal qualified by address-space "
>  "qualifier");
> ?  That is how diagnostics that don't fit on one line are usually written.
> 
> > @@ -23812,6 +23830,11 @@ cp_parser_cv_qualifier_seq_opt (cp_parser* parser)
> >   break;
> > }
> >  
> > +  if (RID_FIRST_ADDR_SPACE <= token->keyword &&
> 
> && should never go at the end of line.
> 
> > + token->keyword <= RID_LAST_ADDR_SPACE)
> > +   cv_qualifier =
> 
> and similarly = (except for aggregate initializers).
> 
> > + ENCODE_QUAL_ADDR_SPACE (token->keyword - RID_FIRST_ADDR_SPACE);
> 
> So:
> 
>   if (RID_FIRST_ADDR_SPACE <= token->keyword
> && token->keyword <= RID_LAST_ADDR_SPACE)
>   cv_qualifier
> = ENCODE_QUAL_ADDR_SPACE (token->keyword - RID_FIRST_ADDR_SPACE);
> 
> > + int unified_cv =
> > +   CLEAR_QUAL_ADDR_SPACE (arg_cv_quals & ~parm_cv_quals)
> > +   | ENCODE_QUAL_ADDR_SPACE (as_common);
> 
> Similarly (but this time with ()s added to ensure correct formatting in
> some editors).
> 
> int unified_cv
>   = (CLEAR_QUAL_ADDR_SPACE (arg_cv_quals & ~parm_cv_quals)
>  | ENCODE_QUAL_ADDR_SPACE (as_common));
> 
> >result_type
> > = cp_build_qualified_type (void_type_node,
> > -  (cp_type_quals (TREE_TYPE (t1))
> > -   | cp_type_quals (TREE_TYPE (t2;
> > +  (CLEAR_QUAL_ADDR_SPACE (cp_type_quals (TREE_TYPE (t1)))
> > +   | CLEAR_QUAL_ADDR_SPACE (cp_type_quals (TREE_TYPE (t2)))
> 
> The above 2 lines are way too long.
> I'd suggest to use temporaries, say
> int quals1 = cp_type_quals (TREE_TYPE (t1));
> int quals2 = cp_type_quals (TREE_TYPE (t2));
> and use those.
> 
>   Jakub

Thank you for the style review, I'll apply them in the next iteration.

Paul
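
For reference, the qualifier arithmetic in the unified_cv hunk quoted above
can be sketched in isolation.  The bit layout used here (address-space
number in bits 8..15, cv-qualifiers in the low bits) and all names are
assumptions made for illustration; the real macros live in tree.h and may
differ.

```cpp
#include <cassert>

// Hypothetical cv-qualifier bits, for illustration only.
constexpr int kQualConst = 0x1;
constexpr int kQualVolatile = 0x2;

// Assumed layout: the address-space number occupies bits 8..15.
constexpr int encode_addr_space(int as) { return (as & 0xFF) << 8; }
constexpr int decode_addr_space(int quals) { return (quals >> 8) & 0xFF; }
constexpr int clear_addr_space(int quals) { return quals & ~0xFF00; }

// Same shape as the quoted expression: keep the qualifiers the argument has
// beyond the parameter, drop any address-space bits from that difference,
// then put the common address space back in.
constexpr int unify_quals(int arg_quals, int parm_quals, int as_common) {
  return (clear_addr_space(arg_quals & ~parm_quals)
          | encode_addr_space(as_common));
}
```

Written as functions rather than macros, the operands are evaluated before
any masking or shifting happens, which avoids the kind of precedence issue
that the "Missing parentheses" ChangeLog entry refers to.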






Re: [PATCH v2] c++: parser - Support for target address spaces in C++

2022-10-13 Thread Paul Iannetta via Gcc-patches
On Thu, Oct 13, 2022 at 11:02:24AM -0400, Jason Merrill wrote:
> On 10/12/22 20:52, Paul Iannetta wrote:
> > On Tue, Oct 11, 2022 at 09:49:43PM -0400, Jason Merrill wrote:
> > > 
> > > It surprises me that this is the only place we complain about an object
> > > with an address-space qualifier.  Shouldn't we also complain about e.g.
> > > automatic variables/parameters or non-static data members with
> > > address-space qualified type?
> > > 
> > 
> > Indeed, I was missing quite a few things here.  Thanks.
> > I used the draft as basis this time and imported from the C
> > implementation the relevant parts.  This time, the errors get properly
> > emitted when an address space is unduly specified, and assignments and
> > comparisons are taken care of.
> > 
> > There are quite a few things I would like to clarify concerning some
> > implementation details.
> >- A variable with automatic storage (which is neither a pointer nor
> >  a reference) cannot be qualified with an address space.  I detect
> >  this by the combination of `sc_none' and `! toplevel_bindings_p ()',
> >  but I've also seen the use of `at_function_scope' at other places.
> >  And I'm unsure which one is appropriate here.
> >  This detection happens at the very end of grokdeclarator because I
> >  need to know that the type is a pointer, which is not known until
> >  very late in the function.
> 
> At that point you have the decl, and you can ask directly what its storage
> duration is, perhaps using decl_storage_duration.
> 
> But why do you need to know whether the type is a pointer?  The attribute
> applies to the target type of the pointer, not the pointer type.  I think
> the problem is that you're looking at declspecs when you ought to be looking
> at type_quals.
> 

I need to know that the base type is a pointer to reject invalid
declarations such as:

int f (__seg_fs int a) { } or int f () { __seg_fs int a; }

because parameters and auto variables can have an address space
qualifier only if they are pointer or reference type, which I can't
tell only from type_quals.

> >- I'm having some trouble deciding whether I include those three
> >  stub programs as tests, they all compile fine and clang accepts
> >  them as well.
> 
> Why not?

I thought they were pretty contrived, since it does not make much
sense to strip address space qualifiers, even though it does prove
that the implementation support those contrived but valid uses.

> 
> > Ex1:
> > ```
> > int __seg_fs * fs1;
> > int __seg_gs * gs1;
> > 
> > template <typename T> struct strip;
> > template <typename T> struct strip<__seg_fs T *> { typedef T type; };
> > template <typename T> struct strip<__seg_gs T *> { typedef T type; };
> > 
> > int
> > main ()
> > {
> >  *(strip::type *) fs1 == *(strip::type *) gs1;
> >  return 0;
> > }
> > ```
> > 
> > Ex2:
> > ```
> > int __seg_fs * fs1;
> > int __seg_fs * fs2;
> > 
> > template <typename T, typename U>
> > auto f (T __seg_fs * a, U __seg_gs * b) { return a; }
> > template <typename T, typename U>
> > auto f (T __seg_gs * a, U __seg_fs * b) { return a; }
> > 
> > int
> > main ()
> > {
> >  f (fs1, gs1);
> >  f (gs1, fs1);
> >  return 0;
> > }
> > ```
> > 
> > Ex3:
> > ```
> > int __seg_fs * fs1;
> > int __seg_gs * gs1;
> > 
> > template <typename T, typename U>
> > auto f (T __seg_fs * a, U __seg_gs * b)
> > {
> >  return *(T *) a == *(U *) b;
> > }
> > 
> > int
> > main ()
> > {
> >  return f (fs1, gs1);
> > }
> > ```
> > 
> > 
> > Add support for custom address spaces in C++
> > 
> > gcc/
> >  * tree.h (ENCODE_QUAL_ADDR_SPACE): Missing parentheses.
> > 
> > gcc/c/
> >  * c-decl.cc: Remove c_register_addr_space.
> > 
> > gcc/c-family/
> >  * c-common.cc (c_register_addr_space): Imported from c-decl.cc
> >  (addr_space_superset): Imported from gcc/c/c-typeck.cc
> >  * c-common.h: Remove the FIXME.
> >  (addr_space_superset): New declaration.
> > 
> > gcc/cp/
> >  * cp-tree.h (enum cp_decl_spec): Add addr_space support.
> >  (struct cp_decl_specifier_seq): Likewise.
> >  * decl.cc (get_type_quals): Likewise.
> >  (check_tag_decl): Likewise.
> > (grokdeclarator): Likewise.
> >  * parser.cc (cp_parser_type_specifier): Likewise.
> >  (cp_parser_cv_qualifier_seq_opt): Likewise.
> >  (cp_parser_postfix_expression): Likewise.
> >  (cp_parser_type_specifier): Likewise.
> >  (set_and_check_decl_spec_loc): Likewise.
> >  * typeck.cc (composite_pointer_type): Likewise
> >  (comp_ptr_ttypes_real): Likewise.
> > (same_type_ignoring_top_level_qualifiers_p): Likewise.
> >  * pt.cc (check_cv_quals_for_unify): Likewise.
> >  (unify): Likewise.
> >  * tree.cc: Remove c_register_addr_space stub.
> >  * mangle.cc (write_CV_qualifiers_for_type): Mangle address spaces
> >using the extended qualifier notation.
> > 
> > gcc/doc
> >  * extend.texi (Named Address

Re: [PATCH] c++ modules: ICE with templated friend and std namespace [PR100134]

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/11/22 13:40, Nathan Sidwell wrote:

On 10/11/22 11:35, Patrick Palka wrote:

IIUC the function depset::hash::add_binding_entity has an assert
verifying that if a namespace contains an exported entity, then
the namespace must have been opened in the module purview:

   if (data->hash->add_namespace_entities (decl, data->partitions))
 {
   /* It contains an exported thing, so it is exported.  */
   gcc_checking_assert (DECL_MODULE_PURVIEW_P (decl));
   DECL_MODULE_EXPORT_P (decl) = true;
 }

We're tripping over this assert in the below testcase because by
instantiating and exporting std::A, we end up in turn defining
and exporting the hidden friend std::f without ever having opened
the enclosing namespace std within the module purview and thus
DECL_MODULE_PURVIEW_P (std_node) is false.

Note that it's important that the enclosing namespace is std here: if we
use a different namespace then the ICE disappears.  This probably has
something to do with the fact that we predefine std via push_namespace
from cxx_init_decl_processing (which makes it look like we've opened the
namespace in the TU), whereas with another namespace we would instead
lazily obtain the NAMESPACE_DECL from add_imported_namespace.

Since templated friend functions are special in that they give us a way
to declare a new namespace-scope function without having to explicitly
open the namespace, this patch proposes to fix this issue by propagating
DECL_MODULE_PURVIEW_P from a friend function to the enclosing namespace
when instantiating the friend.
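
Outside of modules, the property being relied on here can be seen with a
small sketch.  The namespace lib and the function names are made up for
illustration (user code may not add such declarations to std itself).

```cpp
#include <cassert>

namespace lib {
template <typename T>
struct A {
  // Defined inside the class template: each instantiation of A<T> declares
  // a namespace-scope function lib::f(A<T>), although no code ever reopens
  // namespace lib to declare it.  The function is only found through ADL.
  friend int f(A) { return static_cast<int>(sizeof(T)); }
};
}  // namespace lib

int call_hidden_friend() {
  lib::A<char> a;  // instantiating A<char> introduces lib::f(A<char>)
  return f(a);     // unqualified call, resolved by argument-dependent lookup
}
```

In the PR the instantiation happens inside the module purview while the
namespace was only ever opened in the header unit, hence the mismatch that
the assert detects.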


ouch.  This is ok, but I think we have a bug -- what is the module 
ownership of the friend introduced by the instantiation?


Haha, there's a note on 13.7.5/3 -- the attachment is to the same module 
as the befriending class.


That means we end up creating and writing out entities that exist in the 
symbol table (albeit hidden) whose module ownership is neither the 
global module nor the TU's module.  That's not something the module 
machinery anticipates.  We'll get the mangling wrong for starters.  Hmm.


These are probably rare.  Thinking about the right solution though ...


This seems closely connected to

 https://cplusplus.github.io/CWG/issues/2588.html

Jason


nathan




Tested on x86_64-pc-linux-gnu, does this look like the right fix?  Other
solutions that seem to work are to set DECL_MODULE_PURVIEW_P on std_node
after the fact from declare_module, or simply to suppress the assert for
std_node.

PR c++/100134

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_function): Propagate DECL_MODULE_PURVIEW_P
from the new declaration to the enclosing namespace scope.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-8_a.H: New test.
* g++.dg/modules/tpl-friend-8_b.C: New test.
---
  gcc/cp/pt.cc  | 7 +++
  gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H | 9 +
  gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C | 8 
  3 files changed, 24 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 5b9fc588a21..9e3085f3fa6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11448,6 +11448,13 @@ tsubst_friend_function (tree decl, tree args)
   by duplicate_decls.  */
    new_friend = old_decl;
  }
+
+  /* We've just added a new namespace-scope entity to the purview without
+ necessarily having opened the enclosing namespace, so make sure the
+ enclosing namespace is in the purview now too.  */
+  if (TREE_CODE (DECL_CONTEXT (new_friend)) == NAMESPACE_DECL)
+    DECL_MODULE_PURVIEW_P (DECL_CONTEXT (new_friend))
+  |= DECL_MODULE_PURVIEW_P (STRIP_TEMPLATE (new_friend));
  }
    else
  {
diff --git a/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H b/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
new file mode 100644
index 000..bd2290460b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
@@ -0,0 +1,9 @@
+// PR c++/100134
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+
+namespace std {
+  template struct A {
+    friend void f(A) { }
+  };
+}
diff --git a/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C b/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C
new file mode 100644
index 000..76d7447c2eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C
@@ -0,0 +1,8 @@
+// PR c++/100134
+// { dg-additional-options -fmodules-ts }
+// { dg-module-cmi pr100134 }
+export module pr100134;
+
+import "tpl-friend-8_a.H";
+
+export std::A a;






[COMMITTED 0/4] Add partial equivalences to the oracle.

2022-10-13 Thread Andrew MacLeod via Gcc-patches

This patch implements partial equivalences in the relation oracle.

They are tracked much like normal equivalences, in that they all belong 
to a set.  I refer to them as "slices" of an ssa-name.  A little extra 
info is maintained for a partial set in class pe_slice.


A slice contains the bitmap of other members of the partial equivalence, 
the root ssa-name that every member is a slice of, and the relation code 
indicating whether it is an 8, 16, 32, or 64 bit slice of that name.


4 new relation kinds are added for these:  VREL_PE8, VREL_PE16, 
VREL_PE32 and VREL_PE64.


The oracle maintains a vector of pe_slices, with one entry for each 
ssa-name globally.  We determine at the def point what the LHS is a slice 
of.  It is either a slice of the RHS, or it becomes a member of the set the 
RHS is already in, i.e.:


long b_3 = foo()
a_4 = (short) b_3

a_4 is registered as a 16 bit slice of b_3, and the slice set is {b_3, a_4}.

c_5 = (char) a_4

a_4 is already in a slice set, so c_5 is registered into the set as a 8 
bit slice of b_3, and the set now contains {b_3, a_4, c_5}


If we query the relation between c_5 and a_4, it is trivial to check 
that they are in the same slice set, and therefore share bits.  The 
number of bits they share is the MIN of the slice size of each, so MIN 
(16, 8). The relation will be returned as VREL_PE8.   This means you can 
count on the lower 8 bits to be identical between the 2 ssa-names, and 
they will be defined as those bits in the root value in b_3.
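
For readers who want to experiment with the idea outside of GCC, the slice bookkeeping described above can be modelled in a few lines.  This is only a sketch under stated assumptions: names are plain strings rather than SSA versions, the member set is a shared std::set rather than a bitmap, and toy_slice_table, slice_entry, add and shared_bits are hypothetical names that do not appear in the patch.  Re-registering an existing name simply overwrites its entry, which the real oracle does not do.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

// Toy stand-in for a pe_slice: the root name, the slice width in bits,
// and the member set shared by every name in the same slice set.
struct slice_entry
{
  std::string root;
  int bits = 0;
  std::shared_ptr<std::set<std::string>> members;
};

class toy_slice_table
{
  std::map<std::string, slice_entry> m_table;

public:
  // Record that LHS shares its low BITS bits with RHS, where RHS is
  // RHS_PREC bits wide.
  void add (const std::string &lhs, int bits,
            const std::string &rhs, int rhs_prec)
  {
    assert (bits == 8 || bits == 16 || bits == 32 || bits == 64);
    auto it = m_table.find (rhs);
    if (it == m_table.end ())
      {
        // RHS is not in any set yet, so it becomes the root of a new one.
        slice_entry root;
        root.root = rhs;
        root.bits = rhs_prec;
        root.members = std::make_shared<std::set<std::string>> ();
        root.members->insert (rhs);
        it = m_table.emplace (rhs, root).first;
      }
    // LHS joins the set RHS is in.  It cannot share more bits with the
    // root than RHS itself does, hence the MIN.
    slice_entry e;
    e.root = it->second.root;
    e.bits = std::min (bits, it->second.bits);
    e.members = it->second.members;
    e.members->insert (lhs);
    m_table[lhs] = e;
  }

  // Number of low bits A and B are known to share, or 0 if unrelated.
  int shared_bits (const std::string &a, const std::string &b) const
  {
    auto ia = m_table.find (a);
    auto ib = m_table.find (b);
    if (ia == m_table.end () || ib == m_table.end ())
      return 0;
    if (ia->second.members != ib->second.members)
      return 0;
    return std::min (ia->second.bits, ib->second.bits);
  }
};
```

Registering a_4 as a 16 bit slice of a 64 bit b_3 and then c_5 as an 8 bit slice of a_4 makes shared_bits ("c_5", "a_4") return 8, matching the VREL_PE8 answer described above.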


The caller can then use that relation to decide whether there is 
anything useful to be done with it.


In particular, this will fix 2 regressions from last year, PR 102540 and 
PR 102872, where we lose the connection between a cast and a bitwise mask 
of the same size, i.e.:


static long a;
static unsigned b;
int test1 () {
    long c, e;
    c = b = a;
    e = c ? 2 / (c + 1) : 0;
    if (e && !b)
    kill ();
    a = 0;

   :
Equivalence set : [_6, c_10]
Partial equiv (_2 pe32 a.0_1)
Partial equiv (_6 pe32 a.0_1)

  a.0_1 = a;
  _2 = (unsigned int) a.0_1;
  b = _2;
  _6 = a.0_1 & 4294967295;
  c_10 = _6;
  if (c_10 != 0)
    goto ; [INV]
  else
    goto ; [INV]

_6 : [irange] long int [0, 4294967295] NONZERO 0x
c_10 : [irange] long int [0, 4294967295] NONZERO 0x
2->3  (T) a.0_1 :   [irange] long int [-INF, -1][1, +INF]
2->3  (T) _6 :  [irange] long int [1, 4294967295] NONZERO 0x
2->3  (T) c_10 :    [irange] long int [1, 4294967295] NONZERO 0x
2->6  (F) a.0_1 :   [irange] long int [-INF, -4294967296][0, +INF] NONZERO 0x
2->6  (F) _6 :  [irange] long int [0, 0] NONZERO 0x0
2->6  (F) c_10 :    [irange] long int [0, 0] NONZERO 0x0

   :
  _4 = c_10 + 1;
  iftmp.2_12 = 2 / _4;
  if (iftmp.2_12 != 0)
    goto ; [INV]
  else
    goto ; [INV]

   :
  if (_2 == 0)

When we get to _2 == 0, ranger looks for any equivalences (full or 
partial) of _2 coming into this block.  It sees that _6 on the edges 
2->3->4 has the range

2->3  (T) _6 :  [irange] long int [1, 4294967295] NONZERO 0x

and shares 32 bits with _2.  The shared slice covers all 32 bits of _2, 
so ranger casts that range of _6 to the type of _2 and determines _2 is


_2  [irange] unsigned int [1, +INF]

and folds away the condition.
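
The property being relied on can be checked in isolation.  The following is a minimal sketch, not GCC code: cast_low32 and mask_low32 are hypothetical helpers standing in for the definitions of _2 and _6 in the dump, and the fixed-width types assume the LP64 sizes used in the example.

```cpp
#include <cassert>
#include <cstdint>

// _2 = (unsigned int) a.0_1
inline std::uint32_t cast_low32 (std::int64_t a)
{
  return static_cast<std::uint32_t> (a);
}

// _6 = a.0_1 & 4294967295
inline std::int64_t mask_low32 (std::int64_t a)
{
  return a & 0xffffffffLL;
}

// True if the cast and the mask agree in their low 32 bits for A.
inline bool low32_agree (std::int64_t a)
{
  return static_cast<std::uint64_t> (mask_low32 (a)) == cast_low32 (a);
}
```

Because the two values always agree in the low 32 bits, knowing that _6 is nonzero on an edge is enough to know that _2 is nonzero there as well, which is what lets the _2 == 0 test fold.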

Bootstrapped on x86_64-pc-linux-gnu with no regressions.





[COMMITTED 1/4] Add partial equivalence support to the relation oracle.

2022-10-13 Thread Andrew MacLeod via Gcc-patches
This patch provides the new relation kinds as well as management of the 
partial equivalency slice table.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed

Andrew

From b5563410ea613ff2b2d7c6fa1847cfcb1ff91efb Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 6 Oct 2022 15:00:52 -0400
Subject: [PATCH 1/4] Add partial equivalence support to the relation oracle.

This provides enhancements to the equivalence oracle to also track
partial equivalences.  They are tracked similar to equivalences, except
it tracks a 'slice' of another ssa name.   8, 16, 32 and 64 bit slices are
tracked.  This will allow casts and mask of the same value to compare
equal.

	* value-relation.cc (equiv_chain::dump): Don't print empty
	equivalences.
	(equiv_oracle::equiv_oracle): Allocate a partial equiv table.
	(equiv_oracle::~equiv_oracle): Release the partial equiv table.
	(equiv_oracle::add_partial_equiv): New.
	(equiv_oracle::partial_equiv_set): New.
	(equiv_oracle::partial_equiv): New.
	(equiv_oracle::query_relation): Check for partial equivs too.
	(equiv_oracle::dump): Also dump partial equivs.
	(dom_oracle::register_relation): Handle partial equivs.
	(dom_oracle::query_relation): Check for partial equivs.
	* value-relation.h (enum relation_kind_t): Add partial equivs.
	(relation_partial_equiv_p): New.
	(relation_equiv_p): New.
	(class pe_slice): New.
	(class equiv_oracle): Add prototypes.
	(pe_to_bits): New.
	(bits_to_pe): New.
	(pe_min): New.
---
 gcc/value-relation.cc | 165 +++---
 gcc/value-relation.h  |  78 +++-
 2 files changed, 229 insertions(+), 14 deletions(-)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 1ee6da199f2..ceeca53e0a1 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -32,10 +32,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "dominance.h"
 
-#define VREL_LAST   VREL_NE
+#define VREL_LAST   VREL_PE64
 
 static const char *kind_string[VREL_LAST + 1] =
-{ "varying", "undefined", "<", "<=", ">", ">=", "==", "!=" };
+{ "varying", "undefined", "<", "<=", ">", ">=", "==", "!=", "pe8", "pe16",
+  "pe32", "pe64" };
 
 // Print a relation_kind REL to file F.
 
@@ -302,7 +303,7 @@ equiv_chain::dump (FILE *f) const
   bitmap_iterator bi;
   unsigned i;
 
-  if (!m_names)
+  if (!m_names || bitmap_empty_p (m_names))
 return;
   fprintf (f, "Equivalence set : [");
   unsigned c = 0;
@@ -329,18 +330,124 @@ equiv_oracle::equiv_oracle ()
   obstack_init (&m_chain_obstack);
   m_self_equiv.create (0);
   m_self_equiv.safe_grow_cleared (num_ssa_names + 1);
+  m_partial.create (0);
+  m_partial.safe_grow_cleared (num_ssa_names + 1);
 }
 
 // Destruct an equivalency oracle.
 
 equiv_oracle::~equiv_oracle ()
 {
+  m_partial.release ();
   m_self_equiv.release ();
   obstack_free (&m_chain_obstack, NULL);
   m_equiv.release ();
   bitmap_obstack_release (&m_bitmaps);
 }
 
+// Add a partial equivalence R between OP1 and OP2.
+
+void
+equiv_oracle::add_partial_equiv (relation_kind r, tree op1, tree op2)
+{
+  int v1 = SSA_NAME_VERSION (op1);
+  int v2 = SSA_NAME_VERSION (op2);
+  int prec2 = TYPE_PRECISION (TREE_TYPE (op2));
+  int bits = pe_to_bits (r);
+  gcc_checking_assert (bits && prec2 >= bits);
+
+  if (v1 >= (int)m_partial.length () || v2 >= (int)m_partial.length ())
+m_partial.safe_grow_cleared (num_ssa_names + 1);
+  gcc_checking_assert (v1 < (int)m_partial.length ()
+		   && v2 < (int)m_partial.length ());
+
+  pe_slice &pe1 = m_partial[v1];
+  pe_slice &pe2 = m_partial[v2];
+
+  if (pe1.members)
+{
+  // If the definition pe1 already has an entry, either the stmt is
+  // being re-evaluated, or the def was used before being registered.
+  // In either case, if PE2 has an entry, we simply do nothing.
+  if (pe2.members)
+	return;
+  // PE1 is the LHS and already has members, so everything in the set
+  // should be a slice of PE2 rather than PE1.
+  pe2.code = pe_min (r, pe1.code);
+  pe2.ssa_base = op2;
+  pe2.members = pe1.members;
+  bitmap_iterator bi;
+  unsigned x;
+  EXECUTE_IF_SET_IN_BITMAP (pe1.members, 0, x, bi)
+	{
+	  m_partial[x].ssa_base = op2;
+	  m_partial[x].code = pe2.code;
+	}
+  bitmap_set_bit (pe1.members, v2);
+  return;
+}
+  if (pe2.members)
+{
+  pe1.ssa_base = pe2.ssa_base;
+  // If pe2 is a 16 bit value, but only an 8 bit copy, we can't be any
+  // more than an 8 bit equivalence here, so choose MIN value.
+  pe1.code = pe_min (r, pe2.code);
+  pe1.members = pe2.members;
+  bitmap_set_bit (pe1.members, v1);
+}
+  else
+{
+  // Neither name has an entry, simply create op1 as slice of op2.
+  pe2.code = bits_to_pe (TYPE_PRECISION (TREE_TYPE (op2)));
+  if (pe2.code == VREL_VARYING)
+	return;
+  pe2.ssa_base = op2;
+  pe2.members = BITMAP_ALLOC (&m_bitmaps);
+  bitmap_set_bit (pe2.membe

[COMMITTED 2/4] Add equivalence iterator to relation oracle.

2022-10-13 Thread Andrew MacLeod via Gcc-patches
Instead of looping over an exposed equivalence bitmap, provide iterators 
to loop over equivalences, partial equivalences, or both.
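
As a rough illustration of the design choice (callers walk names through an iterator and never see the container), here is a small stand-alone sketch.  It assumes a std::set of SSA version numbers in place of the bitmap, and toy_equiv_iterator and equivalents_of are hypothetical names, not the class or macros the patch adds.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Walks the members of an equivalence set, never returning the name the
// query started from.  The set must outlive the iterator.
class toy_equiv_iterator
{
  const std::set<int> &m_set;
  std::set<int>::const_iterator m_pos;
  int m_self;

public:
  toy_equiv_iterator (const std::set<int> &s, int self)
    : m_set (s), m_pos (s.begin ()), m_self (self) {}

  // Return the current member other than self, or -1 when done.
  int get ()
  {
    while (m_pos != m_set.end () && *m_pos == m_self)
      ++m_pos;
    return m_pos == m_set.end () ? -1 : *m_pos;
  }

  // Advance to the next position.
  void next ()
  {
    if (m_pos != m_set.end ())
      ++m_pos;
  }
};

// Collect every equivalent of SELF, using only the iterator interface.
inline std::vector<int> equivalents_of (const std::set<int> &s, int self)
{
  std::vector<int> out;
  toy_equiv_iterator it (s, self);
  for (int v = it.get (); v != -1; it.next (), v = it.get ())
    out.push_back (v);
  return out;
}
```

If the underlying representation later changes, only the iterator needs updating; code written like equivalents_of stays the same.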


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed

Andrew
From aa05838b0536422256e0c477c57f1ea1d2915e92 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 7 Oct 2022 12:55:32 -0400
Subject: [PATCH 2/4] Add equivalence iterator to relation oracle.

Instead of looping over an exposed equivalence bitmap, provide iterators
to loop over equivalences, partial equivalences, or both.

	* gimple-range-cache.cc (ranger_cache::fill_block_cache): Use
	iterator.
	* value-relation.cc
	  (equiv_relation_iterator::equiv_relation_iterator): New.
	(equiv_relation_iterator::next): New.
	(equiv_relation_iterator::get_name): New.
	* value-relation.h (class relation_oracle): Privatize some methods.
	(class equiv_relation_iterator): New.
	(FOR_EACH_EQUIVALENCE): New.
	(FOR_EACH_PARTIAL_EQUIV): New.
	(FOR_EACH_PARTIAL_AND_FULL_EQUIV): New.
---
 gcc/gimple-range-cache.cc | 10 +
 gcc/value-relation.cc | 78 +++
 gcc/value-relation.h  | 41 ++--
 3 files changed, 118 insertions(+), 11 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 4782d47265e..8c80ba6cd14 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1220,15 +1220,9 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
   // See if any equivalences can refine it.
   if (m_oracle)
 	{
-	  unsigned i;
-	  bitmap_iterator bi;
-	  // Query equivalences in read-only mode.
-	  const_bitmap equiv = m_oracle->equiv_set (name, bb);
-	  EXECUTE_IF_SET_IN_BITMAP (equiv, 0, i, bi)
+	  tree equiv_name;
+	  FOR_EACH_EQUIVALENCE (m_oracle, bb, name, equiv_name)
 	{
-	  if (i == SSA_NAME_VERSION (name))
-		continue;
-	  tree equiv_name = ssa_name (i);
 	  basic_block equiv_bb = gimple_bb (SSA_NAME_DEF_STMT (equiv_name));
 
 	  // Check if the equiv has any ranges calculated.
diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index ceeca53e0a1..50fc190a36b 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -1641,3 +1641,81 @@ path_oracle::dump (FILE *f) const
   fprintf (f, "\n");
 }
 }
+
+// 
+//  EQUIV iterator.  Although we have bitmap iterators, don't expose that it
+//  is currently a bitmap.  Use an export iterator to hide future changes.
+
+// Construct a basic iterator over an equivalence bitmap.
+
+equiv_relation_iterator::equiv_relation_iterator (relation_oracle *oracle,
+		  basic_block bb, tree name,
+		  bool full, bool partial)
+{
+  m_name = name;
+  m_oracle = oracle;
+  m_pe = partial ? oracle->partial_equiv_set (name) : NULL;
+  m_bm = NULL;
+  if (full)
+m_bm = oracle->equiv_set (name, bb);
+  if (!m_bm && m_pe)
+m_bm = m_pe->members;
+  if (m_bm)
+bmp_iter_set_init (&m_bi, m_bm, 1, &m_y);
+}
+
+// Move to the next export bitmap spot.
+
+void
+equiv_relation_iterator::next ()
+{
+  bmp_iter_next (&m_bi, &m_y);
+}
+
+// Fetch the name of the next export in the export list.  Return NULL if
+// iteration is done.
+
+tree
+equiv_relation_iterator::get_name (relation_kind *rel)
+{
+  if (!m_bm)
+return NULL_TREE;
+
+  while (bmp_iter_set (&m_bi, &m_y))
+{
+  // Do not return self.
+  tree t = ssa_name (m_y);
+  if (t && t != m_name)
+	{
+	  relation_kind k = VREL_EQ;
+	  if (m_pe && m_bm == m_pe->members)
+	{
+	  const pe_slice *equiv_pe = m_oracle->partial_equiv_set (t);
+	  if (equiv_pe && equiv_pe->members == m_pe->members)
+		k = pe_min (m_pe->code, equiv_pe->code);
+	  else
+		k = VREL_VARYING;
+	}
+	  if (relation_equiv_p (k))
+	{
+	  if (rel)
+		*rel = k;
+	  return t;
+	}
+	}
+  next ();
+}
+
+  // Process partial equivs after full equivs if both were requested.
+  if (m_pe && m_bm != m_pe->members)
+{
+  m_bm = m_pe->members;
+  if (m_bm)
+	{
+	  // Recursively call back to process First PE.
+	  bmp_iter_set_init (&m_bi, m_bm, 1, &m_y);
+	  return get_name (rel);
+	}
+}
+  return NULL_TREE;
+}
diff --git a/gcc/value-relation.h b/gcc/value-relation.h
index f5f2524ad56..a3bbe1e8157 100644
--- a/gcc/value-relation.h
+++ b/gcc/value-relation.h
@@ -100,9 +100,6 @@ public:
   // register a relation between 2 ssa names on an edge.
   void register_edge (edge, relation_kind, tree, tree);
 
-  // Return equivalency set for an SSA name in a basic block.
-  virtual const_bitmap equiv_set (tree, basic_block) = 0;
-  virtual const class pe_slice *partial_equiv_set (tree) { return NULL; }
   // register a relation between 2 ssa names in a basic block.
   virtual void register_relation (basic_block, relation_kind, tree, tree) = 0;
   // Query for a relation between two ssa names in a basic block.
@@ -115,6 +112,11 @@ public:
   virtual void dump (FILE *) c

[COMMITTED 4/4] PR tree-optimization/102540 - propagate partial equivs in the cache.

2022-10-13 Thread Andrew MacLeod via Gcc-patches
Ranger's on-entry cache propagation already evaluates equivalences when 
calculating values.  This patch also allows it to work with partial 
equivalences and, if the bit sizes are compatible, to make use of those 
ranges as well.


It attempts to be conservative, so should be safe.

This resolves regressions in both PR 102540 and PR 102872.
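
The "compatible bit sizes" condition boils down to one comparison.  A minimal sketch, assuming a hypothetical helper named partial_equiv_usable (the patch itself expresses this inline with pe_to_bits):

```cpp
#include <cassert>

// SHARED_BITS is the width of the partial equivalence (8, 16, 32 or 64).
// NAME_PREC is the precision of the name whose range is being refined.
// A slice narrower than the name says nothing about the name's upper
// bits, so it cannot be used to refine the whole range.
inline bool partial_equiv_usable (int name_prec, int shared_bits)
{
  return shared_bits > 0 && name_prec <= shared_bits;
}
```

For example, a pe32 relation can refine a 32 bit unsigned int, but not a 64 bit long whose upper half is unknown.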

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed

Andrew
From 6cc3394507a2303a18891d34222c53f679256c37 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 5 Oct 2022 10:42:07 -0400
Subject: [PATCH 4/4] propagate partial equivs in the cache.

Adjust on-entry cache propagation to look for and propagate both full
and partial equivalences.

	gcc/
	PR tree-optimization/102540
	PR tree-optimization/102872
	* gimple-range-cache.cc (ranger_cache::fill_block_cache):
	Handle partial equivs.
	(ranger_cache::range_from_dom): Cleanup dump output.

	gcc/testsuite/
	* gcc.dg/pr102540.c: New.
	* gcc.dg/pr102872.c: New.
---
 gcc/gimple-range-cache.cc   | 37 +++--
 gcc/testsuite/gcc.dg/pr102540.c | 19 +
 gcc/testsuite/gcc.dg/pr102872.c | 16 ++
 3 files changed, 66 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102540.c
 create mode 100644 gcc/testsuite/gcc.dg/pr102872.c

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 8c80ba6cd14..0b9aa3639c5 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1189,8 +1189,9 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 {
   edge_iterator ei;
   edge e;
-  Value_Range block_result (TREE_TYPE (name));
-  Value_Range undefined (TREE_TYPE (name));
+  tree type = TREE_TYPE (name);
+  Value_Range block_result (type);
+  Value_Range undefined (type);
 
   // At this point we shouldn't be looking at the def, entry or exit block.
   gcc_checking_assert (bb != def_bb && bb != ENTRY_BLOCK_PTR_FOR_FN (cfun) &&
@@ -1221,10 +1222,16 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
   if (m_oracle)
 	{
 	  tree equiv_name;
-	  FOR_EACH_EQUIVALENCE (m_oracle, bb, name, equiv_name)
+	  relation_kind rel;
+	  int prec = TYPE_PRECISION (type);
+	  FOR_EACH_PARTIAL_AND_FULL_EQUIV (m_oracle, bb, name, equiv_name, rel)
 	{
 	  basic_block equiv_bb = gimple_bb (SSA_NAME_DEF_STMT (equiv_name));
 
+	  // Ignore partial equivs that are smaller than this object.
+	  if (rel != VREL_EQ && prec > pe_to_bits (rel))
+		continue;
+
 	  // Check if the equiv has any ranges calculated.
 	  if (!m_gori.has_edge_range_p (equiv_name))
 		continue;
@@ -1234,16 +1241,32 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 		  (equiv_bb && !dominated_by_p (CDI_DOMINATORS, bb, equiv_bb)))
 		continue;
 
+	  if (DEBUG_RANGE_CACHE)
+		{
+		  if (rel == VREL_EQ)
+		fprintf (dump_file, "Checking Equivalence (");
+		  else
+		fprintf (dump_file, "Checking Partial equiv (");
+		  print_relation (dump_file, rel);
+		  fprintf (dump_file, ") ");
+		  print_generic_expr (dump_file, equiv_name, TDF_SLIM);
+		  fprintf (dump_file, "\n");
+		}
 	  Value_Range equiv_range (TREE_TYPE (equiv_name));
 	  if (range_from_dom (equiv_range, equiv_name, bb, RFD_READ_ONLY))
 		{
+		  if (rel != VREL_EQ)
+		range_cast (equiv_range, type);
 		  if (block_result.intersect (equiv_range))
 		{
 		  if (DEBUG_RANGE_CACHE)
 			{
-			  fprintf (dump_file, "Equivalence update! :  ");
+			  if (rel == VREL_EQ)
+			fprintf (dump_file, "Equivalence update! :  ");
+			  else
+			fprintf (dump_file, "Partial equiv update! :  ");
 			  print_generic_expr (dump_file, equiv_name, TDF_SLIM);
-			  fprintf (dump_file, "had range  :  ");
+			  fprintf (dump_file, " has range  :  ");
 			  equiv_range.dump (dump_file);
 			  fprintf (dump_file, " refining range to :");
 			  block_result.dump (dump_file);
@@ -1458,7 +1481,9 @@ ranger_cache::range_from_dom (vrange &r, tree name, basic_block start_bb,
 
   if (DEBUG_RANGE_CACHE)
 {
-  fprintf (dump_file, "CACHE: BB %d DOM query, found ", start_bb->index);
+  fprintf (dump_file, "CACHE: BB %d DOM query for ", start_bb->index);
+  print_generic_expr (dump_file, name, TDF_SLIM);
+  fprintf (dump_file, ", found ");
   r.dump (dump_file);
   if (bb)
 	fprintf (dump_file, " at BB%d\n", bb->index);
diff --git a/gcc/testsuite/gcc.dg/pr102540.c b/gcc/testsuite/gcc.dg/pr102540.c
new file mode 100644
index 000..c12f8fcebfb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102540.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-evrp" } */
+
+
+void kill();
+
+static long a;
+static unsigned b;
+int test1 () {
+long c, e;
+c = b = a;
+e = c ? 2 / (c + 1) : 0;
+if (e && !b)
+kill ();
+a = 0;
+}
+
+/* { dg-final { scan-tree-dump-not "kill" "evrp" } }  */
+
diff --git a/gcc/testsuite/gcc.dg/pr102872.c b/

[COMMITTED 3/4] Add partial equivalence recognition to cast and bitwise and.

2022-10-13 Thread Andrew MacLeod via Gcc-patches
This provides the hooks that will register basic partial equivalencies 
for casts and bitwise AND operations with the appropriate bit pattern.
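
To make the two cases concrete, here is a stand-alone sketch of the decision logic using plain integers instead of iranges and wide_ints.  The function names are hypothetical, and returning 0 stands in for VREL_VARYING; treat it as an approximation of the hooks, not a copy of them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Keep only the slice widths that are tracked; anything else means
// "no partial equivalence".
inline int slice_or_zero (int bits)
{
  return (bits == 8 || bits == 16 || bits == 32 || bits == 64) ? bits : 0;
}

// LHS = (T) OP1.  When a signed OP1 that may be negative is widened, the
// upper bits become ones, so this sketch claims nothing in that case.
inline int cast_slice_bits (int lhs_prec, int op1_prec, bool op1_signed,
                            bool op1_may_be_negative)
{
  if (op1_signed && lhs_prec > op1_prec && op1_may_be_negative)
    return 0;
  return slice_or_zero (std::min (lhs_prec, op1_prec));
}

// LHS = OP1 & MASK for a constant MASK.  Only masks that select exactly
// the low 8, 16, 32 or 64 bits produce a slice.
inline int and_slice_bits (int op1_prec, std::uint64_t mask)
{
  int mask_bits = 0;
  switch (mask)
    {
    case 0xffULL: mask_bits = 8; break;
    case 0xffffULL: mask_bits = 16; break;
    case 0xffffffffULL: mask_bits = 32; break;
    case 0xffffffffffffffffULL: mask_bits = 64; break;
    default: break;
    }
  return slice_or_zero (std::min (op1_prec, mask_bits));
}
```

With these, a 64 bit value masked with 0xffffffff and the same value cast to a 32 bit type both come out as 32 bit slices, which is the connection the PR testcases need.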


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed

Andrew
From d75be7e4343f049176546aa9517d570e5eb67954 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 6 Oct 2022 15:01:24 -0400
Subject: [PATCH 3/4] Add partial equivalence recognition to cast and bitwise
 and.

This provides the hooks that will register partial equivalencies for
casts and bitwise AND operations with the appropriate bit pattern.

	* range-op.cc (operator_cast::lhs_op1_relation): New.
	(operator_bitwise_and::lhs_op1_relation): New.
---
 gcc/range-op.cc | 65 +
 1 file changed, 65 insertions(+)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index f8255dd10a1..cf7f0dcd670 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2417,6 +2417,10 @@ public:
 			  const irange &lhs,
 			  const irange &op2,
 			  relation_kind rel = VREL_VARYING) const;
+  virtual relation_kind lhs_op1_relation (const irange &lhs,
+	  const irange &op1,
+	  const irange &op2,
+	  relation_kind) const;
 private:
   bool truncating_cast_p (const irange &inner, const irange &outer) const;
   bool inside_domain_p (const wide_int &min, const wide_int &max,
@@ -2425,6 +2429,35 @@ private:
 			   const irange &outer) const;
 } op_convert;
 
+// Add a partial equivalence between the LHS and op1 for casts.
+
+relation_kind
+operator_cast::lhs_op1_relation (const irange &lhs,
+ const irange &op1,
+ const irange &op2 ATTRIBUTE_UNUSED,
+ relation_kind) const
+{
+  if (lhs.undefined_p () || op1.undefined_p ())
+return VREL_VARYING;
+  unsigned lhs_prec = TYPE_PRECISION (lhs.type ());
+  unsigned op1_prec = TYPE_PRECISION (op1.type ());
+  // If the result gets sign extended into a larger type check first if this
+  // qualifies as a partial equivalence.
+  if (TYPE_SIGN (op1.type ()) == SIGNED && lhs_prec > op1_prec)
+{
+  // If the result is sign extended, and the LHS is larger than op1,
+  // check if op1's range can be negative as the sign extention will
+  // cause the upper bits to be 1 instead of 0, invalidating the PE.
+  int_range<3> negs = range_negatives (op1.type ());
+  negs.intersect (op1);
+  if (!negs.undefined_p ())
+	return VREL_VARYING;
+}
+
+  unsigned prec = MIN (lhs_prec, op1_prec);
+  return bits_to_pe (prec);
+}
+
 // Return TRUE if casting from INNER to OUTER is a truncating cast.
 
 inline bool
@@ -2739,6 +2772,10 @@ public:
 		const wide_int &lh_ub,
 		const wide_int &rh_lb,
 		const wide_int &rh_ub) const;
+  virtual relation_kind lhs_op1_relation (const irange &lhs,
+	  const irange &op1,
+	  const irange &op2,
+	  relation_kind) const;
 private:
   void simple_op1_range_solver (irange &r, tree type,
 const irange &lhs,
@@ -2784,6 +2821,34 @@ wi_optimize_signed_bitwise_op (irange &r, tree type,
   return true;
 }
 
+// An AND of 8,16, 32 or 64 bits can produce a partial equivalence between
+// the LHS and op1.
+
+relation_kind
+operator_bitwise_and::lhs_op1_relation (const irange &lhs,
+ const irange &op1,
+ const irange &op2,
+ relation_kind) const
+{
+  if (lhs.undefined_p () || op1.undefined_p () || op2.undefined_p ())
+return VREL_VARYING;
+  if (!op2.singleton_p ())
+return VREL_VARYING;
+  // if val == 0xff or 0x OR 0X OR 0X, return TRUE
+  int prec1 = TYPE_PRECISION (op1.type ());
+  int prec2 = TYPE_PRECISION (op2.type ());
+  int mask_prec = 0;
+  wide_int mask = op2.lower_bound ();
+  if (wi::eq_p (mask, wi::mask (8, false, prec2)))
+mask_prec = 8;
+  else if (wi::eq_p (mask, wi::mask (16, false, prec2)))
+mask_prec = 16;
+  else if (wi::eq_p (mask, wi::mask (32, false, prec2)))
+mask_prec = 32;
+  else if (wi::eq_p (mask, wi::mask (64, false, prec2)))
+mask_prec = 64;
+  return bits_to_pe (MIN (prec1, mask_prec));
+}
 
 // Optimize BIT_AND_EXPR and BIT_IOR_EXPR in terms of a mask if
 // possible.  Basically, see if we can optimize:
-- 
2.37.3



[PATCH] [X86_64]: Enable support for next generation AMD Zen4 CPU

2022-10-13 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public]

Hi all,

Please find attached the patch that enables support for the next-generation 
AMD Zen4 CPU via -march=znver4.
This is a basic enablement patch; for now, the costs and tunings are kept 
the same as znver3.

Good for trunk?

Regards,
Tejas

0001-Enable-AMD-znver4-support-and-add-instruction-reserv.patch


[PATCH] c++ modules: verify_type failure with typedef enum [PR106848]

2022-10-13 Thread Patrick Palka via Gcc-patches
Here during stream in we end up having created a type variant for the enum
before we read the enum's definition, and thus the variant inherited stale
TYPE_VALUES and TYPE_MIN/MAX_VALUES, which leads to an ICE (with -g).  The
stale variant got created from set_underlying_type during earlier stream in
of the (redundant) typedef for the enum.

This patch works around this by setting TYPE_VALUES and TYPE_MIN/MAX_VALUES
for all variants when reading in an enum definition.  Does this look like
the right approach?  Or perhaps we need to arrange that we read the enum
definition before reading in the typedef decl?  Note that this seems to be
an issue only when the typedef name and enum name are the same (thus the
typedef is redundant), otherwise we seem to read the enum definition first
as desired.
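
The shape of the workaround, stripped of the front-end details, is a walk over a singly linked variant chain.  A minimal sketch, assuming a hypothetical toy_type with a next_variant pointer in place of the real tree nodes and TYPE_NEXT_VARIANT:

```cpp
#include <cassert>

// Toy stand-in for an enum type node and its chain of variants.
struct toy_type
{
  int min_value = 0;
  int max_value = 0;
  toy_type *next_variant = nullptr;
};

// Install the definition on the main variant and on every variant that
// was created before the definition was read, so none keeps stale values.
inline void set_enum_range (toy_type *main_variant, int min, int max)
{
  for (toy_type *t = main_variant; t; t = t->next_variant)
    {
      t->min_value = min;
      t->max_value = max;
    }
}
```

Updating only the first node, as the code did before the patch, would leave any earlier-created variant with whatever it inherited at creation time.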

PR c++/106848

gcc/cp/ChangeLog:

* module.cc (trees_in::read_enum_def): Set the TYPE_VALUES,
TYPE_MIN_VALUE and TYPE_MAX_VALUE of all type variants.

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-9_a.H: New test.
* g++.dg/modules/enum-9_b.C: New test.
---
 gcc/cp/module.cc| 9 ++---
 gcc/testsuite/g++.dg/modules/enum-9_a.H | 5 +
 gcc/testsuite/g++.dg/modules/enum-9_b.C | 6 ++
 3 files changed, 17 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/enum-9_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/enum-9_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 7ffeefa7c1f..97fb80bcd44 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12303,9 +12303,12 @@ trees_in::read_enum_def (tree defn, tree maybe_template)
 
   if (installing)
 {
-  TYPE_VALUES (type) = values;
-  TYPE_MIN_VALUE (type) = min;
-  TYPE_MAX_VALUE (type) = max;
+  for (tree t = type; t; t = TYPE_NEXT_VARIANT (t))
+   {
+ TYPE_VALUES (t) = values;
+ TYPE_MIN_VALUE (t) = min;
+ TYPE_MAX_VALUE (t) = max;
+   }
 
   rest_of_type_compilation (type, DECL_NAMESPACE_SCOPE_P (defn));
 }
diff --git a/gcc/testsuite/g++.dg/modules/enum-9_a.H b/gcc/testsuite/g++.dg/modules/enum-9_a.H
new file mode 100644
index 000..fb7d10ad3b6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/enum-9_a.H
@@ -0,0 +1,5 @@
+// PR c++/106848
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+
+typedef enum memory_order { memory_order_seq_cst } memory_order;
diff --git a/gcc/testsuite/g++.dg/modules/enum-9_b.C b/gcc/testsuite/g++.dg/modules/enum-9_b.C
new file mode 100644
index 000..63e81675d0a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/enum-9_b.C
@@ -0,0 +1,6 @@
+// PR c++/106848
+// { dg-additional-options "-fmodules-ts -g" }
+
+import "enum-9_a.H";
+
+memory_order x = memory_order_seq_cst;
-- 
2.38.0.68.ge85701b4af



Re: [PATCH v2] c++: parser - Support for target address spaces in C++

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/13/22 11:23, Paul Iannetta wrote:
> On Thu, Oct 13, 2022 at 11:02:24AM -0400, Jason Merrill wrote:
> > On 10/12/22 20:52, Paul Iannetta wrote:
> > > On Tue, Oct 11, 2022 at 09:49:43PM -0400, Jason Merrill wrote:
> > > >
> > > > It surprises that this is the only place we complain about an object with an
> > > > address-space qualifier.  Shouldn't we also complain about e.g. automatic
> > > > variables/parameters or non-static data members with address-space qualified
> > > > type?
> > > >
> > >
> > > Indeed, I was missing quite a few things here.  Thanks.
> > > I used the draft as basis this time and imported from the C
> > > implementation the relevant parts.  This time, the errors get properly
> > > emitted when an address space is unduly specified; and comparisons,
> > > assignments and comparisons are taken care of.
> > >
> > > There are quite a few things I would like to clarify concerning some
> > > implementation details.
> > > - A variable with automatic storage (which is neither a pointer nor
> > >   a reference) cannot be qualified with an address space.  I detect
> > >   this by the combination of `sc_none' and `! toplevel_bindings_p ()',
> > >   but I've also seen the use of `at_function_scope' at other places.
> > >   And I'm unsure which one is appropriate here.
> > >   This detection happens at the very end of grokdeclarator because I
> > >   need to know that the type is a pointer, which is not know until
> > >   very late in the function.
> >
> > At that point you have the decl, and you can ask directly what its storage
> > duration is, perhaps using decl_storage_duration.
> >
> > But why do you need to know whether the type is a pointer?  The attribute
> > applies to the target type of the pointer, not the pointer type.  I think
> > the problem is that you're looking at declspecs when you ought to be looking
> > at type_quals.
>
> I need to know that the base type is a pointer to reject invalid
> declarations such as:
>
>  int f (__seg_fs int a) { } or int f () { __seg_fs int a; }
>
> because parameters and auto variables can have an address space
> qualifier only if they are pointer or reference type, which I can't
> tell only from type_quals.

But "int *__seg_fs a" is just as invalid as the above; the difference is 
not whether a is a pointer, but whether the address-space-qualified is 
the type of a itself or some sub-type.

You need to look at the qualifiers on type (which should also be the 
ones in type_quals), not the qualifiers in the declspecs.

> > > - I'm having some trouble deciding whether I include those three
> > >   stub programs as tests, they all compile fine and clang accepts
> > >   them as well.
> >
> > Why not?
>
> I thought they were pretty contrived, since it does not make much
> sense to strip address space qualifiers, even though it does prove
> that the implementation support those contrived but valid uses.

Testcases are full of contrived examples testing corner cases.  :)




> > > Ex1:
> > > ```
> > > int __seg_fs * fs1;
> > > int __seg_gs * gs1;
> > >
> > > template struct strip;
> > > template struct strip<__seg_fs T *> { typedef T type; };
> > > template struct strip<__seg_gs T *> { typedef T type; };
> > >
> > > int
> > > main ()
> > > {
> > >   *(strip::type *) fs1 == *(strip::type *) gs1;
> > >   return 0;
> > > }
> > > ```
> > >
> > > Ex2:
> > > ```
> > > int __seg_fs * fs1;
> > > int __seg_fs * fs2;
> > >
> > > template auto f (T __seg_fs * a, U __seg_gs * b) { return a; }
> > > template auto f (T __seg_gs * a, U __seg_fs * b) { return a; }
> > >
> > > int
> > > main ()
> > > {
> > >   f (fs1, gs1);
> > >   f (gs1, fs1);
> > >   return 0;
> > > }
> > > ```
> > >
> > > Ex3:
> > > ```
> > > int __seg_fs * fs1;
> > > int __seg_gs * gs1;
> > >
> > > template
> > > auto f (T __seg_fs * a, U __seg_gs * b)
> > > {
> > >   return *(T *) a == *(U *) b;
> > > }
> > >
> > > int
> > > main ()
> > > {
> > >   return f (fs1, gs1);
> > > }
> > > ```
> > >
> > >
> > > Add support for custom address spaces in C++
> > >
> > > gcc/
> > >   * tree.h (ENCODE_QUAL_ADDR_SPACE): Missing parentheses.
> > >
> > > gcc/c/
> > >   * c-decl.cc: Remove c_register_addr_space.
> > >
> > > gcc/c-family/
> > >   * c-common.cc (c_register_addr_space): Imported from c-decl.cc
> > >   (addr_space_superset): Imported from gcc/c/c-typecheck.cc
> > >   * c-common.h: Remove the FIXME.
> > >   (addr_space_superset): New declaration.
> > >
> > > gcc/cp/
> > >   * cp-tree.h (enum cp_decl_spec): Add addr_space support.
> > >   (struct cp_decl_specifier_seq): Likewise.
> > >   * decl.cc (get_type_quals): Likewise.
> > >   (check_tag_decl): Likewise.
> > >   (grokdeclarator): Likewise.
> > >   * parser.cc (cp_parser_type_specifier): Likewise.
> > >   (cp_parser_cv_qualifier_seq_opt): Likewise.
> > >   (cp_parser_postfix_expression): Likewise.
> > >   (cp_parser_type_specifier): Likewise.
> > >   (set_and_check_decl_spec_loc): Likewise.
> > >   * typeck.cc (composite_pointer_type): Likewise
> > >   (comp_ptr_ttypes_real): Likewise.
> > >   (same_type_ignoring_top_level_qualifiers_p): Likewise.
> > >   * pt.cc (check_cv_quals_for_unify): Likewise.
> > >   (unify): Likewise.
> > >   * tree.cc: Remove c_register_addr_space stub.
> > >   * mangle.cc (write_CV_qualifiers_for_type): Mangle address spaces
> > >   using the extended qualifier notation.
> > >
> > > gcc/doc
> > >   * extend.texi (Named Address Spaces): add a mention about C++

Re: [PATCH v2] c++: parser - Support for target address spaces in C++

2022-10-13 Thread Paul Iannetta via Gcc-patches
On Thu, Oct 13, 2022 at 11:47:42AM -0400, Jason Merrill wrote:
> On 10/13/22 11:23, Paul Iannetta wrote:
> > On Thu, Oct 13, 2022 at 11:02:24AM -0400, Jason Merrill wrote:
> > > On 10/12/22 20:52, Paul Iannetta wrote:
> > > > On Tue, Oct 11, 2022 at 09:49:43PM -0400, Jason Merrill wrote:
> > > > > 
> > > > > It surprises that this is the only place we complain about an object 
> > > > > with an
> > > > > address-space qualifier.  Shouldn't we also complain about e.g. 
> > > > > automatic
> > > > > variables/parameters or non-static data members with address-space 
> > > > > qualified
> > > > > type?
> > > > > 
> > > > 
> > > > Indeed, I was missing quite a few things here.  Thanks.
> > > > I used the draft as basis this time and imported from the C
> > > > implementation the relevant parts.  This time, the errors get properly
> > > > emitted when an address space is unduly specified; and comparisons,
> > > > assignments and comparisons are taken care of.
> > > > 
> > > > There are quite a few things I would like to clarify concerning some
> > > > implementation details.
> > > > - A variable with automatic storage (which is neither a pointer nor
> > > >   a reference) cannot be qualified with an address space.  I detect
> > > >   this by the combination of `sc_none' and `! toplevel_bindings_p 
> > > > ()',
> > > >   but I've also seen the use of `at_function_scope' at other places.
> > > >   And I'm unsure which one is appropriate here.
> > > >   This detection happens at the very end of grokdeclarator because I
> > > >   need to know that the type is a pointer, which is not know until
> > > >   very late in the function.
> > > 
> > > At that point you have the decl, and you can ask directly what its storage
> > > duration is, perhaps using decl_storage_duration.
> > > 
> > > But why do you need to know whether the type is a pointer?  The attribute
> > > applies to the target type of the pointer, not the pointer type.  I think
> > > the problem is that you're looking at declspecs when you ought to be 
> > > looking
> > > at type_quals.
> > 
> > I need to know that the base type is a pointer to reject invalid
> > declarations such as:
> > 
> >  int f (__seg_fs int a) { } or int f () { __seg_fs int a; }
> > 
> > because parameters and auto variables can have an address space
> > qualifier only if they are pointer or reference type, which I can't
> > tell only from type_quals.
> 
> But "int *__seg_fs a" is just as invalid as the above; the difference is not
> whether a is a pointer, but whether the address-space-qualified type is the
> type of a itself or some sub-type.

I agree that "int * __seg_fs a" is invalid, but it is accepted by the C
front-end and by clang (both C and C++); the behavior is that the
address space is silently ignored.

> You need to look at the qualifiers on type (which should also be the ones in
> type_quals), not the qualifiers in the declspecs.

I'll have another look, thanks.

> > > > - I'm having some trouble deciding whether I include those three
> > > >   stub programs as tests, they all compile fine and clang accepts
> > > >   them as well.
> > > 
> > > Why not?
> > 
> > I thought they were pretty contrived, since it does not make much
> > sense to strip address space qualifiers, even though it does prove
> > that the implementation supports those contrived but valid uses.
> 
> Testcases are full of contrived examples testing corner cases.  :)
> 
> > > 
> > > > Ex1:
> > > > ```
> > > > int __seg_fs * fs1;
> > > > int __seg_gs * gs1;
> > > > 
> > > > template <typename T> struct strip;
> > > > template <typename T> struct strip<__seg_fs T *> { typedef T type; };
> > > > template <typename T> struct strip<__seg_gs T *> { typedef T type; };
> > > > 
> > > > int
> > > > main ()
> > > > {
> > > >   *(strip::type *) fs1 == 
> > > > *(strip::type *) gs1;
> > > >   return 0;
> > > > }
> > > > ```
> > > > 
> > > > Ex2:
> > > > ```
> > > > int __seg_fs * fs1;
> > > > int __seg_gs * gs1;
> > > > 
> > > > template <typename T, typename U>
> > > > auto f (T __seg_fs * a, U __seg_gs * b) { return a; }
> > > > template <typename T, typename U>
> > > > auto f (T __seg_gs * a, U __seg_fs * b) { return a; }
> > > > 
> > > > int
> > > > main ()
> > > > {
> > > >   f (fs1, gs1);
> > > >   f (gs1, fs1);
> > > >   return 0;
> > > > }
> > > > ```
> > > > 
> > > > Ex3:
> > > > ```
> > > > int __seg_fs * fs1;
> > > > int __seg_gs * gs1;
> > > > 
> > > > template <typename T, typename U>
> > > > auto f (T __seg_fs * a, U __seg_gs * b)
> > > > {
> > > >   return *(T *) a == *(U *) b;
> > > > }
> > > > 
> > > > int
> > > > main ()
> > > > {
> > > >   return f (fs1, gs1);
> > > > }
> > > > ```
> > > > 
> > > > 
> > > > Add support for custom address spaces in C++
> > > > 
> > > > gcc/
> > > >   * tree.h (ENCODE_QUAL_ADDR_SPACE): Missing parentheses.
> > > > 
> > > > gcc/c/
> > > >   * c-decl.cc: Remove c_register_addr_space.
> > > > 
> > > > gcc/c-family/
> > > >   * c-common.cc (c_register_addr_space): Imported from c

Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-13 Thread will schmidt via Gcc-patches


Ping.

On Mon, 2022-09-19 at 11:13 -0500, will schmidt wrote:
> [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865]
> 
> Hi,
>   The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
> and can be disabled by dependent options when it should not be.
> This manifests in the issue seen in PR101865 where -mno-vsx
> mistakenly disables _ARCH_PWR8.
> 
> This change replaces the relevant TARGET_DIRECT_MOVE references
> with a TARGET_POWER8 entry so that the direct_move and power8
> features can be enabled or disabled independently.
> 
> This is done via the OPTION_MASK definitions, so this
> means that some references to the OPTION_MASK_DIRECT_MOVE
> option are now replaced with OPTION_MASK_POWER8.
> 
> The existing (and rather lengthy) commentary for DIRECT_MOVE remains
> in place in rs6000-c.cc:rs6000_target_modify_macros().  The
> if-defined logic there will now set a __DIRECT_MOVE__ define when
> TARGET_DIRECT_MOVE is set.  This serves as a placeholder for debug
> purposes, but is otherwise unused.  It can be removed in a
> subsequent patch, or in an update of this patch, depending on feedback.
> 
> This regtests cleanly (power8,power9,power10), and resolves
> PR 101865 as represented in the tests from (1/2).
> 
> OK for trunk?
> Thanks,
> -Will
> 
> 
> gcc/
>   PR Target/101865
>   * config/rs6000/rs6000-builtin.cc
>   (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
>   usage with TARGET_POWER8.
>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros):
>   Add __DIRECT_MOVE__ define.  Replace _ARCH_PWR8_ define
>   conditional with OPTION_MASK_POWER8.
>   * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
>   Add OPTION_MASK_POWER8 entry.
>   (POWERPC_MASKS): Same.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8.
>   (rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8.
>   * config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8.
>   * config/rs6000/vsx.md (vsx_extract_): Replace
>   TARGET_DIRECT_MOVE usage with TARGET_POWER8.
>   (define_peephole2): Same.
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 3ce729c1e6de..91a0f39bd796 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
>  case ENB_P7:
>return TARGET_POPCNTD;
>  case ENB_P7_64:
>return TARGET_POPCNTD && TARGET_POWERPC64;
>  case ENB_P8:
> -  return TARGET_DIRECT_MOVE;
> +  return TARGET_POWER8;
>  case ENB_P8V:
>return TARGET_P8_VECTOR;
>  case ENB_P9:
>return TARGET_MODULO;
>  case ENB_P9_64:
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index ca9cc42028f7..41d51b039061 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
>   turned off in any of the following conditions:
>   1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
>   disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
>   enabled.
>   2. TARGET_VSX is off.  */
> -  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
> +  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
> +rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__");
> +  if ((flags & OPTION_MASK_POWER8) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
>if ((flags & OPTION_MASK_MODULO) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
>if ((flags & OPTION_MASK_POWER10) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index c3825bcccd84..c873f6d58989 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -48,10 +48,11 @@
> system.  */
>  #define ISA_2_7_MASKS_SERVER (ISA_2_6_MASKS_SERVER   \
>| OPTION_MASK_P8_VECTOR\
>| OPTION_MASK_CRYPTO   \
>| OPTION_MASK_DIRECT_MOVE  \
> +  | OPTION_MASK_POWER8   \
>| OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
>| OPTION_MASK_QUAD_MEMORY  \
>| OPTION_MASK_QUAD_MEMORY_ATOMIC)
> 
>  /* ISA masks setting fusion options.  */
> @@ -124,10 +125,11 @@
>  #define POWERPC_MASKS(OPTION_MASK_ALTIVEC
> \
>| OPTION_MASK_CMPB \
>| OPTION_MASK_C

[PATCH] c++, v2: Implement excess precision support for C++ [PR107097, PR323]

2022-10-13 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 12, 2022 at 02:08:20PM -0400, Jason Merrill wrote:
> > In general I've tried to follow the C99 handling, C11+ relies on the
> > C standard saying that in case of integral conversions excess precision
> > can be used (see PR87390 for more details), but I don't see anything similar
> > on the C++ standard side.
> 
> https://eel.is/c++draft/expr#pre-6 seems identical to C99 (apart from a
> stray "the"?); presumably nobody has proposed to copy the N1531
> clarifications.  But since those are clarifications, I'd prefer to use our
> C11+ semantics to avoid divergence between the default modes of the C and
> C++ front ends.

Ok.  To keep this more readable, and in case we decide to make e.g. C++98
behave like C99 and only C++11 and later like C11, I'm sending this as
a 2 patch series.  This patch is just an updated version of the previous
patch (your review comments, Marek's mail and missed changes to
doc/invoke.texi), and another mail will upgrade this to the C11
behavior.

> > +  semantic_result_type
> > +   = type_after_usual_arithmetic_conversions (arg2_type, arg3_type);
> > +  if (semantic_result_type == error_mark_node
> > + && TREE_CODE (arg2_type) == REAL_TYPE
> > + && TREE_CODE (arg3_type) == REAL_TYPE
> > + && (extended_float_type_p (arg2_type)
> > + || extended_float_type_p (arg3_type))
> 
> What if semantic_result_type is error_mark_node and the other conditions
> don't hold?  That seems impossible, so maybe the other conditions should
> move into a gcc_checking_assert? (And likewise for result_type below)

Changed in all places to an assert, though previously I missed
that cp_common_type on complex type(s) could have a similar problem.

> > @@ -9772,8 +9849,12 @@ build_over_call (struct z_candidate *can
> > return error_mark_node;
> > }
> > else if (magic != 0)
> > -   /* For other magic varargs only do decay_conversion.  */
> > -   a = decay_conversion (a, complain);
> > +   {
> > + if (magic == 1 && TREE_CODE (a) == EXCESS_PRECISION_EXPR)
> > +   a = TREE_OPERAND (a, 0);
> 
> It was confusing me that this mentions 1, and the magic_varargs_p comment
> above mentions 2:  Let's add a comment

That is because removing excess precision means keeping
EXCESS_PRECISION_EXPR around, while preserving excess precision
means removing the EXCESS_PRECISION_EXPR.

> 
>  /* Don't truncate excess precision to the semantic type.  */
> 
> to clarify.

Ok.

Here is an updated patch, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2022-10-13  Jakub Jelinek  

PR middle-end/323
PR c++/107097
gcc/
* doc/invoke.texi (-fexcess-precision=standard): Mention that the
option now also works in C++.
gcc/c-family/
* c-common.def (EXCESS_PRECISION_EXPR): Remove comment part about
the tree being specific to C/ObjC.
* c-opts.cc (c_common_post_options): Handle flag_excess_precision
in C++ the same as in C.
* c-lex.cc (interpret_float): Set const_type to excess_precision ()
even for C++.
gcc/cp/
* parser.cc (cp_parser_primary_expression): Handle
EXCESS_PRECISION_EXPR with REAL_CST operand the same as REAL_CST.
* cvt.cc (cp_ep_convert_and_check): New function.
* call.cc (build_conditional_expr): Add excess precision support.
When type_after_usual_arithmetic_conversions returns error_mark_node,
use gcc_checking_assert that it is because of uncomparable floating
point ranks instead of checking all those conditions and make it
work also with complex types.
(convert_like_internal): Likewise.  Add NESTED_P argument, pass true
to recursive calls to convert_like.
(convert_like): Add NESTED_P argument, pass it through to
convert_like_internal.  For other overload pass false to it.
(convert_like_with_context): Pass false to NESTED_P.
(convert_arg_to_ellipsis): Add excess precision support.
(magic_varargs_p): For __builtin_is{finite,inf,inf_sign,nan,normal}
and __builtin_fpclassify return 2 instead of 1, document what it
means.
(build_over_call): Don't handle former magic 2 which is no longer
used, instead for magic 1 remove EXCESS_PRECISION_EXPR.
(perform_direct_initialization_if_possible): Pass false to NESTED_P
convert_like argument.
* constexpr.cc (cxx_eval_constant_expression): Handle
EXCESS_PRECISION_EXPR.
(potential_constant_expression_1): Likewise.
* pt.cc (tsubst_copy, tsubst_copy_and_build): Likewise.
* cp-tree.h (cp_ep_convert_and_check): Declare.
* cp-gimplify.cc (cp_fold): Handle EXCESS_PRECISION_EXPR.
* typeck.cc (cp_common_type): For COMPLEX_TYPEs, return error_mark_node
if recursive call returned it.
(convert_arguments): For magic 1 remove EXCESS_PRECISION_EXPR.
(cp_build_binary_op): Add excess precision support.  When

[PATCH] c++: Excess precision for ? int : float or int == float [PR107097, PR82071, PR87390]

2022-10-13 Thread Jakub Jelinek via Gcc-patches
Hi!

The following incremental patch implements the C11 behavior (for all C++
versions) for
cond ? int : float
cond ? float : int
int cmp float
float cmp int
where int is any integral type, float any floating point type with
excess precision and cmp ==, !=, >, <, >=, <= and <=>.
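
As a rough illustration of the int cmp float case (a sketch only, not taken
from the patch or its testsuite; the helper names are made up for this
example): the semantic types of these expressions stay the same, and only the
evaluation format of the integral operand changes.  When FLT_EVAL_METHOD is 0
the int operand is rounded to float before the comparison, so 2^24 + 1
compares equal to 2^24.  With excess precision and the C11-style behavior
described above, the int would instead be converted to the wider evaluation
format and the comparison would be expected to yield false.

```cpp
#include <cassert>
#include <cfloat>
#include <type_traits>

// The language-level (semantic) result types are unaffected by excess
// precision: ?: on int and float is float, and a comparison is bool.
static_assert (std::is_same<decltype (true ? 1 : 1.0f), float>::value,
	       "int : float has semantic type float");
static_assert (std::is_same<decltype (1 == 1.0f), bool>::value,
	       "int == float has type bool");

// Hypothetical helper: a mixed int/float comparison of the kind the
// patch is about.
inline bool
int_equals_float (int i, float f)
{
  return i == f;
}

// Hypothetical self-check.  16777217 is 2^24 + 1, which float cannot
// represent; without excess precision it rounds to 2^24 before comparing.
inline void
check_without_excess_precision ()
{
  if (FLT_EVAL_METHOD == 0)
    assert (int_equals_float (16777217, 16777216.0f));
}
```

On targets where FLT_EVAL_METHOD is nonzero the check above deliberately
asserts nothing, since the result there depends on the excess precision mode
in effect.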

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-10-13  Jakub Jelinek  

PR c/82071
PR c/87390
PR c++/107097
gcc/cp/
* cp-tree.h (cp_ep_convert_and_check): Remove.
* cvt.cc (cp_ep_convert_and_check): Remove.
* call.cc (build_conditional_expr): Use excess precision for ?: with
one arm floating and another integral.  Don't convert first to
semantic result type from integral types.
(convert_like_internal): Don't call cp_ep_convert_and_check, instead
just strip EXCESS_PRECISION_EXPR before calling cp_convert_and_check
or cp_convert.
* typeck.cc (cp_build_binary_op): Set may_need_excess_precision
for comparisons or SPACESHIP_EXPR with at least one operand integral.
Don't compute semantic_result_type if build_type is non-NULL.  Call
cp_convert_and_check instead of cp_ep_convert_and_check.
gcc/testsuite/
* gcc.target/i386/excess-precision-8.c: For C++ wrap abort and
exit declarations into extern "C" block.
* gcc.target/i386/excess-precision-10.c: Likewise.
* g++.target/i386/excess-precision-7.C: Remove.
* g++.target/i386/excess-precision-8.C: New test.
* g++.target/i386/excess-precision-9.C: Remove.
* g++.target/i386/excess-precision-10.C: New test.
* g++.target/i386/excess-precision-12.C: New test.

--- gcc/cp/cp-tree.h.jj 2022-10-13 09:35:27.999241554 +0200
+++ gcc/cp/cp-tree.h2022-10-13 15:43:57.124884379 +0200
@@ -6793,8 +6793,6 @@ extern tree ocp_convert   (tree, tree,
 tsubst_flags_t);
 extern tree cp_convert (tree, tree, tsubst_flags_t);
 extern tree cp_convert_and_check(tree, tree, tsubst_flags_t);
-extern tree cp_ep_convert_and_check (tree, tree, tree,
-tsubst_flags_t);
 extern tree cp_fold_convert(tree, tree);
 extern tree cp_get_callee  (tree);
 extern tree cp_get_callee_fndecl   (tree);
--- gcc/cp/cvt.cc.jj2022-10-13 09:35:27.956242146 +0200
+++ gcc/cp/cvt.cc   2022-10-13 14:09:29.612758165 +0200
@@ -684,33 +684,6 @@ cp_convert_and_check (tree type, tree ex
   return result;
 }
 
-/* Similarly, but deal with excess precision.  SEMANTIC_TYPE is the type this
-   conversion would use without excess precision.  If SEMANTIC_TYPE is NULL,
-   this function is equivalent to cp_convert_and_check.  This function is
-   a wrapper that handles conversions that may be different than the usual
-   ones because of excess precision.  */
-
-tree
-cp_ep_convert_and_check (tree type, tree expr, tree semantic_type,
-tsubst_flags_t complain)
-{
-  if (TREE_TYPE (expr) == type)
-return expr;
-  if (expr == error_mark_node)
-return expr;
-  if (!semantic_type)
-return cp_convert_and_check (type, expr, complain);
-
-  if (TREE_CODE (TREE_TYPE (expr)) == INTEGER_TYPE
-  && TREE_TYPE (expr) != semantic_type)
-/* For integers, we need to check the real conversion, not
-   the conversion to the excess precision type.  */
-expr = cp_convert_and_check (semantic_type, expr, complain);
-  /* Result type is the excess precision type, which should be
- large enough, so do not check.  */
-  return cp_convert (type, expr, complain);
-}
-
 /* Conversion...
 
FLAGS indicates how we should behave.  */
--- gcc/cp/call.cc.jj   2022-10-13 09:50:41.248658097 +0200
+++ gcc/cp/call.cc  2022-10-13 16:11:34.901325768 +0200
@@ -5895,11 +5895,53 @@ build_conditional_expr (const op_locatio
   && (ARITHMETIC_TYPE_P (arg3_type)
   || UNSCOPED_ENUM_P (arg3_type)))
 {
-  /* In this case, there is always a common type.  */
-  result_type = type_after_usual_arithmetic_conversions (arg2_type,
-arg3_type);
+  /* A conditional expression between a floating-point
+type and an integer type should convert the integer type to
+the evaluation format of the floating-point type, with
+possible excess precision.  */
+  tree eptype2 = arg2_type;
+  tree eptype3 = arg3_type;
+  tree eptype;
+  if (ANY_INTEGRAL_TYPE_P (arg2_type)
+ && (eptype = excess_precision_type (arg3_type)) != NULL_TREE)
+   {
+ eptype3 = eptype;
+ if (!semantic_result_type)
+   semantic_result_type
+ = type_after_usual_arithmetic_conversions (arg2_type, arg3_type);
+   }
+  else if (ANY_INTEGRAL_TYPE_P (arg3_type)
+  &

[PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support

2022-10-13 Thread Jakub Jelinek via Gcc-patches
Hi!

On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:
> > As I wrote earlier, I think we need at least one, __builtin_nans variant
> > which would be used in libstdc++
> > std::numeric_limits::signaling_NaN() implementation.
> > I think
> > std::numeric_limits::infinity() can be implemented as
> > return (__bf16) __builtin_huge_valf ();
> > and similarly
> > std::numeric_limits::quiet_NaN() as
> > return (__bf16) __builtin_nanf ("");
> > but
> > return (__bf16) __builtin_nansf ("");
> > would lose the signaling NaN on the conversion and raise an exception,
> > and as the method is constexpr,
> > union { unsigned short a; __bf16 b; } u = { 0x7f81 };
> > return u.b;
> > wouldn't work.  I can certainly restrict the builtins to the single
> > one, but wonder whether the suffix for that builtin shouldn't be chosen
> > such that eventually we could add more builtins if we need to
> > and don't run into the log with bf16 suffix vs. logb with f16 suffix
> > ambiguity.
> > As you said, most of the libstdc++ overloads for std::bfloat16_t then
> > can use float builtins or library calls under the hood, but std::nextafter
> > is another case where I think we'll need to have something bfloat16_t
> > specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.
> 
> Makes sense.

So, this updated version of the patch adds just a single __builtin_nansf16b
builtin (or do you want __builtin_nansbf16?).
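
To make the ulp remark above concrete, here is a small sketch (not part of
the patch) that emulates bfloat16 as the upper 16 bits of an IEEE binary32
value.  The helper names are hypothetical and it assumes float is binary32;
it only shows why a float-based nextafter cannot stand in for a bfloat16 one.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen a bfloat16 bit pattern to float by placing it
// in the upper 16 bits of a binary32 value (assumes float is binary32).
inline float
bf16_bits_to_float (std::uint16_t bits)
{
  std::uint32_t wide = static_cast<std::uint32_t> (bits) << 16;
  float f;
  std::memcpy (&f, &wide, sizeof f);
  return f;
}

// Distance from 1.0 to the next larger bfloat16 value.  0x3f80 is 1.0 and
// 0x3f81 is the next bit pattern above it.
inline float
bf16_ulp_at_one ()
{
  return bf16_bits_to_float (0x3f81) - bf16_bits_to_float (0x3f80);
}

// Distance from 1.0f to the next larger float value.
inline float
float_ulp_at_one ()
{
  return std::nextafter (1.0f, 2.0f) - 1.0f;
}

// Hypothetical self-check: the bfloat16 step is 2^-7, the float step 2^-23.
inline void
check_ulps ()
{
  assert (bf16_ulp_at_one () == std::ldexp (1.0f, -7));
  assert (float_ulp_at_one () == std::ldexp (1.0f, -23));
}
```

So a single float step near 1.0 is 65536 times smaller than a bfloat16 step,
which is why stepping in float and converting back would usually land on the
same bfloat16 value.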
> 
> > Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
> > in the next iteration (always with pedwarn in that case).

And implements bf16/BF16 suffixes for C too.

> > I'm afraid too many places rely on all modes of a certain class to be
> > visible when walking from "narrowest" to "widest" mode, say
> > FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
> > etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
> > && GET_MODE_WIDER_MODE (HFmode) == SFmode.
> 
> Yes, it seems they need to change now that their assumptions have been
> violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use
> get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide
> whether they want an iteration that uses get_wider (likely with a new name)
> or not.

And now that the GET_MODE_WIDER_MODE vs. GET_MODE_NEXT_MODE patch is in,
this patch is updated on top of those changes.

So far lightly tested on x86_64-linux, ok for trunk if it passes full
bootstrap/regtest on both x86_64-linux and i686-linux?

2022-10-13  Jakub Jelinek  

gcc/
* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
* tree.h (bfloat16_type_node): Define.
* tree.cc (excess_precision_type): Promote bfloat16_type_mode
like float16_type_mode.
(build_common_tree_nodes): Initialize bfloat16_type_node if
BFmode is supported.
* expmed.h (maybe_expand_shift): Declare.
* expmed.cc (maybe_expand_shift): No longer static.
* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
-ffast-math generic implementation for BF -> SF and SF -> BF
conversions.
* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
* builtins.def (BUILT_IN_NANSF16B): New builtin.
* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
* config/i386/i386.cc (classify_argument): Handle E_BCmode.
(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
for -msse2.
(ix86_mangle_type): Mangle BFmode as DF16b.
(ix86_invalid_conversion, ix86_invalid_unary_op,
ix86_invalid_binary_op): Remove.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
TARGET_INVALID_BINARY_OP): Don't redefine.
* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
ix86_bf16_type_node, only create it if still NULL.
* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
predefine __BFLT16_*__ macros and for C++23 also
__STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
macros for -fbuilding-libgcc.
* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote __bf16 to
double.
gcc/cp/
* cp-tree.h (extended_float_type_p): Return true for
bfloat16_type_node.
* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_bfloat16,
check_eff

[PATCH] testsuite: Fix failure in test pr105586.c [PR107171]

2022-10-13 Thread Surya Kumari Jangala via Gcc-patches
testsuite: Fix failure in test pr105586.c [PR107171]

The test pr105586.c fails on a big endian system when run in 32bit
mode. The failure occurs as the test case does not guard against
unsupported __int128.
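
For readers less familiar with the issue, the source-level analogue of that
guard might look like the sketch below.  It is not part of the patch, and the
function names are invented for illustration; it relies on GCC predefining
__SIZEOF_INT128__ only on targets where __int128 is available.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: size of the widest integer type this sketch may use.
// __int128 is only touched when the compiler says it exists.
#ifdef __SIZEOF_INT128__
inline std::size_t
widest_int_bytes ()
{
  return sizeof (__int128);
}
#else
inline std::size_t
widest_int_bytes ()
{
  return sizeof (long long);
}
#endif

// Hypothetical self-check: the fallback is never narrower than long long.
inline void
check_widest_int ()
{
  assert (widest_int_bytes () >= sizeof (long long));
}
```

In the testsuite the same effect is achieved declaratively with the
dg-require-effective-target directive, which skips the test instead of
compiling a fallback.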

2022-10-13  Surya Kumari Jangala  

gcc/testsuite/
PR testsuite/107171
* gcc.target/powerpc/pr105586.c: Guard against unsupported
__int128.


diff --git a/gcc/testsuite/gcc.target/powerpc/pr105586.c b/gcc/testsuite/gcc.target/powerpc/pr105586.c
index bd397f5..3f88a09 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr105586.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr105586.c
@@ -1,4 +1,5 @@
 /* { dg-options "-mdejagnu-tune=power4 -O2 -fcompare-debug -fno-if-conversion -fno-guess-branch-probability" } */
+/* { dg-require-effective-target int128 } */
 
 extern int bar(int i);


Re: Handling of main() function for freestanding

2022-10-13 Thread Arsen Arsenović via Gcc-patches
Hi,

On Friday, 7 October 2022 15:51:31 CEST Jason Merrill wrote:
> > * gcc.dg/noreturn-4.c: Likewise.
> 
> I'd be inclined to drop this test.
That seems like an odd choice, why do that over using another function 
for the test case? (there's nothing specific to main in this test, and 
it doesn't even need to link, so using any ol' function should be okay; 
see attachment)

The attached patch is also v2 of the original builtin-main one submitted 
earlier.  Tested on x86_64-pc-linux-gnu.  This revision excludes the 
mentioned pedwarns unless hosted.

Thanks,
-- 
Arsen Arsenović
From 27a2cf85b1c3eb901413fd135918af0377bd1459 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Arsen=20Arsenovi=C4=87?= 
Date: Tue, 20 Sep 2022 19:17:31 +0200
Subject: [PATCH v2] c-family: Implement new `int main' semantics in
 freestanding

From now on, by default, (specifically) `int main' in freestanding will
implicitly return 0, as it does for hosted modes. The old behaviour is
still accessible via -fno-builtin-main.
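
The decision described above can be modelled outside the compiler as in the
sketch below.  This is only an illustration of the predicate in the patch;
the struct and function names are stand-ins and not GCC code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two front-end flags in the patch:
// flag_hosted (cleared by -ffreestanding) and flag_builtin_main
// (cleared by -fno-builtin-main).
struct main_flags
{
  bool hosted;
  bool builtin_main;
};

// Mirrors the predicate in the patch: main keeps its special meaning unless
// both flags are off.
constexpr bool
main_is_special (main_flags f)
{
  return f.hosted || f.builtin_main;
}

// Hypothetical self-check walking the four flag combinations.
inline void
check_main_is_special ()
{
  assert (main_is_special ({true, true}));    // default hosted build
  assert (main_is_special ({true, false}));   // hosted wins
  assert (main_is_special ({false, true}));   // -ffreestanding alone
  assert (!main_is_special ({false, false})); // -ffreestanding -fno-builtin-main
}
```

Only the last combination restores the old freestanding behaviour, where
`int main' gets no implicit return 0.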

gcc/c-family/ChangeLog:

	* c-common.cc (disable_builtin_function): Support special value
	`main' that, in freestanding, allows disabling special casing
	placed around `main'.
	* c-common.h: Add flag_builtin_main.
	(want_builtin_main_p): New function, true iff hosted OR
	builtin_main are set.

gcc/c/ChangeLog:

	* c-decl.cc (grokdeclarator): Consider flag_builtin_main when
	deciding whether to emit warnings.
	(finish_function): Consider flag_builtin_main and the noreturn
	flag when deciding whether to emit an implicit zero return.
	* c-objc-common.cc (c_missing_noreturn_ok_p): Consider missing
	noreturn okay only when hosted or when builtin_main is enabled.

gcc/cp/ChangeLog:

	* cp-tree.h (DECL_MAIN_P): Consider flag_builtin_main when
	deciding whether this function is to be the special function
	main.
	* decl.cc (grokfndecl): Only pedwarn on hosted.
	(finish_function): Do not inject extra return of marked
	noreturn.

gcc/ChangeLog:

	* doc/invoke.texi: Document -fno-builtin-main.

gcc/testsuite/ChangeLog:

	* gcc.dg/noreturn-4.c: Don't use `main', but a generic function
	name instead.
	* g++.dg/freestanding-main-implicitly-returns.C: New test.
	* g++.dg/no-builtin-main.C: New test.
	* gcc.dg/freestanding-main-implicitly-returns.c: New test.
	* gcc.dg/no-builtin-main.c: New test.
---
 gcc/c-family/c-common.cc  |  6 ++
 gcc/c-family/c-common.h   | 10 ++
 gcc/c/c-decl.cc   |  4 ++--
 gcc/c/c-objc-common.cc|  9 ++---
 gcc/cp/cp-tree.h  | 12 ++-
 gcc/cp/decl.cc|  6 --
 gcc/doc/invoke.texi   | 20 ++-
 .../freestanding-main-implicitly-returns.C|  5 +
 gcc/testsuite/g++.dg/no-builtin-main.C|  5 +
 .../freestanding-main-implicitly-returns.c|  5 +
 gcc/testsuite/gcc.dg/no-builtin-main.c|  5 +
 gcc/testsuite/gcc.dg/noreturn-4.c |  6 +++---
 12 files changed, 73 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/freestanding-main-implicitly-returns.C
 create mode 100644 gcc/testsuite/g++.dg/no-builtin-main.C
 create mode 100644 gcc/testsuite/gcc.dg/freestanding-main-implicitly-returns.c
 create mode 100644 gcc/testsuite/gcc.dg/no-builtin-main.c

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 9ec9100cc90..f9060cbc171 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -232,6 +232,10 @@ int flag_isoc2x;
 
 int flag_hosted = 1;
 
+/* Nonzero means that we want to give main its special meaning */
+
+int flag_builtin_main = 1;
+
 
 /* ObjC language option variables.  */
 
@@ -4879,6 +4883,8 @@ disable_builtin_function (const char *name)
 {
   if (startswith (name, "__builtin_"))
 error ("cannot disable built-in function %qs", name);
+  else if (strcmp("main", name) == 0)
+flag_builtin_main = 0;
   else
 {
   disabled_builtin *new_disabled_builtin = XNEW (disabled_builtin);
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 62ab4ba437b..44537cc6977 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -689,6 +689,16 @@ extern int flag_isoc2x;
 
 extern int flag_hosted;
 
+/* Nonzero means that we want to give main its special meaning */
+
+extern int flag_builtin_main;
+
+/* Returns false if both flag_hosted and flag_builtin_main are zero, true
+   otherwise. */
+inline bool builtin_main_p() {
+  return flag_hosted || flag_builtin_main;
+}
+
 /* ObjC language option variables.  */
 
 
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 193e268f04e..891e36b30b6 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -10442,9 +10442,9 @@ finish_function (location_t end_loc)
   if (DECL_RESULT (fndecl) && DECL_RESULT (fndecl) != error_mark_node)
 DECL_CONTEXT (DECL_RESULT (fndecl)) = fndecl;
 
-  if (MAIN_NAME_P (DECL_NAME (fndecl)) && flag_hosted
+  if (MA

Re: [PATCH RESEND 0/1] RFC: P1689R5 support

2022-10-13 Thread David Malcolm via Gcc-patches
On Mon, 2022-10-10 at 16:21 -0400, Jason Merrill wrote:
> On 10/4/22 11:11, Ben Boeckel wrote:
> > This patch adds initial support for ISO C++'s [P1689R5][], a format for
> > describing C++ module requirements and provisions based on the source
> > code. This is required because compiling C++ with modules is not
> > embarrassingly parallel: compilations need to be ordered to ensure that
> > `import some_module;` can be satisfied in time by making sure that the
> > TU with `export import some_module;` is compiled first.
> > 
> > [P1689R5]: https://isocpp.org/files/papers/P1689R5.html
> > 
> > I'd like feedback on the approach taken here with respect to the
> > user-visible flags. I'll also note that header units are not
> > supported
> > at this time because the current `-E` behavior with respect to
> > `import
> > ;` is to search for an appropriate `.gcm` file which
> > is not
> > something such a "scan" can support. A new mode will likely need to
> > be
> > created (e.g., replacing `-E` with `-fc++-module-scanning` or
> > something)
> > where headers are looked up "normally" and processed only as much
> > as
> > scanning requires.
> > 
> > Testing is currently happening in CMake's CI using a prior revision of
> > this patch (the differences are basically the changelog, some style, and
> > `trtbd` instead of `p1689r5` as the format name).
> > 
> > For testing within GCC, I'll work on the following:
> > 
> > - scanning non-module source
> > - scanning module-importing source (`import X;`)
> > - scanning module-exporting source (`export module X;`)
> > - scanning module implementation unit (`module X;`)
> > - flag combinations?
> > 
> > Are there existing tools for handling JSON output for testing purposes?
> 
> David Malcolm would probably know best about JSON wrangling.

Unfortunately our JSON output doesn't make any guarantees about the
ordering of keys within an object, so the precise textual output
changes from run to run.  I've coped with that in my test cases by
limiting myself to simple regexes of fragments of the JSON output.
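
As a sketch of that approach (not code from GCC's testsuite, which expresses
such checks as DejaGnu scan directives): match one key/value fragment and
ignore everything around it, so the check holds whatever order the keys are
emitted in.  The helper name is hypothetical, and the "logical-name" key is
used here purely as sample input.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if JSON text contains a "logical-name" member
// whose value is MODULE, wherever that key appears within its object.
// MODULE is assumed to be a plain identifier with no regex metacharacters.
inline bool
mentions_logical_name (const std::string &json, const std::string &module)
{
  const std::regex fragment ("\"logical-name\"\\s*:\\s*\"" + module + "\"");
  return std::regex_search (json, fragment);
}

// Hypothetical self-check: both key orders match, a different name does not.
inline void
check_fragment_match ()
{
  const std::string a
    = "{ \"logical-name\": \"some_module\", \"is-interface\": true }";
  const std::string b
    = "{ \"is-interface\": true, \"logical-name\" : \"some_module\" }";
  assert (mentions_logical_name (a, "some_module"));
  assert (mentions_logical_name (b, "some_module"));
  assert (!mentions_logical_name (a, "other_module"));
}
```

The trade-off is that fragment matching says nothing about nesting, which is
where a structural check such as the Python-based one mentioned next becomes
useful.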

Martin Liska [CCed] went much further in
4e275dccfc2467b3fe39012a3dd2a80bac257dd0 by adding a run-gcov-pytest
DejaGnu directive, allowing for test cases for gcov to be written in
Python, which can thus test much more interesting assertions about the
generated JSON.

Dave

> 
> > Basically, something that I can add to the test suite that doesn't care
> > about whitespace, but checks the structure (with sensible replacements
> > for absolute paths where relevant)?
> 
> Various tests in g++.dg/debug/dwarf2 handle that sort of thing with
> regexps.
> 
> > For the record, Clang has patches with similar flags and behavior by
> > Chuanqi Xu here:
> > 
> >  https://reviews.llvm.org/D134269
> > 
> > with the same flags (though using my old `trtbd` spelling for the
> > format name).
> > 
> > Thanks,
> > 
> > --Ben
> > 
> > Ben Boeckel (1):
> >    p1689r5: initial support
> > 
> >   gcc/ChangeLog   |   9 ++
> >   gcc/c-family/ChangeLog  |   6 +
> >   gcc/c-family/c-opts.cc  |  40 ++-
> >   gcc/c-family/c.opt  |  12 ++
> >   gcc/cp/ChangeLog    |   5 +
> >   gcc/cp/module.cc    |   3 +-
> >   gcc/doc/invoke.texi |  15 +++
> >   gcc/fortran/ChangeLog   |   5 +
> >   gcc/fortran/cpp.cc  |   4 +-
> >   gcc/genmatch.cc |   2 +-
> >   gcc/input.cc    |   4 +-
> >   libcpp/ChangeLog    |  11 ++
> >   libcpp/include/cpplib.h |  12 +-
> >   libcpp/include/mkdeps.h |  17 ++-
> >   libcpp/init.cc  |  14 ++-
> >   libcpp/mkdeps.cc    | 235
> > ++--
> >   16 files changed, 368 insertions(+), 26 deletions(-)
> > 
> > 
> > base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f
> 



Re: Handling of main() function for freestanding

2022-10-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 13, 2022 at 07:03:24PM +0200, Arsen Arsenović wrote:
> @@ -1,10 +1,10 @@
>  /* Check for "noreturn" warning in main. */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -Wmissing-noreturn -ffreestanding" } */
> +/* { dg-options "-O2 -Wmissing-noreturn" } */
>  extern void exit (int) __attribute__ ((__noreturn__));
>  
> -int
> -main (void) /* { dg-warning "function might be candidate for attribute 'noreturn'" "warn for main" } */
> +void
> +f (void) /* { dg-warning "function might be candidate for attribute 'noreturn'" "warn for main" } */
>  {
>exit (0);
>  }

Don't we have such a test already elsewhere?  If not, then certainly the
"warn for main" part should be removed or replaced...

Jakub



Re: [DOCS] Python Language Conventions

2022-10-13 Thread David Malcolm via Gcc-patches
On Thu, 2022-10-13 at 11:44 +0200, Gerald Pfeifer wrote:
> Hi Martin,
> 
> On Thu, 13 Oct 2022, Martin Liška wrote:
> > I think we should add how Python scripts should be formatted. I noticed
> > that while reading the Modula-2 patchset where it follows the C/C++
> > style when it comes to Python files.
> 
> good initiative, thank you! This makes sense to me, alas I'm not a
> Python hacker, so best wait to see what David and Gaius think, too?

I'm very much +1 on recommending PEP 8.

My Python skills are bit-rotting somewhat, and I've not used flake8,
but it seems a reasonable recommendation to me.

> 
> 
> Some suggestions on the web side of things:
> 
> > +Python Language Conventions
> 
> Since the name of the page already is codingconventions.html, I
> suggest
> making this simply "#python" - shorter and simpler. :-)
> 
> > > +Python scripts should follow <a
> > > href="https://peps.python.org/pep-0008/">PEP 8 – Style Guide for
> > > Python Code</a>
> > +which can be verified by flake8
> > tool.
> 
> ...by the...tool.
> 
> > +We do also recommend using the following flake8 plug-
> > ins:
> 
> Here maybe simply say "We recommend using"?

That's a much better wording.

Dave

> 
> Hope this helps,
> Gerald



Re: Handling of main() function for freestanding

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/13/22 13:02, Arsen Arsenović wrote:
> Hi,
> 
> On Friday, 7 October 2022 15:51:31 CEST Jason Merrill wrote:
> > > * gcc.dg/noreturn-4.c: Likewise.
> > 
> > I'd be inclined to drop this test.
> 
> That seems like an odd choice, why do that over using another function
> for the test case? (there's nothing specific to main in this test, and
> it doesn't even need to link, so using any ol' function should be okay;
> see attachment)

It seemed to me that the test was specifically checking that main was 
treated like any other function when freestanding.

> The attached patch is also v2 of the original builtin-main one submitted
> earlier.  Tested on x86_64-pc-linux-gnu.  This revision excludes the
> mentioned pedwarns unless hosted.

I was arguing that we don't need the new flag; there shouldn't be any 
need to turn it off.


Jason



Re: Handling of main() function for freestanding

2022-10-13 Thread Arsen Arsenović via Gcc-patches
On Thursday, 13 October 2022 19:10:10 CEST Jakub Jelinek wrote:
> Don't we have such a test already elsewhere?  If not, then certain
> "warn for main" part should be removed or replaced...

Whoops, missed that comment.  There is actually an equivalent test that 
I overlooked (noreturn-1.c), so maybe dropping is the right thing to do, 
indeed.

-- 
Arsen Arsenović



Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-10-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 13, 2022 at 02:36:49PM +0200, Aldy Hernandez wrote:
> +// Like real_arithmetic, but round the result to INF if the operation
> +// produced inexact results.
> +//
> +// ?? There is still one problematic case, i387.  With
> +// -fexcess-precision=standard we perform most SF/DFmode arithmetic in
> +// XFmode (long_double_type_node), so that case is OK.  But without
> +// -mfpmath=sse, all the SF/DFmode computations are in XFmode
> +// precision (64-bit mantissa) and only occasionally rounded to
> +// SF/DFmode (when storing into memory from the 387 stack).  Maybe
> +// this is ok as well though it is just occasionally more precise. ??
> +
> +static void
> +frange_arithmetic (enum tree_code code, tree type,
> +REAL_VALUE_TYPE &result,
> +const REAL_VALUE_TYPE &op1,
> +const REAL_VALUE_TYPE &op2,
> +const REAL_VALUE_TYPE &inf)
> +{
> +  REAL_VALUE_TYPE value;
> +  enum machine_mode mode = TYPE_MODE (type);
> +  bool mode_composite = MODE_COMPOSITE_P (mode);
> +
> +  bool inexact = real_arithmetic (&value, code, &op1, &op2);
> +  real_convert (&result, mode, &value);
> +
> +  // If real_convert above has rounded an inexact value towards
> +  // inf, we can keep the result as is, otherwise we'll adjust by 1 ulp
> +  // later (real_nextafter).
> +  bool rounding = (flag_rounding_math
> +&& (real_isneg (&inf)
> +? real_less (&result, &value)
> +: !real_less (&value, &result)));

I thought the agreement during Cauldron was that we'd do this always,
regardless of flag_rounding_math.
Because excess precision (the fast one like on ia32 or -mfpmath=387 on
x86_64), or -frounding-math, or FMA contraction can all increase precision
and worst case it all behaves like -frounding-math for the ranges.

So, perhaps use:
  if ((mode_composite || (real_isneg (&inf) ? real_less (&result, &value)
			  : !real_less (&value, &result)))
      && (inexact || !real_identical (&result, &value)))
?
No need to do the real_isneg/real_less stuff for mode_composite, then
we do it always for inexacts, but otherwise we check if the rounding
performed by real.cc has been in the conservative direction (for upper
bound to +inf, for lower bound to -inf), if yes, we don't need to do
anything, if not, we frange_nextafter.

As discussed, for mode_composite, I think we want to do the extra
stuff for inexact denormals and otherwise do the nextafter unconditionally,
because our internal mode_composite representation isn't precise enough.
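
The "conservative direction" check described above can be illustrated outside
GCC with plain doubles.  This is only a sketch, not the frange code: the helper
names (sum_error, add_upper, add_lower) are made up for illustration, and it
assumes round-to-nearest, no overflow, and double arithmetic that is not
evaluated with excess precision.  It detects an inexact sum with Knuth's
TwoSum and moves the bound by one ulp only when the rounding went the wrong way.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper, not GCC code: the exact rounding error of
// s = a + b (Knuth's TwoSum), so that s + error equals the exact sum.
// Assumes round-to-nearest, no overflow and no excess precision.
inline double sum_error(double a, double b, double s) {
  double bb = s - a;
  return (a - (s - bb)) + (b - bb);
}

// Upper bound of a range: never smaller than the exact a + b.
inline double add_upper(double a, double b) {
  assert(std::isfinite(a) && std::isfinite(b));
  double s = a + b;
  // A positive error means the sum was rounded down, which is not
  // conservative for an upper bound, so step one ulp towards +inf.
  if (sum_error(a, b, s) > 0)
    s = std::nextafter(s, std::numeric_limits<double>::infinity());
  return s;
}

// Lower bound of a range: never larger than the exact a + b.
inline double add_lower(double a, double b) {
  assert(std::isfinite(a) && std::isfinite(b));
  double s = a + b;
  // A negative error means the sum was rounded up, so step one ulp
  // towards -inf.
  if (sum_error(a, b, s) < 0)
    s = std::nextafter(s, -std::numeric_limits<double>::infinity());
  return s;
}
```

When the sum is exact, or was already rounded in the conservative direction,
both helpers return it unchanged; that is the "we don't need to do anything"
branch of the description above.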

> +  // Be extra careful if there may be discrepancies between the
> +  // compile and runtime results.
> +  if ((rounding || mode_composite)
> +  && (inexact || !real_identical (&result, &value)))
> +{
> +  if (mode_composite)
> + {
> +   bool denormal = (result.sig[SIGSZ-1] & SIG_MSB) == 0;

Use real_isdenormal here?
Though, real_iszero needs the same thing.

> +   if (denormal)
> + {
> +   REAL_VALUE_TYPE tmp;

And explain here why this is: IBM extended denormals have just
DFmode precision.
Though, now that I think about it, while this is correct for denormals,

> +   real_convert (&tmp, DFmode, &value);
> +   frange_nextafter (DFmode, tmp, inf);
> +   real_convert (&result, mode, &tmp);
> + }

there are also the cases where the higher double exponent is in the
[__DBL_MIN_EXP__, __LDBL_MIN_EXP__] aka [-1021, -968] or so.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
If the upper double is denormal in the DFmode sense, so smaller absolute
value than __DBL_MIN__, then doing nextafter in DFmode is the right thing to
do, the lower double must be always +/- zero.
Now, if the result is __DBL_MIN__, the upper double is already normalized
but we can add __DBL_DENORM_MIN__ to it, which will make the number have
54-bit precision.
If the result is __DBL_MIN__ * 2, we can again add __DBL_DENORM_MIN__
and make it 55-bit precision.  Etc. until we reach __DBL_MIN__ * 2^53
where it acts like a fully normalized 106-bit precision number.
I must say I'm not really sure what real_nextafter is doing in those cases,
I'm afraid it doesn't handle it correctly but the only other use
of real_nextafter is guarded with:
  /* Don't handle composite modes, nor decimal, nor modes without
 inf or denorm at least for now.  */
  if (format->pnan < format->p
  || format->b == 10
  || !format->has_inf
  || !format->has_denorm)
return false;
so it isn't that big a deal except for ranges.

Jakub



[Patch] libgomp: Add Fortran testcases for omp_in_explicit_task

2022-10-13 Thread Tobias Burnus

Rather obvious patch as it is a straight conversion from C.

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp: Add Fortran testcases for omp_in_explicit_task

Fortranized testcases of commits r13-3257-ga58a965eb73
and r13-3258-g0ec4e93fb9f.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/task-7.f90: New test.
	* testsuite/libgomp.fortran/task-8.f90: New test.
	* testsuite/libgomp.fortran/task-in-explicit-1.f90: New test.
	* testsuite/libgomp.fortran/task-in-explicit-2.f90: New test.
	* testsuite/libgomp.fortran/task-in-explicit-3.f90: New test.
	* testsuite/libgomp.fortran/task-reduction-17.f90: New test.
	* testsuite/libgomp.fortran/task-reduction-18.f90: New test.

 libgomp/testsuite/libgomp.fortran/task-7.f90   |  22 
 libgomp/testsuite/libgomp.fortran/task-8.f90   |  13 +++
 .../libgomp.fortran/task-in-explicit-1.f90 | 113 +
 .../libgomp.fortran/task-in-explicit-2.f90 |  21 
 .../libgomp.fortran/task-in-explicit-3.f90 |  31 ++
 .../libgomp.fortran/task-reduction-17.f90  |  32 ++
 .../libgomp.fortran/task-reduction-18.f90  |  15 +++
 7 files changed, 247 insertions(+)

diff --git a/libgomp/testsuite/libgomp.fortran/task-7.f90 b/libgomp/testsuite/libgomp.fortran/task-7.f90
new file mode 100644
index 000..e806bd79663
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-7.f90
@@ -0,0 +1,22 @@
+! { dg-do run }
+
+program main
+  use omp_lib
+  implicit none
+
+  !$omp task final (.true.)
+if (.not. omp_in_final ()) &
+  error stop
+!$omp task
+  if (.not. omp_in_final ()) &
+error stop
+  !$omp target nowait
+  if (omp_in_final ()) &
+error stop
+  !$omp end target
+  if (.not. omp_in_final ()) &
+error stop
+  !$omp taskwait
+!$omp end task
+  !$omp end task
+end
diff --git a/libgomp/testsuite/libgomp.fortran/task-8.f90 b/libgomp/testsuite/libgomp.fortran/task-8.f90
new file mode 100644
index 000..037c63b8fa3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-8.f90
@@ -0,0 +1,13 @@
+! { dg-do run }
+
+program main
+  implicit none
+  integer :: i
+  i = 0
+  !$omp task
+!$omp target nowait private (i)
+  i = 1
+!$omp end target
+!$omp taskwait
+  !$omp end task
+end
diff --git a/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90 b/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90
new file mode 100644
index 000..b6fa21b2c22
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/task-in-explicit-1.f90
@@ -0,0 +1,113 @@
+! { dg-do run }
+
+program main
+  use omp_lib
+  implicit none
+  integer :: i
+
+  if (omp_in_explicit_task ()) &
+error stop
+  !$omp task
+  if (.not. omp_in_explicit_task ()) &
+error stop
+  !$omp end task
+
+  !$omp task final (.true.)
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp task
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+  !$omp end task
+
+  !$omp parallel
+if (omp_in_explicit_task ()) &
+  error stop
+!$omp task if (.false.)
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp task if (.false.)
+  if (.not. omp_in_explicit_task ()) &
+error stop
+!$omp end task
+!$omp end task
+!$omp task final (.true.)
+  if (.not. omp_in_explicit_task ()) &
+error stop
+!$omp end task
+!$omp barrier
+if (omp_in_explicit_task ()) &
+  error stop
+!$omp taskloop num_tasks (24)
+do i = 1, 32
+  if (.not. omp_in_explicit_task ()) &
+error stop
+end do
+!$omp masked
+!$omp task
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+!$omp end masked
+!$omp barrier
+if (omp_in_explicit_task ()) &
+  error stop
+  !$omp end parallel
+
+  !$omp target
+if (omp_in_explicit_task ()) &
+  error stop
+!$omp task if (.false.)
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+!$omp task
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+  !$omp end target
+
+  !$omp target teams
+!$omp distribute
+do i = 1, 4
+  if (omp_in_explicit_task ()) then
+error stop
+  else
+  !$omp parallel
+if (omp_in_explicit_task ()) &
+  error stop
+!$omp task
+if (.not. omp_in_explicit_task ()) &
+  error stop
+!$omp end task
+!$omp barrier
+if (omp_in_explicit_task ()) &
+  error stop
+  !$omp end parallel
+  end if
+end do
+  !$omp end target teams
+
+  !$omp teams
+!$omp distribute
+do

Re: [Patch] libgomp: Add Fortran testcases for omp_in_explicit_task

2022-10-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 13, 2022 at 08:10:47PM +0200, Tobias Burnus wrote:
> Rather obvious patch as it is a straight conversion from C.
> 
> OK for mainline?
> 
> Tobias

> libgomp: Add Fortran testcases for omp_in_explicit_task
> 
> Fortranized testcases of commits r13-3257-ga58a965eb73
> and r13-3258-g0ec4e93fb9f.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.fortran/task-7.f90: New test.
>   * testsuite/libgomp.fortran/task-8.f90: New test.
>   * testsuite/libgomp.fortran/task-in-explicit-1.f90: New test.
>   * testsuite/libgomp.fortran/task-in-explicit-2.f90: New test.
>   * testsuite/libgomp.fortran/task-in-explicit-3.f90: New test.
>   * testsuite/libgomp.fortran/task-reduction-17.f90: New test.
>   * testsuite/libgomp.fortran/task-reduction-18.f90: New test.

LGTM, thanks.

Jakub



[PATCH] c++ modules: ICE with dynamic_cast [PR106304]

2022-10-13 Thread Patrick Palka via Gcc-patches
The FUNCTION_DECL we build for __dynamic_cast has an empty DECL_CONTEXT,
but trees_out::tree_node expects all FUNCTION_DECLs to have non-empty
DECL_CONTEXT thus we crash when streaming out the dynamic_cast in the
below testcase.

This patch naively fixes this by setting DECL_CONTEXT for __dynamic_cast
appropriately.  Like for __cxa_atexit which is similarly lazily declared,
I suppose we should push it into the namespace too.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/106304

gcc/cp/ChangeLog:

* constexpr.cc (cxx_dynamic_cast_fn_p): Check for abi_node
instead of global_namespace.
* rtti.cc (build_dynamic_cast_1): Set DECL_CONTEXT and
DECL_SOURCE_LOCATION on dynamic_cast_node, and push it
into the namespace.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr106304_a.C: New test.
* g++.dg/modules/pr106304_b.C: New test.
---
 gcc/cp/constexpr.cc   |  2 +-
 gcc/cp/rtti.cc|  4 
 gcc/testsuite/g++.dg/modules/pr106304_a.C | 12 
 gcc/testsuite/g++.dg/modules/pr106304_b.C |  8 
 4 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr106304_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr106304_b.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 06dcd71c926..5939d2882f8 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2108,7 +2108,7 @@ cxx_dynamic_cast_fn_p (tree fndecl)
 {
   return (cxx_dialect >= cxx20
  && id_equal (DECL_NAME (fndecl), "__dynamic_cast")
- && CP_DECL_CONTEXT (fndecl) == global_namespace);
+ && CP_DECL_CONTEXT (fndecl) == abi_node);
 }
 
 /* Often, we have an expression in the form of address + offset, e.g.
diff --git a/gcc/cp/rtti.cc b/gcc/cp/rtti.cc
index f5b43ec0fb2..a85c7b56409 100644
--- a/gcc/cp/rtti.cc
+++ b/gcc/cp/rtti.cc
@@ -787,6 +787,10 @@ build_dynamic_cast_1 (location_t loc, tree type, tree expr,
   NULL_TREE));
  dcast_fn = (build_library_fn_ptr
  (fn_name, fn_type, ECF_LEAF | ECF_PURE | 
ECF_NOTHROW));
+ /* As with __cxa_atexit in get_atexit_node.  */
+ DECL_CONTEXT (dcast_fn) = FROB_CONTEXT (current_namespace);
+ DECL_SOURCE_LOCATION (dcast_fn) = BUILTINS_LOCATION;
+ dcast_fn = pushdecl (dcast_fn, /*hiding=*/true);
  pop_abi_namespace (flags);
  dynamic_cast_node = dcast_fn;
}
diff --git a/gcc/testsuite/g++.dg/modules/pr106304_a.C 
b/gcc/testsuite/g++.dg/modules/pr106304_a.C
new file mode 100644
index 000..b999eeccf4a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr106304_a.C
@@ -0,0 +1,12 @@
+// PR c++/106304
+// { dg-additional-options -fmodules-ts }
+// { dg-module-cmi pr106304 }
+
+export module pr106304;
+
+struct A { virtual ~A() = default; };
+struct B : A { };
+
+inline const B* as_b(const A& a) {
+  return dynamic_cast<const B*>(&a);
+}
diff --git a/gcc/testsuite/g++.dg/modules/pr106304_b.C 
b/gcc/testsuite/g++.dg/modules/pr106304_b.C
new file mode 100644
index 000..e8333909c8d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr106304_b.C
@@ -0,0 +1,8 @@
+// PR c++/106304
+// { dg-additional-options -fmodules-ts }
+
+module pr106304;
+
+void f(A& a) {
+  as_b(a);
+}
-- 
2.38.0.68.ge85701b4af



Re: [PATCH] c++, v2: Implement excess precision support for C++ [PR107097, PR323]

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/13/22 12:40, Jakub Jelinek wrote:

On Wed, Oct 12, 2022 at 02:08:20PM -0400, Jason Merrill wrote:

In general I've tried to follow the C99 handling, C11+ relies on the
C standard saying that in case of integral conversions excess precision
can be used (see PR87390 for more details), but I don't see anything similar
on the C++ standard side.


https://eel.is/c++draft/expr#pre-6 seems identical to C99 (apart from a
stray "the"?); presumably nobody has proposed to copy the N1531
clarifications.  But since those are clarifications, I'd prefer to use our
C11+ semantics to avoid divergence between the default modes of the C and
C++ front ends.
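
As background for the excess-precision discussion, here is a minimal sketch
(not part of the patch) of the behaviour that -fexcess-precision=standard is
meant to guarantee: an operation may be evaluated in a wider format, and a
cast or assignment is where the value is rounded back to its semantic type.
The helper names are made up for illustration, and the sketch assumes either
FLT_EVAL_METHOD == 0 or standard excess-precision semantics.

```cpp
#include <cassert>
#include <cfloat>

// Evaluation format reported by the compiler: 0 means each type is
// evaluated in its own precision, 1 means float is evaluated as
// double, 2 means everything is evaluated as long double.
inline int eval_method() { return FLT_EVAL_METHOD; }

// Hypothetical helper: the product may be computed with excess
// precision, but the cast rounds it back to float.
inline float rounded_product(float a, float b) {
  return static_cast<float>(a * b);
}

// (1 + eps)^2 = 1 + 2*eps + eps^2; the eps^2 term does not fit in a
// float, so after the cast the result is 1 + 2*eps whichever
// evaluation format was used for the multiplication.
inline void self_check() {
  const float a = 1.0f + FLT_EPSILON;
  assert(rounded_product(a, a) == 1.0f + 2.0f * FLT_EPSILON);
}
```

Printing eval_method () on the target of interest shows whether excess
precision is in play at all; on x86_64 with SSE math it is normally 0.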


Ok, so that it is more readable and say if we decide to make e.g. C++98
behave like C99 and only C++11 and later like C11, I'm sending this as
a 2 patch series, this patch is just an updated version of the previous
patch (your review comments, Marek's mail and missed changes to
doc/invoke.texi) and another mail will be upgrade of this to the C11
behavior.


+  semantic_result_type
+   = type_after_usual_arithmetic_conversions (arg2_type, arg3_type);
+  if (semantic_result_type == error_mark_node
+ && TREE_CODE (arg2_type) == REAL_TYPE
+ && TREE_CODE (arg3_type) == REAL_TYPE
+ && (extended_float_type_p (arg2_type)
+ || extended_float_type_p (arg3_type))


What if semantic_result_type is error_mark_node and the other conditions
don't hold?  That seems impossible, so maybe the other conditions should
move into a gcc_checking_assert? (And likewise for result_type below)


Changed in all places to an assert, though previously I missed
that cp_common_type on complex type(s) could have similar problem.


@@ -9772,8 +9849,12 @@ build_over_call (struct z_candidate *can
return error_mark_node;
}
 else if (magic != 0)
-   /* For other magic varargs only do decay_conversion.  */
-   a = decay_conversion (a, complain);
+   {
+ if (magic == 1 && TREE_CODE (a) == EXCESS_PRECISION_EXPR)
+   a = TREE_OPERAND (a, 0);


It was confusing me that this mentions 1, and the magic_varargs_p comment
above mentions 2:  Let's add a comment


That is because removing excess precision means keeping
EXCESS_PRECISION_EXPR around and preserving excess precision
means removing of EXCESS_PRECISION_EXPR.



  /* Don't truncate excess precision to the semantic type.  */

to clarify.


Ok.

Here is an updated patch, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2022-10-13  Jakub Jelinek  

PR middle-end/323
PR c++/107097
gcc/
* doc/invoke.texi (-fexcess-precision=standard): Mention that the
option now also works in C++.
gcc/c-family/
* c-common.def (EXCESS_PRECISION_EXPR): Remove comment part about
the tree being specific to C/ObjC.
* c-opts.cc (c_common_post_options): Handle flag_excess_precision
in C++ the same as in C.
* c-lex.cc (interpret_float): Set const_type to excess_precision ()
even for C++.
gcc/cp/
* parser.cc (cp_parser_primary_expression): Handle
EXCESS_PRECISION_EXPR with REAL_CST operand the same as REAL_CST.
* cvt.cc (cp_ep_convert_and_check): New function.
* call.cc (build_conditional_expr): Add excess precision support.
When type_after_usual_arithmetic_conversions returns error_mark_node,
use gcc_checking_assert that it is because of uncomparable floating
point ranks instead of checking all those conditions and make it
work also with complex types.
(convert_like_internal): Likewise.  Add NESTED_P argument, pass true
to recursive calls to convert_like.
(convert_like): Add NESTED_P argument, pass it through to
convert_like_internal.  For other overload pass false to it.
(convert_like_with_context): Pass false to NESTED_P.
(convert_arg_to_ellipsis): Add excess precision support.
(magic_varargs_p): For __builtin_is{finite,inf,inf_sign,nan,normal}
and __builtin_fpclassify return 2 instead of 1, document what it
means.
(build_over_call): Don't handle former magic 2 which is no longer
used, instead for magic 1 remove EXCESS_PRECISION_EXPR.
(perform_direct_initialization_if_possible): Pass false to NESTED_P
convert_like argument.
* constexpr.cc (cxx_eval_constant_expression): Handle
EXCESS_PRECISION_EXPR.
(potential_constant_expression_1): Likewise.
* pt.cc (tsubst_copy, tsubst_copy_and_build): Likewise.
* cp-tree.h (cp_ep_convert_and_check): Declare.
* cp-gimplify.cc (cp_fold): Handle EXCESS_PRECISION_EXPR.
* typeck.cc (cp_common_type): For COMPLEX_TYPEs, return error_mark_node
if recursive call returned it.
(convert_arguments): For magic 1 remove EXCESS_PRECISION_EXPR.
(cp_build_binary_op): Add excess precision support.  When
cp_comm

Re: [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/13/22 12:50, Jakub Jelinek wrote:

Hi!

On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:

As I wrote earlier, I think we need at least one __builtin_nans variant
which would be used in libstdc++
std::numeric_limits<std::bfloat16_t>::signaling_NaN() implementation.
I think
std::numeric_limits<std::bfloat16_t>::infinity() can be implemented as
return (__bf16) __builtin_huge_valf ();
and similarly
std::numeric_limits<std::bfloat16_t>::quiet_NaN() as
return (__bf16) __builtin_nanf ("");
but
return (__bf16) __builtin_nansf ("");
would lose the signaling NaN on the conversion and raise exception,
and as the method is constexpr,
union { unsigned short a; __bf16 b; } u = { 0x7f81 };
return u.b;
wouldn't work.  I can certainly restrict the builtins to the single
one, but wonder whether the suffix for that builtin shouldn't be chosen
such that eventually we could add more builtins if we need to
and don't run into the log with bf16 suffix vs. logb with f16 suffix
ambiguity.
As you said, most of the libstdc++ overloads for std::bfloat16_t then
can use float builtins or library calls under the hood, but std::nextafter
is another case where I think we'll need to have something bfloat16_t
specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.
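
To make the ulp difference concrete, here is a rough sketch (not libstdc++
code) that models bfloat16 as the upper 16 bits of an IEEE binary32 float.
The helper names are hypothetical, and bf16_next_up only handles positive
finite inputs; a real std::nextafter overload would also need zeros,
negatives, NaNs and a direction argument.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit IEEE float");

// Hypothetical helper: the bfloat16 bit pattern of f, obtained by
// dropping the low 16 mantissa bits (truncation, no rounding).
inline std::uint16_t bf16_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return static_cast<std::uint16_t>(u >> 16);
}

// Hypothetical helper: widen a bfloat16 bit pattern back to float.
inline float bf16_to_float(std::uint16_t bits) {
  std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Next bfloat16 value above x, for positive finite x only: adjacent
// positive values have adjacent bit patterns.
inline float bf16_next_up(float x) {
  assert(x > 0.0f && std::isfinite(x));
  return bf16_to_float(static_cast<std::uint16_t>(bf16_bits(x) + 1));
}
```

At 1.0 the bfloat16 step is 2^-7 while the float step is 2^-23, so calling
the float nextafter and converting the result back would round to the
starting value again.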


Makes sense.


So, this updated version of the patch adds just a single __builtin_nansf16b
builtin (or do you want __builtin_nansbf16?).


16b sounds fine.


Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
in the next iteration (always with pedwarn in that case).


And implements bf16/BF16 suffixes for C too.


I'm afraid too many places rely on all modes of a certain class to be
visible when walking from "narrowest" to "widest" mode, say
FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
&& GET_MODE_WIDER_MODE (HFmode) == SFmode.


Yes, it seems they need to change now that their assumptions have been
violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use
get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide
whether they want an iteration that uses get_wider (likely with a new name)
or not.


And now that the GET_MODE_WIDER_MODE vs. GET_MODE_NEXT_MODE patch is in,
is updated on top of those changes.

So far lightly tested on x86_64-linux, ok for trunk if it passes full
bootstrap/regtest on both x86_64-linux and i686-linux?


LGTM, but a i386 maintainer should review it as well.


2022-10-13  Jakub Jelinek  

gcc/
* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
* tree.h (bfloat16_type_node): Define.
* tree.cc (excess_precision_type): Promote bfloat16_type_mode
like float16_type_mode.
(build_common_tree_nodes): Initialize bfloat16_type_node if
BFmode is supported.
* expmed.h (maybe_expand_shift): Declare.
* expmed.cc (maybe_expand_shift): No longer static.
* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
-ffast-math generic implementation for BF -> SF and SF -> BF
conversions.
* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
* builtins.def (BUILT_IN_NANSF16B): New builtin.
* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
* config/i386/i386.cc (classify_argument): Handle E_BCmode.
(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
for -msse2.
(ix86_mangle_type): Mangle BFmode as DF16b.
(ix86_invalid_conversion, ix86_invalid_unary_op,
ix86_invalid_binary_op): Remove.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
TARGET_INVALID_BINARY_OP): Don't redefine.
* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
ix86_bf16_type_node, only create it if still NULL.
* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
predefine __BFLT16_*__ macros and for C++23 also
__STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
macros for -fbuilding-libgcc.
* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote __bf16 to
double.
gcc/cp/
* cp-tree.h (extended_float_type_p): Return true for
bfloat16_type_node.
* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_bfloat16,
check_effective_target_bf

Re: [PATCH v2] c++: parser - Support for target address spaces in C++

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/13/22 12:02, Paul Iannetta wrote:

On Thu, Oct 13, 2022 at 11:47:42AM -0400, Jason Merrill wrote:

On 10/13/22 11:23, Paul Iannetta wrote:

On Thu, Oct 13, 2022 at 11:02:24AM -0400, Jason Merrill wrote:

On 10/12/22 20:52, Paul Iannetta wrote:

On Tue, Oct 11, 2022 at 09:49:43PM -0400, Jason Merrill wrote:


It surprises me that this is the only place we complain about an object with an
address-space qualifier.  Shouldn't we also complain about e.g. automatic
variables/parameters or non-static data members with address-space qualified
type?



Indeed, I was missing quite a few things here.  Thanks.
I used the draft as basis this time and imported from the C
implementation the relevant parts.  This time, the errors get properly
emitted when an address space is unduly specified; and comparisons,
assignments and comparisons are taken care of.

There are quite a few things I would like to clarify concerning some
implementation details.
 - A variable with automatic storage (which is neither a pointer nor
   a reference) cannot be qualified with an address space.  I detect
   this by the combination of `sc_none' and `! toplevel_bindings_p ()',
   but I've also seen the use of `at_function_scope' at other places.
   And I'm unsure which one is appropriate here.
   This detection happens at the very end of grokdeclarator because I
   need to know that the type is a pointer, which is not known until
   very late in the function.


At that point you have the decl, and you can ask directly what its storage
duration is, perhaps using decl_storage_duration.

But why do you need to know whether the type is a pointer?  The attribute
applies to the target type of the pointer, not the pointer type.  I think
the problem is that you're looking at declspecs when you ought to be looking
at type_quals.


I need to know that the base type is a pointer to reject invalid
declarations such as:

  int f (__seg_fs int a) { } or int f () { __seg_fs int a; }

because parameters and auto variables can have an address space
qualifier only if they are pointer or reference type, which I can't
tell only from type_quals.


But "int *__seg_fs a" is just as invalid as the above; the difference is not
whether a is a pointer, but whether the address-space-qualified type is the
type of a itself or some sub-type.


I agree that "int * __seg_fs a" is invalid but it is accepted by the C
front-end, and by clang (both C and C++), the behavior is that the
address space is silently ignored.


Hmm, that sounds like a bug; in that case, presumably the user meant to 
qualify the pointed-to type, and silently ignoring seems unlikely to 
give the effect they want.



You need to look at the qualifiers on type (which should also be the ones in
type_quals), not the qualifiers in the declspecs.


I'll have another look, thanks.


 - I'm having some trouble deciding whether I include those three
   stub programs as tests, they all compile fine and clang accepts
   them as well.


Why not?


I thought they were pretty contrived, since it does not make much
sense to strip address space qualifiers, even though it does prove
that the implementation supports those contrived but valid uses.


Testcases are full of contrived examples testing corner cases.  :)




Ex1:
```
int __seg_fs * fs1;
int __seg_gs * gs1;

template <typename T> struct strip;
template <typename T> struct strip<__seg_fs T *> { typedef T type; };
template <typename T> struct strip<__seg_gs T *> { typedef T type; };

int
main ()
{
   *(strip<decltype(fs1)>::type *) fs1 == *(strip<decltype(gs1)>::type *) gs1;
   return 0;
}
```

Ex2:
```
int __seg_fs * fs1;
int __seg_gs * gs1;

template <typename T, typename U> auto f (T __seg_fs * a, U __seg_gs * b) { return a; }
template <typename T, typename U> auto f (T __seg_gs * a, U __seg_fs * b) { return a; }

int
main ()
{
   f (fs1, gs1);
   f (gs1, fs1);
   return 0;
}
```

Ex3:
```
int __seg_fs * fs1;
int __seg_gs * gs1;

template <typename T, typename U>
auto f (T __seg_fs * a, U __seg_gs * b)
{
   return *(T *) a == *(U *) b;
}

int
main ()
{
   return f (fs1, gs1);
}
```


Add support for custom address spaces in C++

gcc/
   * tree.h (ENCODE_QUAL_ADDR_SPACE): Missing parentheses.

gcc/c/
   * c-decl.cc: Remove c_register_addr_space.

gcc/c-family/
   * c-common.cc (c_register_addr_space): Imported from c-decl.cc
   (addr_space_superset): Imported from gcc/c/c-typecheck.cc
   * c-common.h: Remove the FIXME.
   (addr_space_superset): New declaration.

gcc/cp/
   * cp-tree.h (enum cp_decl_spec): Add addr_space support.
   (struct cp_decl_specifier_seq): Likewise.
   * decl.cc (get_type_quals): Likewise.
   (check_tag_decl): Likewise.
(grokdeclarator): Likewise.
   * parser.cc (cp_parser_type_specifier): Likewise.
   (cp_parser_cv_qualifier_seq_opt): Likewise.
   (cp_parser_postfix_expression): Likewise.
   (cp_parser_type_specifier): Likewise.
   (set_and_check_decl_spec_loc): Likewise.
  

[committed] analyzer: fix ICE introduced in r13-3168 [PR107210]

2022-10-13 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-3285-g99da523359e933.

gcc/analyzer/ChangeLog:
PR analyzer/107210
* svalue.cc (constant_svalue::maybe_fold_bits_within): Only
attempt to extract individual bits when tree_fits_uhwi_p.

gcc/testsuite/ChangeLog:
PR analyzer/107210
* gfortran.dg/analyzer/pr107210.f90: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/svalue.cc  |  3 ++-
 gcc/testsuite/gfortran.dg/analyzer/pr107210.f90 | 16 
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/analyzer/pr107210.f90

diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
index a0838c0f588..4b00a81b31d 100644
--- a/gcc/analyzer/svalue.cc
+++ b/gcc/analyzer/svalue.cc
@@ -884,7 +884,8 @@ constant_svalue::maybe_fold_bits_within (tree type,
   if (bits.m_size_in_bits == 1
   && TREE_CODE (m_cst_expr) == INTEGER_CST
   && type
-  && INTEGRAL_TYPE_P (type))
+  && INTEGRAL_TYPE_P (type)
+  && tree_fits_uhwi_p (m_cst_expr))
 {
   unsigned HOST_WIDE_INT bit = bits.m_start_bit_offset.to_uhwi ();
   unsigned HOST_WIDE_INT mask = (1 << bit);
diff --git a/gcc/testsuite/gfortran.dg/analyzer/pr107210.f90 
b/gcc/testsuite/gfortran.dg/analyzer/pr107210.f90
new file mode 100644
index 000..6132db48817
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/analyzer/pr107210.f90
@@ -0,0 +1,16 @@
+! { dg-additional-options "-O1" }
+
+subroutine check_int (j)
+  INTEGER(4) :: i, ia(5), ib(5,4), ip, ipa(:)
+  target :: ib
+  POINTER :: ip, ipa
+  logical :: l(5)
+
+  ipa=>ib(2:3,1)
+
+  l = (/ sizeof(i) == 4, sizeof(ia) == 20, sizeof(ib) == 80, &
+   sizeof(ip) == 4, sizeof(ipa) == 8 /)
+
+  if (any(.not.l)) STOP 4
+
+end subroutine check_int
-- 
2.26.3



Re: Handling of main() function for freestanding

2022-10-13 Thread Arsen Arsenović via Gcc-patches
On Thursday, 13 October 2022 19:24:41 CEST Jason Merrill wrote:
> I was arguing that we don't need the new flag; there shouldn't be any
> need to turn it off.
At the time, I opted to go with a more conservative route; I haven't 
been around enough to have very strong opinions ;)  I certainly can't 
think of a way always adding a return can go wrong, but figured someone, 
somehow, might rely on this behavior.  Removed the flag and tested on 
x86_64-pc-linux-gnu, v3 attached.

FWIW, there's precedent for treating main specially regardless of 
flag_hosted (e.g. it's always marked extern "C" in the C++ frontend, 
AFAICT).

-- 
Arsen Arsenović
From e60be6bb45fdba8085bde5d1883deeae640e786b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Arsen=20Arsenovi=C4=87?= 
Date: Thu, 13 Oct 2022 21:46:30 +0200
Subject: [PATCH v3] c-family: Implicitly return zero from main even on
 freestanding

... unless marked noreturn.

This should not get in anyone's way, but should permit the use of main()
in freestanding more easily, especially for writing test cases that
should work both in freestanding and hosted modes.

gcc/c/ChangeLog:

	* c-decl.cc (finish_function): Ignore hosted when deciding
	whether to implicitly return zero, but check noreturn.
	* c-objc-common.cc (c_missing_noreturn_ok_p): Loosen the
	requirements to just MAIN_NAME_P.

gcc/cp/ChangeLog:

	* cp-tree.h (DECL_MAIN_FREESTANDING_P): Move most DECL_MAIN_P
	logic here, so that we can use it when not hosted.
	(DECL_MAIN_P): Implement in terms of DECL_MAIN_FREESTANDING_P.
	* decl.cc (finish_function): Use DECL_MAIN_FREESTANDING_P
	instead of DECL_MAIN_P, to lose the hosted requirement, but
	check noreturn.

gcc/testsuite/ChangeLog:

	* g++.dg/freestanding-main.C: New test.
	* gcc.dg/freestanding-main.c: New test.
---
 gcc/c/c-decl.cc  | 2 +-
 gcc/c/c-objc-common.cc   | 5 ++---
 gcc/cp/cp-tree.h | 8 +---
 gcc/cp/decl.cc   | 3 ++-
 gcc/testsuite/g++.dg/freestanding-main.C | 5 +
 gcc/testsuite/gcc.dg/freestanding-main.c | 5 +
 6 files changed, 20 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/freestanding-main.C
 create mode 100644 gcc/testsuite/gcc.dg/freestanding-main.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 193e268f04e..8c655590558 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -10442,7 +10442,7 @@ finish_function (location_t end_loc)
   if (DECL_RESULT (fndecl) && DECL_RESULT (fndecl) != error_mark_node)
 DECL_CONTEXT (DECL_RESULT (fndecl)) = fndecl;
 
-  if (MAIN_NAME_P (DECL_NAME (fndecl)) && flag_hosted
+  if (MAIN_NAME_P (DECL_NAME (fndecl)) && !TREE_THIS_VOLATILE (fndecl)
   && TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (fndecl)))
   == integer_type_node && flag_isoc99)
 {
diff --git a/gcc/c/c-objc-common.cc b/gcc/c/c-objc-common.cc
index 70e10a98e33..2933414fd45 100644
--- a/gcc/c/c-objc-common.cc
+++ b/gcc/c/c-objc-common.cc
@@ -37,9 +37,8 @@ static bool c_tree_printer (pretty_printer *, text_info *, const char *,
 bool
 c_missing_noreturn_ok_p (tree decl)
 {
-  /* A missing noreturn is not ok for freestanding implementations and
- ok for the `main' function in hosted implementations.  */
-  return flag_hosted && MAIN_NAME_P (DECL_ASSEMBLER_NAME (decl));
+  /* A missing noreturn is ok for the `main' function.  */
+  return MAIN_NAME_P (DECL_ASSEMBLER_NAME (decl));
 }
 
 /* Called from check_global_declaration.  */
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3b67be651b9..4c7adfbffd8 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -772,11 +772,13 @@ typedef struct ptrmem_cst * ptrmem_cst_t;
 
 /* Returns nonzero iff NODE is a declaration for the global function
`main'.  */
-#define DECL_MAIN_P(NODE)\
+#define DECL_MAIN_FREESTANDING_P(NODE)			\
(DECL_EXTERN_C_FUNCTION_P (NODE)			\
 && DECL_NAME (NODE) != NULL_TREE			\
-&& MAIN_NAME_P (DECL_NAME (NODE))			\
-&& flag_hosted)
+&& MAIN_NAME_P (DECL_NAME (NODE)))
+
+/* Nonzero iff NODE is a declaration for `main', and we are hosted. */
+#define DECL_MAIN_P(NODE) (DECL_MAIN_FREESTANDING_P(NODE) && flag_hosted)
 
 /* Lookup walker marking.  */
 #define LOOKUP_SEEN_P(NODE) TREE_VISITED (NODE)
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 82eb0c2f22a..cfc8cd5afd7 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17854,7 +17854,8 @@ finish_function (bool inline_p)
   if (!DECL_CLONED_FUNCTION_P (fndecl))
 {
   /* Make it so that `main' always returns 0 by default.  */
-  if (DECL_MAIN_P (current_function_decl))
+  if (DECL_MAIN_FREESTANDING_P (current_function_decl)
+	  && !TREE_THIS_VOLATILE (current_function_decl))
 	finish_return_stmt (integer_zero_node);
 
   if (use_eh_spec_block (current_function_decl))
diff --git a/gcc/testsuite/g++.dg/freestanding-main.C b/gcc/testsuite/g++.dg/freestanding-main.C
new file mode 100644
index 000..3718cc4508e
--- /dev/null
+++ b/gcc/testsuite/g++

Re: [DOCS] Python Language Conventions

2022-10-13 Thread Gaius Mulley via Gcc-patches
David Malcolm  writes:

> On Thu, 2022-10-13 at 11:44 +0200, Gerald Pfeifer wrote:
>> Hi Martin,
>> 
>> On Thu, 13 Oct 2022, Martin Liška wrote:
>> > I think we should add how Python scripts should be formatted. I
>> > noticed
>> > that while reading the Modula-2 patchset where it follows the C/C++
>> > style
>> > when it comes to Python files.
>> 
>> good initiative, thank you! This makes sense to me, alas I'm not a
>> Python 
>> hacker, so best wait to see what David and Gaius think, too?
>
> I'm very much +1 on recommending PEP 8.

hi,

all sounds very sensible - yes I'm also happy to adopt any house style
and will reformat the code accordingly

regards,
Gaius


Re: [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support

2022-10-13 Thread Uros Bizjak via Gcc-patches
On Thu, Oct 13, 2022 at 9:38 PM Jason Merrill  wrote:
>
> On 10/13/22 12:50, Jakub Jelinek wrote:
> > Hi!
> >
> > On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote:
> >>> As I wrote earlier, I think we need at least one, __builtin_nans variant
> >>> which would be used in libstdc++
> >>> std::numeric_limits::signaling_NaN() implementation.
> >>> I think
> >>> std::numeric_limits::infinity() can be implemented as
> >>> return (__bf16) __builtin_huge_valf ();
> >>> and similarly
> >>> std::numeric_limits::quiet_NaN() as
> >>> return (__bf16) __builtin_nanf ("");
> >>> but
> >>> return (__bf16) __builtin_nansf ("");
> >>> would lose the signaling NaN on the conversion and raise exception,
> >>> and as the method is constexpr,
> >>> union { unsigned short a; __bf16 b; } u = { 0x7f81 };
> >>> return u.b;
> >>> wouldn't work.  I can certainly restrict the builtins to the single
> >>> one, but wonder whether the suffix for that builtin shouldn't be chosen
> >>> such that eventually we could add more builtins if we need to
> >>> and don't run into the log with bf16 suffix vs. logb with f16 suffix
> >>> ambiguity.
> >>> As you said, most of the libstdc++ overloads for std::bfloat16_t then
> >>> can use float builtins or library calls under the hood, but std::nextafter
> >>> is another case where I think we'll need to have something bfloat16_t
> >>> specific, because float ulp isn't bfloat16_t ulp, the latter is much 
> >>> larger.
> >>
> >> Makes sense.
> >
> > So, this updated version of the patch adds just a single __builtin_nansf16b
> > builtin (or do you want __builtin_nansbf16?).
>
> 16b sounds fine.
>
> >>> Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
> >>> in the next iteration (always with pedwarn in that case).
> >
> > And implements bf16/BF16 suffixes for C too.
> >
> >>> I'm afraid too many places rely on all modes of a certain class to be
> >>> visible when walking from "narrowest" to "widest" mode, say
> >>> FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
> >>> etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
> >>> && GET_MODE_WIDER_MODE (HFmode) == SFmode.
> >>
> >> Yes, it seems they need to change now that their assumptions have been
> >> violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not use
> >> get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to decide
> >> whether they want an iteration that uses get_wider (likely with a new name)
> >> or not.
> >
> > And now that the GET_MODE_WIDER_MODE vs. GET_MODE_NEXT_MODE patch is in,
> > is updated on top of those changes.
> >
> > So far lightly tested on x86_64-linux, ok for trunk if it passes full
> > bootstrap/regtest on both x86_64-linux and i686-linux?
>
> LGTM, but a i386 maintainer should review it as well.

OK with two changes to the cbranch and cstore expanders, as explained inline.

Thanks,
Uros.

> > 2022-10-13  Jakub Jelinek  
> >
> > gcc/
> >   * tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
> >   * tree.h (bfloat16_type_node): Define.
> >   * tree.cc (excess_precision_type): Promote bfloat16_type_mode
> >   like float16_type_mode.
> >   (build_common_tree_nodes): Initialize bfloat16_type_node if
> >   BFmode is supported.
> >   * expmed.h (maybe_expand_shift): Declare.
> >   * expmed.cc (maybe_expand_shift): No longer static.
> >   * expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
> >   conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
> >   conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
> >   -ffast-math generic implementation for BF -> SF and SF -> BF
> >   conversions.
> >   * builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
> >   * builtins.def (BUILT_IN_NANSF16B): New builtin.
> >   * fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
> >   * config/i386/i386.cc (classify_argument): Handle E_BCmode.
> >   (ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
> >   for -msse2.
> >   (ix86_mangle_type): Mangle BFmode as DF16b.
> >   (ix86_invalid_conversion, ix86_invalid_unary_op,
> >   ix86_invalid_binary_op): Remove.
> >   (TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
> >   TARGET_INVALID_BINARY_OP): Don't redefine.
> >   * config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
> >   (ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
> >   ix86_bf16_type_node, only create it if still NULL.
> >   * config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
> >   * config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
> > gcc/c-family/
> >   * c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
> >   predefine __BFLT16_*__ macros and for C++23 also
> >   __STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
> >   macros for -fbuilding-libgcc

Re: Handling of main() function for freestanding

2022-10-13 Thread Jason Merrill via Gcc-patches

On 10/13/22 16:14, Arsen Arsenović wrote:
> On Thursday, 13 October 2022 19:24:41 CEST Jason Merrill wrote:
> > I was arguing that we don't need the new flag; there shouldn't be any
> > need to turn it off.
>
> At the time, I opted to go with a more conservative route; I haven't
> been around enough to have very strong opinions ;)  I certainly can't
> think of a way always adding a return can go wrong, but figured someone,
> somehow, might rely on this behavior.  Removed the flag and tested on
> x86_64-pc-linux-gnu, v3 attached.

Thanks!

> FWIW, there's precedent for treating main specially regardless of
> flag_hosted (e.g. it's always marked extern "C" in the C++ frontend,
> AFAICT).
>
> -#define DECL_MAIN_P(NODE)  \
> +#define DECL_MAIN_FREESTANDING_P(NODE) \
> (DECL_EXTERN_C_FUNCTION_P (NODE)\
>  && DECL_NAME (NODE) != NULL_TREE   \
> -&& MAIN_NAME_P (DECL_NAME (NODE))  \
> -&& flag_hosted)
> +&& MAIN_NAME_P (DECL_NAME (NODE)))
> +
> +/* Nonzero iff NODE is a declaration for `main', and we are hosted. */
> +#define DECL_MAIN_P(NODE) (DECL_MAIN_FREESTANDING_P(NODE) && flag_hosted)


I liked in the previous version that you checked the return type of main 
when !flag_hosted, here and in c_missing_noreturn_ok_p.  Let's bring 
that back.
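
To make the requested check concrete, here is a toy model of the decision, written as one possible reading of the suggestion rather than as front-end code. `fn_info`, `is_main_any_mode` and `wants_implicit_return_zero` are hypothetical names; the real implementation works on tree nodes and the macros quoted above.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified description of a function declaration.
struct fn_info
{
  std::string name;
  bool extern_c;     // has C language linkage
  bool returns_int;  // return type is plain int
  bool noreturn;     // declared noreturn
};

// Name-only test, loosely mirroring what DECL_MAIN_FREESTANDING_P checks.
inline bool is_main_any_mode (const fn_info &f)
{
  return f.extern_c && f.name == "main";
}

// Decide whether an implicit `return 0;` should be appended.
inline bool wants_implicit_return_zero (const fn_info &f, bool hosted)
{
  if (!is_main_any_mode (f) || f.noreturn)
    return false;
  // Hosted: main is treated specially as before.  Freestanding: also
  // require an int return type, since main may be an ordinary function.
  return hosted || f.returns_int;
}
```

With this shape, a freestanding `void main ()` is left alone, while `int main ()` gets the implicit return in both modes unless it is marked noreturn.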


Jason



Re: [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support

2022-10-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 13, 2022 at 11:11:53PM +0200, Uros Bizjak wrote:
> > > +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > > +SFmode, NULL_RTX, NULL,
> > > +as_a  (operands[3]),
> > > +/* Unfortunately this isn't propagated.  */
> > > +profile_probability::even ());
> 
> You could use ix86_expand_branch instead of do_compare_rtx_and_jump
> here. This would expand in SFmode, so insn condition from cbranchsf4
> should be copied here:
> 
>   "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
> 
> Additionally, ix86_fp_comparison_operator predicate should be used for
> operator0. Basically, just copy predicates from cbranchsf4 as we are
> effectively expanding the SFmode compare & branch.

The reason why I've used there the generic routine was exactly to handle
not just ix86_fp_comparison_operator, but also comparisons that are more
complex than that (need 2 comparisons).

While for ix86_fp_comparison_operator cases the optabs wouldn't be actually
strictly needed, the generic code would see e.g. cbranchbf4 isn't supported
and try cbranchsf4, succeed on that and the only disadvantage would be
that the BFmode -> SFmode extensions would be performed using library
functions unless -ffast-math while they can be handled by left shifting
the 16 BFmode bits to most significant 16 bits of SFmode even when honoring
NaNs, for the non-ix86_fp_comparison_operator cases the generic behavior
is actually that neither cbranchbf4, nor cbranchsf4, nor cbranchdf4, nor
cbranchxf4, nor cbranchtf4 works out and generic code emits a libcall
(__{eq,ne}bf2).  I bet that is the reason why libgcc contains __{eq,ne}hf2
entrypoints.
I wanted to avoid adding __{eq,ne}bf2 and the addition of
cbranchbf4/cstorebf4 was how I managed to do that; by telling the
generic code that it can handle those by the faster BFmode to SFmode
conversions of the operands and then perform one or two bit checks.
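
As a standalone illustration of the shift trick (plain C++, not the RTL expander), the sketch below widens raw bfloat16 bits to float by placing them in the most significant 16 bits and then compares as float. The helper names are made up for the example, and float is assumed to be IEEE binary32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert (sizeof (float) == sizeof (std::uint32_t),
	       "float is assumed to be 32 bits wide");

// Widen raw bfloat16 bits to float: the 16 bits become the most
// significant 16 bits of the binary32 encoding, the rest are zero.
inline float bf16_bits_to_float (std::uint16_t bits)
{
  const std::uint32_t wide = static_cast<std::uint32_t> (bits) << 16;
  float f;
  std::memcpy (&f, &wide, sizeof f);
  return f;
}

// Compare two bfloat16 values by widening both operands and using the
// ordinary float comparison, so NaN operands compare false as usual.
inline bool bf16_less (std::uint16_t a, std::uint16_t b)
{
  return bf16_bits_to_float (a) < bf16_bits_to_float (b);
}
```

Because the widening is exact, no library call or rounding is involved; only the comparison itself happens in the wider mode.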

I guess another possibility would be to call ix86_expand_branch there
once or twice and repeat what the generic code does, or add the
libgcc entrypoints which would perhaps bypass soft-fp and just do the
shifts + SFmode comparison.

> > > +  else
> > > +{
> > > +  rtx t2 = gen_reg_rtx (SImode);
> > > +  emit_insn (gen_zero_extendhisi2 (t2, op2));
> > > +  emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > > +  op2 = gen_lowpart (SFmode, t2);
> > > +}
> 
> Similar to cbranch above, use ix86_expand_setcc and copy predicates
> from cstoresf4.

Ditto here, cstore was actually quite required by the generic code when
cbranch is implemented.

Jakub



Re: [PATCH] middle-end, c++, i386, libgcc, v2: std::bfloat16_t and __bf16 arithmetic support

2022-10-13 Thread Uros Bizjak via Gcc-patches
On Thu, Oct 13, 2022 at 11:35 PM Jakub Jelinek  wrote:
>
> On Thu, Oct 13, 2022 at 11:11:53PM +0200, Uros Bizjak wrote:
> > > > +  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
> > > > +SFmode, NULL_RTX, NULL,
> > > > +as_a  (operands[3]),
> > > > +/* Unfortunately this isn't propagated.  */
> > > > +profile_probability::even ());
> >
> > You could use ix86_expand_branch instead of do_compare_rtx_and_jump
> > here. This would expand in SFmode, so insn condition from cbranchsf4
> > should be copied here:
> >
> >   "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
> >
> > Additionally, ix86_fp_comparison_operator predicate should be used for
> > operator0. Basically, just copy predicates from cbranchsf4 as we are
> > effectively expanding the SFmode compare & branch.
>
> The reason why I've used there the generic routine was exactly to handle
> not just ix86_fp_comparison_operator, but also comparisons that are more
> complex than that (need 2 comparisons).
>
> While for ix86_fp_comparison_operator cases the optabs wouldn't be actually
> strictly needed, the generic code would see e.g. cbranchbf4 isn't supported
> and try cbranchsf4, succeed on that and the only disadvantage would be
> that the BFmode -> SFmode extensions would be performed using library
> functions unless -ffast-math while they can be handled by left shifting
> the 16 BFmode bits to most significant 16 bits of SFmode even when honoring
> NaNs, for the non-ix86_fp_comparison_operator cases the generic behavior
> is actually that neither cbranchbf4, nor cbranchsf4, nor cbranchdf4, nor
> cbranchxf4, nor cbranchtf4 works out and generic code emits a libcall
> (__{eq,ne}bf2).  I bet that is the reason why libgcc contains __{eq,ne}hf2
> entrypoints.
> I wanted to avoid adding __{eq,ne}bf2 and the addition of
> cbranchbf4/cstorebf4 was how I managed to do that; by telling the
> generic code that it can handle those by the faster BFmode to SFmode
> conversions of the operands and then perform one or two bit checks.

Thanks for the explanation; I see the intention now.

The patch is OK as is.

Thanks,
Uros.

> I guess another possibility would be to call ix86_expand_branch there
> once or twice and repeat what the generic code does, or add the
> libgcc entrypoints which would perhaps bypass soft-fp and just do the
> shifts + SFmode comparison.
>
> > > > +  else
> > > > +{
> > > > +  rtx t2 = gen_reg_rtx (SImode);
> > > > +  emit_insn (gen_zero_extendhisi2 (t2, op2));
> > > > +  emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
> > > > +  op2 = gen_lowpart (SFmode, t2);
> > > > +}
> >
> > Similar to cbranch above, use ix86_expand_setcc and copy predicates
> > from cstoresf4.
>
> Ditto here, cstore was actually quite required by the generic code when
> cbranch is implemented.
>
> Jakub
>


Re: [PATCH v2] c++: parser - Support for target address spaces in C++

2022-10-13 Thread Paul Iannetta via Gcc-patches
On Thu, Oct 13, 2022 at 03:41:16PM -0400, Jason Merrill wrote:
> On 10/13/22 12:02, Paul Iannetta wrote:
> > On Thu, Oct 13, 2022 at 11:47:42AM -0400, Jason Merrill wrote:
> > > On 10/13/22 11:23, Paul Iannetta wrote:
> > > > On Thu, Oct 13, 2022 at 11:02:24AM -0400, Jason Merrill wrote:
> > > > > On 10/12/22 20:52, Paul Iannetta wrote:
> > > > > > On Tue, Oct 11, 2022 at 09:49:43PM -0400, Jason Merrill wrote:
> > > > > > > 
> > > > > > > It surprises me that this is the only place we complain about an
> > > > > > > object with an address-space qualifier.  Shouldn't we also complain
> > > > > > > about e.g. automatic variables/parameters or non-static data members
> > > > > > > with address-space qualified type?
> > > > > > > 
> > > > > > 
> > > > > > Indeed, I was missing quite a few things here.  Thanks.
> > > > > > > I used the draft as basis this time and imported from the C
> > > > > > > implementation the relevant parts.  This time, the errors get
> > > > > > > properly emitted when an address space is unduly specified; and
> > > > > > > comparisons, assignments and comparisons are taken care of.
> > > > > > > 
> > > > > > > There are quite a few things I would like to clarify concerning some
> > > > > > > implementation details.
> > > > > > >  - A variable with automatic storage (which is neither a pointer nor
> > > > > > >    a reference) cannot be qualified with an address space.  I detect
> > > > > > >    this by the combination of `sc_none' and `! toplevel_bindings_p ()',
> > > > > > >    but I've also seen the use of `at_function_scope' at other places.
> > > > > > >    And I'm unsure which one is appropriate here.
> > > > > > >    This detection happens at the very end of grokdeclarator because I
> > > > > > >    need to know that the type is a pointer, which is not known until
> > > > > > >    very late in the function.
> > > > > 
> > > > > > At that point you have the decl, and you can ask directly what its
> > > > > > storage duration is, perhaps using decl_storage_duration.
> > > > > > 
> > > > > > But why do you need to know whether the type is a pointer?  The
> > > > > > attribute applies to the target type of the pointer, not the pointer
> > > > > > type.  I think the problem is that you're looking at declspecs when
> > > > > > you ought to be looking at type_quals.
> > > > 
> > > > I need to know that the base type is a pointer to reject invalid
> > > > declarations such as:
> > > > 
> > > >   int f (__seg_fs int a) { } or int f () { __seg_fs int a; }
> > > > 
> > > > because parameters and auto variables can have an address space
> > > > qualifier only if they are pointer or reference type, which I can't
> > > > tell only from type_quals.
> > > 
> > > > But "int *__seg_fs a" is just as invalid as the above; the difference
> > > > is not whether a is a pointer, but whether the address-space-qualified
> > > > type is the type of a itself or some sub-type.
> > 
> > I agree that "int * __seg_fs a" is invalid but it is accepted by the C
> > front-end, and by clang (both C and C++), the behavior is that the
> > address space is silently ignored.
> 
> Hmm, that sounds like a bug; in that case, presumably the user meant to
> qualify the pointed-to type, and silently ignoring seems unlikely to give
> the effect they want.
> 

Well, actually, I'm re-reading the draft and "int * __seg_fs a" is
valid.  It means "pointer in address space __seg_fs pointing to an
object in the generic address space", whereas "__seg_fs int * a" means
"pointer in the generic address space pointing to an object in the
__seg_fs address-space".

Oddities such as, "__seg_fs int * __seg_gs a" are also perfectly
valid.

The reason why I wrongly assumed that the address space was silently
ignored is that I made a simple test which only relied on the mangled
function name...

```
int * __seg_fs a;
template <typename T> int f (T *a) { return *a; }
int main () { return f (a); } // f(int*), since a is in __seg_fs
  // but the pointer is to the generic
  // address-space.
```

Implementation-wise, I think I handle that correctly but I'll do a recheck.
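
The same effect can be reproduced on any C++ compiler by analogy, with `const` standing in for the address-space qualifier (an assumption made purely for illustration; cv-qualifiers and address spaces are not the same feature). A qualifier written after the `*` applies to the pointer object and is dropped when deducing `T` from a `T *` parameter, while a qualifier on the pointee becomes part of `T`. The names `tag`, `deduce`, `ptr_is_qualified` and `pointee_is_qualified` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Analogy only: `const` plays the role of an address-space qualifier.
using ptr_is_qualified = int *const;      // like "int * __seg_fs a"
using pointee_is_qualified = const int *; // like "__seg_fs int * a"

// Carries the deduced T without losing its qualifiers.
template <typename T> struct tag { using type = T; };

// Deduces T from a T* parameter, as f (T *a) does in the test above.
// Declared only; it is used solely in unevaluated contexts.
template <typename T> tag<T> deduce (T *);

// Qualifier on the pointer itself: ignored by deduction, T is plain int.
static_assert (std::is_same<
		 decltype (deduce (std::declval<ptr_is_qualified &> ()))::type,
		 int>::value,
	       "top-level qualifier does not reach T");

// Qualifier on the pointee: part of the deduced T.
static_assert (std::is_same<
		 decltype (deduce (std::declval<pointee_is_qualified &> ()))::type,
		 const int>::value,
	       "pointee qualifier is part of T");
```

This matches the observation above: a test that only looks at the mangled name of `f` cannot see a qualifier that applies to the pointer object itself.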

> > > > You need to look at the qualifiers on type (which should also be the
> > > > ones in type_quals), not the qualifiers in the declspecs.
> > 
> > I'll have another look, thanks.
> > 
> > > > > > >  - I'm having some trouble deciding whether I include those three
> > > > > > >    stub programs as tests, they all compile fine and clang accepts
> > > > > > >    them as well.
> > > > > 
> > > > > Why not?
> > > > 
> > > > I thought they were pretty contrived, since it does not make much
> > > > sense to strip address space qualifiers, even though it does prove
> > > > that the implementation support those contrived but valid us

Re: [PATCH] Fix bogus -Wstringop-overflow warning

2022-10-13 Thread Jeff Law via Gcc-patches



On 10/13/22 06:06, Eric Botcazou via Gcc-patches wrote:

Hi,

if you compile the attached testcase with -O2 -fno-inline -Wall, you get:

In function 'process_array3':
cc1: warning: 'process_array4' accessing 4 bytes in a region of size 3 [-Wstringop-overflow=]
cc1: note: referencing argument 1 of type 'char[4]'
t.c:6:6: note: in a call to function 'process_array4'
 6 | void process_array4 (char a[4], int n)
   |  ^~
cc1: warning: 'process_array4' accessing 4 bytes in a region of size 3 [-Wstringop-overflow=]
cc1: note: referencing argument 1 of type 'char[4]'
t.c:6:6: note: in a call to function 'process_array4'

That's because the ICF IPA pass has identified the two functions and turned
process_array3 into a wrapper of process_array4.  This looks sensible to me
given that the only difference between them is an "access" attribute on their
type describing the access size of the parameter and the "access" attribute
does not affect type identity (struct attribute_spec.affects_type_identity).

Hence the proposed fix, tested on x86-64/Linux, OK for the mainline?


2022-10-13  Eric Botcazou  

* gimple-ssa-warn-access.cc (pass_waccess::check_call): Return
early for calls made from thunks.


2022-10-13  Eric Botcazou  

* gcc.dg/Wstringop-overflow-89.c: New test.


Not a fan as it could potentially hide a real issue, but I don't really 
have a better solution.  I pondered suggesting "access" affect type 
identity, but the cases where that's really important are probably 
better handled by the "fn spec" attribute, leaving "access" strictly 
impacting diagnostics.


OK

jeff



Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-13 Thread Jeff Law via Gcc-patches



On 10/11/22 17:31, Vineet Gupta wrote:




I expect that the pressure for a proper fix upstream (instead of a 
backward compatible compromise) will increase over time (once people 
start building big iron based on RISC-V and start hunting performance 
bottlenecks in multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics 
ABI by a command line switch and promote its use. And at one point in 
the future (if there are enough fixes to justify a break) the new ABI 
can be enabled by default with a new flag to enable the old ABI.


Indeed we are stuck with inefficiencies with status quo. The new abi 
option sounds like a reasonable plan going fwd.


Also my understanding is that while the considerations are ABI centric, 
the option to facilitate this need not be tied to canonical -mabi=ilp32, 
lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this 
is not suggestive just indicative). Otherwise there's another level of 
blowup in multilib testing etc.


If I understand the history here, we're essentially catering to code 
that is potentially relying on behavior that was never really 
guaranteed.   That's not really ABI -- it's more depending on specifics 
of an implementation or undefined/underdefined behavior.    Holding back 
progress for that case seems short-sighted, particularly given how early 
I hope we are in the RISC-V journey.



But I'm also sympathetic to the desire not to break existing code.  
Could we keep the old behavior under a flag and fix the default behavior 
here, presumably with a bit in the ELF header indicating code that wants 
the old behavior?



Jeff




Re: [PATCH] Fix bogus -Wstringop-overflow warning

2022-10-13 Thread Eric Botcazou via Gcc-patches
> Not a fan as it could potentially hide a real issue, but I don't really
> have a better solution.

Thanks.

> I pondered suggesting "access" affect type identity, but the cases where
> that's really important are probably better handled by the "fn spec"
> attribute, leaving "access" strictly impacting diagnostics.

I can expand a bit here, because I tried to change the "access" attribute that 
way and this badly breaks the C compiler, for example:

int foo (int n, char m[1][n]);

int foo (int n, char m[1][n]) {}

no longer compiles with an error about different function types.

-- 
Eric Botcazou




Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-13 Thread Jeff Law via Gcc-patches



On 10/11/22 18:15, Palmer Dabbelt wrote:


Sorry, I thought we'd talked about it somewhere but it must have just 
been in meetings and such.  Patrick was writing a similar patch set 
around the same time so it probably just got tied up in that, we ended 
up reducing it to just the strong CAS inline stuff because we couldn't 
sort out the correctness of the rest of it.


Now that you mention it, I vaguely recall a discussion about inline 
atomics vs libatomic and the discussion on this issue might have been 
there rather than in Christophe's patchset.





My initial understanding was that fixing something broken cannot be 
an ABI break.
And that the mismatch of the implementation in 2021 and the 
recommended mappings in the ratified specification from 2019 is 
something that is broken. I still don't know the background here, 
but I guess this assumption is incorrect from a historical point of 
view.


We agreed that we wouldn't break binaries back when we submitted the 
port.  The ISA has changed many times since then, including adding the 
recommended mappings, but those binaries exist and we can't just 
silently break things for users.


That may be too strong of a policy -- just because something worked in 
the past doesn't mean it must always work.  It really depends on the 
contracts specified by the ABI, processor reference documentation, etc 
and whether or not the code relies on something that is outside those 
contracts.  If it does rely on behavior outside the contracts, then we 
shouldn't be constrained by such an agreement.


Another way to think about this problem is do we want more code making 
incorrect assumptions about the behavior of atomics getting out in the 
wild?  My take would be that we nip this in the bud, get it right now in 
the default configuration, but leave enough bits in place that existing 
code continues to work.




However, I'm sure that I am not the only one that assumes the 
mappings in the specification to be implemented in compilers and 
tools. Therefore I still consider the implementation of the RISC-V 
atomics in GCC as broken (at least w.r.t. user expectation from 
people that lack the historical background and just read the RISC-V 
specification).


You can't just read one of those RISC-V PDFs and assume that 
implementations that match those words will function correctly. Those 
words regularly change in ways where reasonable readers would end up 
with incompatible implementations due to those differences.  That's 
why we're so explicit about versions and such these days, we're just 
getting burned by these old mappings because they're from back when we 
thought the RISC-V definition of compatibility was going to match the 
more common one and we didn't build in fallbacks.


Fair point, but in my mind that argues that the platform must mature 
further so that the contracts can be relied upon.  That obviously needs 
to get fixed and until it does any agreements or guarantees about 
behavior of existing code can't be reasonably made.  If we're going to 
be taken seriously, then those fundamentals have to be rock solid.





I don't think we're just stuck with the status quo, we really just 
need to go through the mappings and figure out which can be made both 
fast and ABI-compatible.  Then we can fix those and see where we 
stand, maybe it's good enough or maybe we need to introduce some sort 
of compatibility break to make things faster (and/or compatible with 
LLVM, where I suspect we're broken right now).


Certainly seems like a good first step.  What we can fix without 
breaking things we do while we sort out the tougher problems.





If we do need a break then I think it's probably possible to do it in 
phases, where we have a middle-ground compatibility mode that works 
for both the old and new mappings so distros can gradually move over 
as they rebuild packages.


As someone that lived in the distro space for a long time, I would argue 
that now is the time to fix this stuff -- before there is a large uptake 
in distro consumption.





+Jeff, who was offering to help when the threads got crossed.  I'd 
punted on a lot of this in the hope Andrea could help out, as I'm not 
really a memory model guy and this is pretty far down the rabbit 
hole.  Happy to have the help if you're offering, though, as what's 
there is likely a pretty big performance issue for anyone with a 
reasonable memory system.


Hmm, there's a case I'm pondering if I can discuss  or not. Probably not 
since I can't recall it ever being discussed in public.  So I'll just 
say this space can be critically important for performance and the 
longer we wait, the tougher it gets to fix without causing significant 
disruption.


Jeff


Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-13 Thread Jeff Law via Gcc-patches



On 10/12/22 02:03, Christoph Müllner wrote:



So we have the following atomics ABIs:
 I) GCC implementation
 II) LLVM implementation
 III) Specified ABI in the "Code Porting and Mapping Guidelines" 
appendix of the RISC-V specification


And presumably we don't have any way to distinguish between I and II at 
the DSO or object level.  That implies that if we're going to get to 
III, then we have to mark new code.  We obviously can't mark 
pre-existing bits (and I may have implied we should do that in my 
earlier message, my goof).






And there are two proposed solutions:
 a) Finding a new ABI that is compatible with I) and II) is of course 
a solution, but we don't know if and when such a solution exists.
 b) Introducing III) causes a break and therefore needs special 
care (e.g. let the user decide via command line flag or provide a 
compatibility mode).


I don't see that a) and b) contradict each other.
Why not go for both:
 -) Continue to work on a backward compatible solution
 -) Enable the "new" ABI from the specification appendix via command 
line flag
 -) Reevaluate the situation in 12 months to decide the next steps


I would lean towards making the new, more correct, behavior the default 
and having the old behavior enabled by a command line flag. But 
otherwise what you're suggesting seems reasonable.



Jeff


Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-13 Thread Palmer Dabbelt

On Thu, 13 Oct 2022 15:39:39 PDT (-0700), gcc-patches@gcc.gnu.org wrote:


On 10/11/22 17:31, Vineet Gupta wrote:




I expect that the pressure for a proper fix upstream (instead of a
backward compatible compromise) will increase over time (once people
start building big iron based on RISC-V and start hunting performance
bottlenecks in multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics
ABI by a command line switch and promote its use. And at one point in
the future (if there are enough fixes to justify a break) the new ABI
can be enabled by default with a new flag to enable the old ABI.


Indeed we are stuck with inefficiencies with status quo. The new abi
option sounds like a reasonable plan going fwd.

Also my understanding is that while the considerations are ABI centric,
the option to facilitate this need not be tied to canonical -mabi=ilp32,
lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this
is not suggestive just indicative). Otherwise there's another level of
blowup in multilib testing etc.


If I understand the history here, we're essentially catering to code
that is potentially relying on behavior that was never really
guaranteed.   That's not really ABI -- it's more depending on specifics
of an implementation or undefined/underdefined behavior.    Holding back
progress for that case seems short-sighted, particularly given how early
I hope we are in the RISC-V journey.


But I'm also sympathetic to the desire not to break existing code. 


I suppose ABI means a ton of things, but in this case that's really what 
I was trying to get at: just changing to the mappings suggested in the 
ISA manual risks producing binaries that don't work when mixed with old 
binaries.  My rationale for calling that ABI was that there's a de facto 
memory model ABI defined as "anything that works with the old binaries", 
but ABI means so many things maybe we just shouldn't say it at all here?


I'm going to call it "old binary compatibility" here, rather than "ABI 
compatibility", just so it's a different term.



Could we keep the old behavior under a flag and fix the default behavior
here, presumably with a bit in the ELF header indicating code that wants
the old behavior?


The thread got forked a bit, but that's essentially what I was trying to 
suggest.  I talked with Andrea a bit and here's how I'd describe it:


We have a mapping from the C{,++}11 memory model to RVWMO that's 
currently emitted by GCC, and there are years of binaries with that 
mapping.  As far as we know that mapping is correct, but I don't think 
anyone's gone through and formally analyzed it.  Let's call that the 
"GCC mapping".


There's also a mapping listed in the ISA manual.  That's not the same as 
the GCC mapping, so let's call it the "ISA mapping".  We need to 
double-check the specifics, but IIUC this ISA mapping is broadly 
followed by LLVM.  It's also very likely to be correct, as it's been 
analyzed by lots of formal memory model people as part of the RVWMO 
standardization process.


As far as I know, everyone likes the ISA mapping better than the GCC 
mapping.  It's hard to describe why concretely because there's no 
hardware that implements RVWMO sufficiently aggressively that we can 
talk performance, but at least matching what's in the ISA manual is the 
way to go.  Also, just kind of a personal opinion, the GCC mapping does 
some weird ugly stuff.


So the question is really: how do we get rid of the GCC mapping while 
causing as little headache as possible?


My proposal is as follows:

* Let's properly analyze the GCC mapping, just on its own.  Maybe it's 
 broken, if it is then I think we've got a pretty decent argument to 
 just ignore old binary compatibility -- if it was always broken then 
 we can't break it, so who cares.
* Assuming the GCC mapping is internally consistent, let's analyze 
 arbitrary mixes of the GCC and ISA mappings.  It's not generally true 
 that two correct mappings can be freely mixed, but maybe we've got 
 some cases that work fine.  Maybe we even get lucky and everything's 
 compatible, who knows (though I'm worried about the same .aq vs fence 
 stuff we've had in LKMM a few times).
* Assuming there's any issue mixing the GCC and ISA mappings, let's add 
 a flag to GCC.  Something like -mrvwmo-compat={legacy,both,isa}, where:
   - =legacy does what we have now.  We can eventually deprecate this, 
 and assuming =both is fast enough maybe we don't even need it.
   - =both implements a mapping that's compatible with both the GCC and 
 ISA mappings.  This might be slow or something, but it'll be 
 correct with both.  We can set this to the default now, as it's 
 safe.

   - =isa implements the ISA mappings.
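
To make "mixing the GCC and ISA mappings" concrete, here is a minimal C11
sketch of the kind of code whose behavior has to survive such a mix.  The
function and variable names are hypothetical and not taken from any patch in
this thread.  Imagine producer_publish living in an old object built with the
GCC mapping and consumer_try_read in a new object built with the ISA mapping:
the release store and the acquire load still have to synchronize, whichever
instruction sequences each compiler picked for them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Plain (non-atomic) data that is published through the flag below. */
static int payload;
static atomic_bool ready;

/* Writer side: store the data, then publish it with a release store. */
void producer_publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: an acquire load that observes the flag must also observe
   the payload written before the matching release store. */
bool consumer_try_read(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The sketch says nothing about which fences or .aq/.rl annotations either
mapping emits; it only shows the source-level contract that any "=both" style
mapping would have to preserve.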

Then we can go stick some marker in the ELF saying "this binary is 
compatible with the ISA mappings" to triage binaries in systems.  That 
way distros can decide when they want to move to -mrvwmo-compat

[PATCH] Always enable LRA

2022-10-13 Thread Segher Boessenkool
This small patch changes everything that checks targetm.lra_p to behave as
if it returned true.

It has no effect on any primary or secondary target.  It also is fine
for nds32 and for nios2, and it works fine for microblaze (which used
old reload before), resulting in smaller code.

I have patches to completely rip out old reload, and more stuff after
that, but of course not everything is nice yet:

It makes a few targets no longer build though.  In my testing (of all
linux targets that built before) these are alpha, c6x, h8300, m68k,
32-bit parisc, sh, and xtensa.

alpha builds the compiler, but it then crashes with
/home/segher/src/kernel/drivers/tty/serial/serial_core.c:1029:1: internal 
compiler error: maximum number of generated reload insns per insn achieved (90)
(and in three more files) which can mean anything unfortunately.

c6x is more exciting:
/home/segher/src/kernel/fs/char_dev.c:598:1: internal compiler error: in 
priority, at haifa-sched.cc:1597
which is
  /* We should not be interested in priority of an already scheduled insn.  */
  gcc_assert (QUEUE_INDEX (insn) != QUEUE_SCHEDULED);

h8300 fails during GCC build:
/home/segher/src/gcc/libgcc/unwind.inc: In function 
'_Unwind_SjLj_RaiseException':
/home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
  141 | }
  | ^
(insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
(reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 
{*movsi}
 (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
(nil)))
during RTL pass: final
which looks like a backend bug, I don't see a pattern that could split
this (without needing an extra clobber)?

m68k builds the compiler fine, but then has
/home/segher/src/kernel/fs/squashfs/namei.c: In function 'squashfs_lookup':
/home/segher/src/kernel/fs/squashfs/namei.c:232:1: error: insn does not satisfy 
its constraints:
  232 | }
  | ^
(insn 270 470 271 30 (set (mem:SI (pre_dec:SI (reg/f:SI 15 %sp)) [1  S4 A16])
(plus:SI (sign_extend:SI (reg:HI 8 %a0 [175]))
(reg:SI 2 %d2 [orig:173 val ] [173]))) 
"/home/segher/src/kernel/fs/squashfs/namei.c":212:13 77 {pushasi}
 (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
(nil)))
during RTL pass: postreload
/home/segher/src/kernel/fs/squashfs/namei.c:232:1: internal compiler error: in 
extract_constrain_insn, at recog.cc:2692
(and similar in two more files).

32-bit parisc now runs into the 90 reloads thing (parisc64 already did
without the patch).

sh doesn't build GCC:
/home/segher/src/gcc/libgcc/libgcc2.c: In function '__divdc3':
/home/segher/src/gcc/libgcc/libgcc2.c:2182:1: error: unable to generate reloads 
for:
 2182 | }
  | ^
(call_insn/u 132 131 1855 7 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("__ltdf2") [flags 0x41] 
) [0  S4 A32])
(const_int 0 [0])))
(use (reg:SI 154 fpscr0))
(use (reg:SI 12 r12))
(clobber (reg:SI 146 pr))
(clobber (reg:SI 758))
]) "/home/segher/src/gcc/libgcc/libgcc2.c":2082:7 233 {call_value_pcrel}
 (expr_list:REG_DEAD (reg:DF 6 r6)
(expr_list:REG_DEAD (reg:DF 4 r4)
(expr_list:REG_CALL_DECL (symbol_ref:SI ("__ltdf2") [flags 0x41] 
)
(expr_list:REG_EH_REGION (const_int -2147483648 
[0x8000])
(nil)
(expr_list (use (reg:DF 6 r6))
(expr_list (use (reg:DF 4 r4))
(nil
during RTL pass: reload

And finally, xtensa does
/home/segher/src/gcc/libgcc/libgcc2.c:840:1: error: insn does not satisfy its 
constraints:
  840 | }
  | ^
(insn 8 7 9 2 (set (reg:SI 9 a9 [57])
(const_int 1431655765 [0x55555555])) 
"/home/segher/src/gcc/libgcc/libgcc2.c":828:21 37 {movsi_internal}
 (expr_list:REG_EQUIV (const_int 1431655765 [0x55555555])
(nil)))
during RTL pass: postreload
/home/segher/src/gcc/libgcc/libgcc2.c:840:1: internal compiler error: in 
extract_constrain_insn, at recog.cc:2692

 - ~ - ~ -

All in all, more worked than expected.  I think it is time to finally
make this switch.

I did not test targets without a linux port.  I build the linux kernel
as well, to see if the resulting compiler actually works :-)
Suggestions what to use for other targets are welcome.

Is there any way to get the final targets updated for LRA?

Other rants?  :-)


Segher

---
 gcc/auto-inc-dec.cc | 2 +-
 gcc/combine.cc  | 2 +-
 gcc/ira-lives.cc| 5 -
 gcc/ira.cc  | 2 +-
 gcc/recog.cc| 2 +-
 gcc/targhooks.cc| 4 
 6 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/gcc/auto-inc-dec.cc b/gcc/auto-inc-dec.cc
index 481e7af..0186c17 100644
--- a/gcc/auto-inc-dec.cc
+++ b/gcc/auto-inc-dec.cc
@@ -1443,7 +1443,7 @@ merge_in_block (int max_reg, basic_block bb)
 
   /* Reload should handle auto-inc within a jump correctly, while LRA
 is known to have issues with autoinc.  */
-  if (JUMP_P (ins

Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-13 Thread Vineet Gupta

On 10/13/22 15:39, Jeff Law via Gcc-patches wrote:


On 10/11/22 17:31, Vineet Gupta wrote:




I expect that the pressure for a proper fix upstream (instead of a 
backward compatible compromise) will increase over time (once people 
start building big iron based on RISC-V and start hunting performance 
bottlenecks in multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics 
ABI by a command line switch and promote its use. And at one point in 
the future (if there are enough fixes to justify a break) the new ABI 
can be enabled by default with a new flag to enable the old ABI.


Indeed we are stuck with inefficiencies with status quo. The new abi 
option sounds like a reasonable plan going fwd.


Also my understanding is that while the considerations are ABI centric, 
the option to facilitate this need not be tied to canonical -mabi=ilp32, 
lp64d etc. It might just be a toggle as -matomic=legacy,2019 etc (this 
is not suggestive just indicative). Otherwise there's another level of 
blowup in multilib testing etc.


If I understand the history here, we're essentially catering to code 
that is potentially relying on behavior that was never really 
guaranteed.   That's not really ABI -- it's more depending on specifics 
of an implementation or undefined/underdefined behavior.    Holding back 
progress for that case seems short-sighted, particularly given how early 
I hope we are in the RISC-V journey.


Exactly. I keep hearing about the potential ABI breakage but no real 
discussion (publicly at least) that describes in detail what exactly 
that ABI / old-behavior breakage is with this patch series [1]. So 
perhaps we can start with reviewing the patches, calling out exactly 
what change causes the divergence and if that is acceptable or not. And 
while at it, perhaps also make updates to [2]



[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100265


Re: [PATCH] Always enable LRA

2022-10-13 Thread Koning, Paul via Gcc-patches



> On Oct 13, 2022, at 7:56 PM, Segher Boessenkool  
> wrote:
> 
> This small patch changes everything that checks targetm.lra_p to behave as
> if it returned true.
> 
> It has no effect on any primary or secondary target.  It also is fine
> for nds32 and for nios2, and it works fine for microblaze (which used
> old reload before), resulting in smaller code.
> 
> I have patches to completely rip out old reload, and more stuff after
> that, but of course not everything is nice yet:

I guess I'll have to look harder to see if it's possible to make LRA handle 
CISC addressing modes like memory indirect, autoincrement, autodecrement, and 
others that the old reload handles at least somewhat.  Ideally LRA should do a 
better job; right now I believe it doesn't really do these things at all.  
Targets like pdp11 and vax would like these.

paul




Re: [PATCH] Always enable LRA

2022-10-13 Thread Jeff Law via Gcc-patches



On 10/13/22 17:56, Segher Boessenkool wrote:


h8300 fails during GCC build:
/home/segher/src/gcc/libgcc/unwind.inc: In function 
'_Unwind_SjLj_RaiseException':
/home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
   141 | }
   | ^
(insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
 (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 
{*movsi}
  (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
 (nil)))
during RTL pass: final
which looks like a backend bug, I don't see a pattern that could split
this (without needing an extra clobber)?


I'm aware of this -- it's invalid RTL:

Uses of the register outside of an address are not permitted within the
same insn as a use in an embedded side effect expression because such
insns behave differently on different machines and hence must be treated
as ambiguous and disallowed.
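
As a rough illustration of why such an insn is ambiguous, here is a toy C
model (entirely hypothetical, not GCC code) of "push the stack pointer" under
two plausible hardware readings.  Both leave sp at the same place, but they
store different values, which is the "behave differently on different
machines" problem the rule above is guarding against.

```c
#include <stdint.h>

/* Toy machine: a word-indexed stack plus a stack pointer (hypothetical). */
typedef struct {
    uint32_t sp;
    uint32_t mem[8];
} toy_machine;

/* Reading 1: the source operand (sp) is sampled before the pre-decrement. */
void push_sp_source_first(toy_machine *m)
{
    uint32_t value = m->sp;
    m->sp -= 1;
    m->mem[m->sp] = value;
}

/* Reading 2: the pre-decrement happens first, then sp is read as the source. */
void push_sp_decrement_first(toy_machine *m)
{
    m->sp -= 1;
    m->mem[m->sp] = m->sp;
}
```

Starting both models from the same sp, the slot that was pushed holds the old
sp in the first model and the new sp in the second.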


I'd actually tried to turn on LRA for the H8 port a little while ago and 
stumbled over it.


I'm aware of a similar situation involving a general register on the H8, 
but using reload instead of LRA.  I looked at it a while back and my 
recollection was that the insn was actually fine until reload got its 
grubby hands on it.  And when I wandered through reload to hunt for anything 
which handled the restriction noted above, I didn't find anything.  If 
you were to search for H8 bugs in bugzilla, it should be discoverable.


While we could potentially work around this in the backend, it'd be a 
hack at best.  It hasn't risen to the top of my priority list yet.  I 
considered suggesting we change this from "invalid" to "target defined" 
behavior, but that felt like a cop-out.



jeff



Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

2022-10-13 Thread Xu, Liwei via Gcc-patches
Hi Richard
 
Thank you for your detailed explanation, I’ll patch the test case with 
suggestions from Lulu.

Best
Levy

> On 13 Oct 2022, at 7:12 pm, Richard Biener  wrote:
> 
> On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng  wrote:
>> 
>> 
>>> On 2022/10/13 at 2:44 PM, Xi Ruoyao wrote:
>>> On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:
>>>> Hi RuoYao
>>>> 
>>>> It’s probably because loongarch64 doesn’t support
>>>> can_vec_perm_const_p(result_mode, op_mode, sel2, false)
>>>> 
>>>> I’m not sure whether loongarch will support it, or should I just
>>>> limit the test target for pr54346.c?
>>> I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
>>> actually support vector.  (LoongArch has SIMD instructions but the
>>> support in GCC won't be added in the very near future.)
>>> 
>> If what I understand is correct, I think this might be a better solution.
>> 
>>  /* { dg-do compile } */
>> 
>> +/* { dg-require-effective-target vect_perm } */
>>  /* { dg-options "-O -fdump-tree-dse1" } */
> 
> Btw, what forwprop does is check whether any of the original permutations are
> not supported and then elide the supportability check for the result.
> The reasoning
> is that the original permute(s) would be lowered during vectlower so we can as
> well do that for the result.  We should just never turn a supported 
> permutation
> sequence into a not supported one.
> 
> Richard.
> 


Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

2022-10-13 Thread Lulu Cheng



On 2022/10/13 at 7:10 PM, Richard Biener wrote:

On Thu, Oct 13, 2022 at 10:16 AM Lulu Cheng  wrote:


On 2022/10/13 at 2:44 PM, Xi Ruoyao wrote:

On Thu, 2022-10-13 at 14:15 +0800, Levy wrote:

Hi RuoYao

It’s probably because loongarch64 doesn’t support
can_vec_perm_const_p(result_mode, op_mode, sel2, false)

I’m not sure whether loongarch will support it, or should I just
limit the test target for pr54346.c?

I'm not sure if we can add TARGET_VECTORIZE_VEC_PERM_CONST when we don't
actually support vector.  (LoongArch has SIMD instructions but the
support in GCC won't be added in the very near future.)


If what I understand is correct, I think this might be a better solution.

   /* { dg-do compile } */

+/* { dg-require-effective-target vect_perm } */
   /* { dg-options "-O -fdump-tree-dse1" } */

Btw, what forwprop does is check whether any of the original permutations are
not supported and then elide the supportability check for the result.
The reasoning
is that the original permute(s) would be lowered during vectlower so we can as
well do that for the result.  We should just never turn a supported permutation
sequence into a not supported one.
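
A minimal sketch of the rule described above, under simplifying assumptions:
the permutations are reduced to single-input selectors, and target support is
passed in as plain booleans instead of being queried through
can_vec_perm_const_p.  The function names are hypothetical and this is not
the forwprop code itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Compose two single-input permutations: if t[j] = v[sel1[j]] and
   r[i] = t[sel2[i]], then r[i] = v[sel1[sel2[i]]]. */
void compose_selectors(const unsigned *sel1, const unsigned *sel2,
                       unsigned *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = sel1[sel2[i]];
}

/* Decide whether the two permutes may be replaced by the combined one. */
bool may_fold_permutes(bool first_supported, bool second_supported,
                       bool result_supported)
{
    /* If either original permute is unsupported it would be lowered later
       anyway, so lowering the combined permute instead loses nothing. */
    if (!first_supported || !second_supported)
        return true;
    /* Both originals are supported: never trade them for an unsupported
       result. */
    return result_supported;
}
```

On a target with no vector permute support at all, both inputs count as
unsupported, so the fold is allowed regardless of the result.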

Richard.


Hi Richard:

I'm very sorry. I don't fully understand what you mean.

Could you give me some more details?


Thanks!

Lulu Cheng




[committed] c: C2x storage class specifiers in compound literals

2022-10-13 Thread Joseph Myers
Implement the C2x feature of storage class specifiers in compound
literals.  Such storage class specifiers (static, register or
thread_local; also constexpr, but we don't yet have C2x constexpr
support implemented) can be used before the type name (not mixed with
type specifiers, unlike in declarations) and have the same semantics
and constraints as for declarations of named objects.  Also allow GNU
__thread to be used, given that thread_local can be.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-decl.cc (build_compound_literal): Add parameter scspecs.
Handle storage class specifiers.
* c-parser.cc (c_token_starts_compound_literal)
(c_parser_compound_literal_scspecs): New.
(c_parser_postfix_expression_after_paren_type): Add parameter
scspecs.  Call pedwarn_c11 for use of storage class specifiers.
Update call to build_compound_literal.
(c_parser_cast_expression, c_parser_sizeof_expression)
(c_parser_alignof_expression): Handle storage class specifiers for
compound literals.  Update calls to
c_parser_postfix_expression_after_paren_type.
(c_parser_postfix_expression): Update syntax comment.
* c-tree.h (build_compound_literal): Update prototype.
* c-typeck.cc (c_mark_addressable): Diagnose taking address of
register compound literal.

gcc/testsuite/
* gcc.dg/c11-complit-1.c, gcc.dg/c11-complit-2.c,
gcc.dg/c11-complit-3.c, gcc.dg/c2x-complit-2.c,
gcc.dg/c2x-complit-3.c, gcc.dg/c2x-complit-4.c,
gcc.dg/c2x-complit-5.c, gcc.dg/c2x-complit-6.c,
gcc.dg/c2x-complit-7.c, gcc.dg/c90-complit-2.c,
gcc.dg/gnu2x-complit-1.c, gcc.dg/gnu2x-complit-2.c: New tests.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 193e268f04e..a7571cc7542 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -6048,11 +6048,13 @@ mark_forward_parm_decls (void)
literal.  NON_CONST is true if the initializers contain something
that cannot occur in a constant expression.  If ALIGNAS_ALIGN is nonzero,
it is the (valid) alignment for this compound literal, as specified
-   with _Alignas.  */
+   with _Alignas.  SCSPECS are the storage class specifiers (C2x) from the
+   compound literal.  */
 
 tree
 build_compound_literal (location_t loc, tree type, tree init, bool non_const,
-   unsigned int alignas_align)
+   unsigned int alignas_align,
+   struct c_declspecs *scspecs)
 {
   /* We do not use start_decl here because we have a type, not a declarator;
  and do not use finish_decl because the decl should be stored inside
@@ -6060,15 +6062,33 @@ build_compound_literal (location_t loc, tree type, tree 
init, bool non_const,
   tree decl;
   tree complit;
   tree stmt;
+  bool threadp = scspecs ? scspecs->thread_p : false;
+  enum c_storage_class storage_class = (scspecs
+   ? scspecs->storage_class
+   : csc_none);
 
   if (type == error_mark_node
   || init == error_mark_node)
 return error_mark_node;
 
+  if (current_scope == file_scope && storage_class == csc_register)
+{
+  error_at (loc, "file-scope compound literal specifies %<register%>");
+  storage_class = csc_none;
+}
+
+  if (current_scope != file_scope && threadp && storage_class == csc_none)
+{
+  error_at (loc, "compound literal implicitly auto and declared %qs",
+   scspecs->thread_gnu_p ? "__thread" : "_Thread_local");
+  threadp = false;
+}
+
   decl = build_decl (loc, VAR_DECL, NULL_TREE, type);
   DECL_EXTERNAL (decl) = 0;
   TREE_PUBLIC (decl) = 0;
-  TREE_STATIC (decl) = (current_scope == file_scope);
+  TREE_STATIC (decl) = (current_scope == file_scope
+   || storage_class == csc_static);
   DECL_CONTEXT (decl) = current_function_decl;
   TREE_USED (decl) = 1;
   DECL_READ_P (decl) = 1;
@@ -6076,6 +6096,13 @@ build_compound_literal (location_t loc, tree type, tree 
init, bool non_const,
   DECL_IGNORED_P (decl) = 1;
   C_DECL_COMPOUND_LITERAL_P (decl) = 1;
   TREE_TYPE (decl) = type;
+  if (threadp)
+set_decl_tls_model (decl, decl_default_tls_model (decl));
+  if (storage_class == csc_register)
+{
+  C_DECL_REGISTER (decl) = 1;
+  DECL_REGISTER (decl) = 1;
+}
   c_apply_type_quals_to_decl (TYPE_QUALS (strip_array_types (type)), decl);
   if (alignas_align)
 {
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 89e05870f47..602e0235f2d 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -666,6 +666,30 @@ c_parser_next_tokens_start_typename (c_parser *parser, 
enum c_lookahead_kind la)
   return false;
 }
 
+/* Return true if TOKEN, after an open parenthesis, can start a
+   compound literal (either a storage class specifier allowed in that
+   context, or a type name), false otherwise.  */
+static bool
+c_token_starts_compound_literal (c_token *toke

[PATCH v2] LoongArch: Optimize the implementation of stack check.

2022-10-13 Thread Lulu Cheng
The old stack check was performed before the stack was dropped,
which would cause the detection tool to report a memory leak.

The current stack check scheme is as follows:

'-fstack-clash-protection':
1. When frame->total_size is smaller than the guard page size,
   the stack is dropped according to the original scheme, and there
   is no need to perform stack detection in the prologue.
2. When frame->total_size is greater than or equal to the guard page size,
   the first stack drop covers only the space required by the
   caller-save registers.  Saving those registers into that space acts
   as an implicit stack check, so only the rest of the stack space
   needs to be checked.

'-fstack-check':
The stack is not dropped all at once and then probed page by page as
described in the documentation.  Instead, as with
'-fstack-clash-protection', each page is probed immediately after it is
dropped.

When frame->total_size is not 0, the first stack drop covers only the
size required to save the s registers.

The test cases are referenced from aarch64.

gcc/ChangeLog:

* config/loongarch/linux.h (STACK_CHECK_MOVING_SP):
Define this macro to 1.
* config/loongarch/loongarch.cc (loongarch_first_stack_step):
Return the size of the first stack drop according to whether stack
checking is performed.
(loongarch_emit_probe_stack_range): Adjust the method of stack checking
in prologue.
(loongarch_output_probe_stack_range): Delete useless code.
(loongarch_expand_prologue): Adjust the method of stack checking in
prologue.
(loongarch_option_override_internal): Enforce that interval is the same
size as size so the mid-end does the right thing.
* config/loongarch/loongarch.h (STACK_CLASH_MAX_UNROLL_PAGES):
New macro to decide whether to loop stack detection.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add loongarch support for 
stack_clash_protection.
* gcc.target/loongarch/stack-check-alloca-1.c: New test.
* gcc.target/loongarch/stack-check-alloca-2.c: New test.
* gcc.target/loongarch/stack-check-alloca-3.c: New test.
* gcc.target/loongarch/stack-check-alloca-4.c: New test.
* gcc.target/loongarch/stack-check-alloca-5.c: New test.
* gcc.target/loongarch/stack-check-alloca-6.c: New test.
* gcc.target/loongarch/stack-check-alloca.h: New test.
* gcc.target/loongarch/stack-check-cfa-1.c: New test.
* gcc.target/loongarch/stack-check-cfa-2.c: New test.
* gcc.target/loongarch/stack-check-prologue-1.c: New test.
* gcc.target/loongarch/stack-check-prologue-2.c: New test.
* gcc.target/loongarch/stack-check-prologue-3.c: New test.
* gcc.target/loongarch/stack-check-prologue-4.c: New test.
* gcc.target/loongarch/stack-check-prologue-5.c: New test.
* gcc.target/loongarch/stack-check-prologue-6.c: New test.
* gcc.target/loongarch/stack-check-prologue-7.c: New test.
* gcc.target/loongarch/stack-check-prologue.h: New test.
---
 gcc/config/loongarch/linux.h  |   3 +
 gcc/config/loongarch/loongarch.cc | 249 +++---
 gcc/config/loongarch/loongarch.h  |   4 +
 .../loongarch/stack-check-alloca-1.c  |  15 ++
 .../loongarch/stack-check-alloca-2.c  |  12 +
 .../loongarch/stack-check-alloca-3.c  |  12 +
 .../loongarch/stack-check-alloca-4.c  |  12 +
 .../loongarch/stack-check-alloca-5.c  |  13 +
 .../loongarch/stack-check-alloca-6.c  |  13 +
 .../gcc.target/loongarch/stack-check-alloca.h |  15 ++
 .../gcc.target/loongarch/stack-check-cfa-1.c  |  12 +
 .../gcc.target/loongarch/stack-check-cfa-2.c  |  12 +
 .../loongarch/stack-check-prologue-1.c|  11 +
 .../loongarch/stack-check-prologue-2.c|  11 +
 .../loongarch/stack-check-prologue-3.c|  11 +
 .../loongarch/stack-check-prologue-4.c|  11 +
 .../loongarch/stack-check-prologue-5.c|  12 +
 .../loongarch/stack-check-prologue-6.c|  11 +
 .../loongarch/stack-check-prologue-7.c|  12 +
 .../loongarch/stack-check-prologue.h  |   5 +
 gcc/testsuite/lib/target-supports.exp |   7 +-
 21 files changed, 362 insertions(+), 101 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/stack-check-alloca-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/stack-check-alloca-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/stack-check-alloca-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/stack-check-alloca-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/stack-check-alloca-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/stack-check-alloca-6.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/stack-check-alloca.h
 create mode 100644 gcc/testsuite/gcc.target/loongarch/stack-check-cfa-1.c
 create mode 1006

[PATCH] rs6000: Enable const_anchor for 'addi'

2022-10-13 Thread Jiufu Guo via Gcc-patches
Hi,

cse.cc has a feature called const_anchor.  It can generate a new
constant by adding a small gap/offset to an existing constant.
For example:

void __attribute__ ((noinline)) foo (long long *a)
{
  *a++ = 0x2351847027482577LL;
  *a++ = 0x2351847027482578LL;
}
The second constant (0x2351847027482578LL) can be computed by adding '1'
to the first constant (0x2351847027482577LL).
This is profitable if more than one instruction is needed to build the
second constant.
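
The reachability condition can be sketched in a few lines of C.  This is only
an illustration of the arithmetic, assuming the signed 16-bit immediate of
'addi'; the function name is hypothetical and this is not how cse.cc
implements its anchor lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if WANTED can be formed from KNOWN with one add-immediate,
   i.e. the difference fits a signed 16-bit immediate (-0x8000 .. 0x7fff). */
bool reachable_by_addi(int64_t known, int64_t wanted)
{
    /* Subtract in unsigned arithmetic so the difference wraps instead of
       overflowing; converting back to int64_t is implementation-defined
       in ISO C but wraps as expected on GCC. */
    int64_t delta = (int64_t)((uint64_t)wanted - (uint64_t)known);
    return delta >= -0x8000 && delta <= 0x7fff;
}
```

For the two constants in foo above the difference is 1, so the second
constant is reachable from the first.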

* For rs6000, we can enable this functionality, as the 'addi'
instruction does exactly this when the gap is smaller than 0x8000.

* Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixes
one issue:
"gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement of the function
"try_const_anchors", but e.g. {[%1:DI]=0;}, which is in BLK mode, does
not need a const_anchor check.  "SCALAR_INT_MODE_P (mode)" is already
checked when invoking insert_const_anchors.
So, this patch also adds this check before calling try_const_anchors.

* One potential side effect of this patch:
Compared with
"r101=0x2351847027482577LL
...
r201=0x2351847027482578LL"
the new r201 will be "r201=r101+1", so r101 will live longer, which
would increase pressure when allocating registers.
But I feel this would be acceptable for this const_anchor feature.

* With this patch, I checked the performance change on SPEC2017.
The performance change is not significant, since this functionality is
not hit on any hot path.  There are runtime fluctuations/noise (e.g. on
povray_r/xalancbmk_r/xz_r) that are not caused by the patch.

With this patch, I also checked the changes in object files (from
GCC bootstrap and SPEC).  The significant changes are improvements:
"addi" vs. "2 or more insns: lis+or..".  It also exposes some
other optimization opportunities, like combine/jump2.  Code to
store/load one more register also occurs in a few cases,
but it does not impact overall performance.

* To refine this patch, some earlier discussions were consulted:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html


Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
Is this ok for trunk?


BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
* cse.cc (cse_insn): Add guard condition.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const_anchors.c: New test.
* gcc.target/powerpc/try_const_anchors_ice.c: New test.

---
 gcc/config/rs6000/rs6000.cc   |  4 
 gcc/cse.cc|  3 ++-
 .../gcc.target/powerpc/const_anchors.c| 20 +++
 .../powerpc/try_const_anchors_ice.c   | 16 +++
 4 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d2743f7bce6..80cded6dec1 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1760,6 +1760,10 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 
 #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
 #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
+
+#undef TARGET_CONST_ANCHOR
+#define TARGET_CONST_ANCHOR 0x8000
+
 
 
 /* Processor table.  */
diff --git a/gcc/cse.cc b/gcc/cse.cc
index b13afd4ba72..56542b91c1e 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -5005,7 +5005,8 @@ cse_insn (rtx_insn *insn)
   if (targetm.const_anchor
  && !src_related
  && src_const
- && GET_CODE (src_const) == CONST_INT)
+ && GET_CODE (src_const) == CONST_INT
+ && SCALAR_INT_MODE_P (mode))
{
  src_related = try_const_anchors (src_const, mode);
  src_related_is_const_anchor = src_related != NULL_RTX;
diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
new file mode 100644
index 000..39958ff9765
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target has_arch_ppc64 } } */
+/* { dg-options "-O2" } */
+
+#define C1 0x2351847027482577ULL
+#define C2 0x2351847027482578ULL
+
+void __attribute__ ((noinline)) foo (long long *a)
+{
+  *a++ = C1;
+  *a++ = C2;
+}
+
+void __attribute__ ((noinline)) foo1 (long long *a, long long b)
+{
+  *a++ = C1;
+  if (b)
+*a++ = C2;
+}
+
+/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c 
b/gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
new file mode 100644
index 000..4c8a892e803
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/try_const_anc
