RE: [RFC] PR81358: Enable automatic linking of libatomic

2025-01-02 Thread Prathamesh Kulkarni


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 20 December 2024 21:08
> To: Prathamesh Kulkarni ; Tobias Burnus
> ; Joseph Myers 
> Cc: Xi Ruoyao ; Matthew Malcomson
> ; gcc-patches@gcc.gnu.org
> Subject: RE: [RFC] PR81358: Enable automatic linking of libatomic
> 
> 
> 
> > -Original Message-
> > From: Prathamesh Kulkarni 
> > Sent: 18 December 2024 21:09
> > To: Tobias Burnus ; Joseph Myers
> > 
> > Cc: Xi Ruoyao ; Matthew Malcomson
> > ; gcc-patches@gcc.gnu.org
> > Subject: RE: [RFC] PR81358: Enable automatic linking of libatomic
> >
> > External email: Use caution opening links or attachments
> >
> >
> > > -Original Message-
> > > From: Tobias Burnus 
> > > Sent: 18 December 2024 17:46
> > > To: Prathamesh Kulkarni ; Joseph Myers
> > > 
> > > Cc: Xi Ruoyao ; Matthew Malcomson
> > > ; gcc-patches@gcc.gnu.org
> > > Subject: Re: [RFC] PR81358: Enable automatic linking of libatomic
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > An x86_64-gnu-linux bootstrap which also builds -m32 now fails.
> > >
> > > While I have local patches, they should not affect this; hence, I
> > > fear that it has been caused by this patch.
> > >
> > > Namely, if I do a bootstrap into an empty directory with:
> > >
> > >$ /configure --prefix=...
> > > --enable-languages=c,c++,fortran,lto,objc
> > > --enable-offload-targets=nvptx-none,amdgcn-amdhsa
> > >
> > > I get during stage1 bootstrap:
> > >
> > > checking whether -lc should be explicitly linked in... no
> > > checking dynamic linker characteristics... configure: error: Link tests are not allowed after GCC_NO_EXECUTABLES.
> > > make[2]: *** [Makefile:17281: configure-stage1-target-libstdc++-v3] Error 1
> > >
> > > And while ./x86_64-pc-linux-gnu/libstdc++-v3/config.log is okay,
> > > ./x86_64-pc-linux-gnu/32/libstdc++-v3/config.log fails with:
> > >
> > > configure:10962: $? = 0
> > > configure:10976: result: no
> > > configure:11141: checking dynamic linker characteristics
> > > configure:11577: error: Link tests are not allowed after
> > > GCC_NO_EXECUTABLES.
> > >
> > > Can you check?
> > Hi Tobias,
> > Sorry for the breakage. IIUC, the issue seems to be caused by the
> > all: rule added to libatomic/Makefile.am in the patch, which
> > incorrectly copies libatomic.a into $(gcc_objdir).
> > I have a local patch that instead copies it over to
> > $(gcc_objdir)$(MULTISUBDIR)/, which seems to fix the issue with the
> > stage-1 multilib build (with -m32) on x86_64-linux-gnu.
> > I will post it after running it through bootstrap+test.
> Hi,
> The previous patch (now reverted) had two different issues both
> stemming from the rule added in libatomic/Makefile.am:
> (1) As mentioned above, it broke multilib builds because it
> incorrectly copied libatomic.a into $(gcc_objdir). The attached patch
> fixes this by instead copying libatomic.a over to
> $(gcc_objdir)$(MULTISUBDIR)/, and I can confirm that the 64-bit
> libatomic.a is copied to $build/gcc/ and the 32-bit libatomic.a to
> $build/gcc/32/.
> 
> (2) libatomic_convenience.la was not getting generated for some
> reason, which resulted in a build failure while building libdruntime.
> The patch adds libatomic_convenience.la as a dependency, and I can see
> it now getting generated, which seems to fix the build issue with
> libdruntime.
> 
> The patch passes bootstrap+test with multilib enabled for
> --enable-languages=all on x86_64-linux-gnu, and for
> --enable-languages=c,c++,fortran on aarch64-linux-gnu.
> Does this version look OK?
Hi,
ping: https://gcc.gnu.org/pipermail/gcc-patches/2024-December/672119.html

Thanks,
Prathamesh
> 
> Signed-off-by: Prathamesh Kulkarni 
> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Tobias


Re: [PATCH] c: special-case some "bool" errors with C23 (v2) [PR117629]

2025-01-02 Thread Sam James
David Malcolm  writes:

> On Thu, 2025-01-02 at 18:33 +, Joseph Myers wrote:
>> On Thu, 19 Dec 2024, David Malcolm wrote:
>> 
>> > Here's an updated version of the patch.
>> > 
>> > Changed in v2:
>> > - distinguish between "bool" and "_Bool" when determining
>> >   standard version
>> > - more test coverage
>> > 
>> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>> > OK for trunk?
>> 
>> OK. 
>
> Thanks; pushed as r15-6507-g321983033d621e.
>
>>  (I'm guessing the other new keywords that weren't previously
>> reserved 
>> (alignas, alignof, constexpr, nullptr, static_assert, thread_local, 
>> typeof, typeof_unqual) are sufficiently rare as identifiers that they
>> aren't worth trying to produce better diagnostics for.)
>
> That was my thinking too.
>
> Sam, did you see anything significant here in your testing?

Honestly, they've barely come up. Of the ones which *have*, it's
static_assert and nullptr; the others have come up either never or just once.

nullptr: https://bugs.gentoo.org/944747
static_assert: binutils, git iirc

It's fine with me if we wait and see. I'll keep an eye out for other
classes of C23 issues as well.

Thanks again for these

>
> Thanks
> Dave

cheers,
sam


[PATCH] RISC-V: Add missing dg-runtest to run the testcase under gcc.target/riscv/rvv/

2025-01-02 Thread Tsung Chun Lin



0001-RISC-V-Add-missing-dg-runtest-to-run-the-testcase-un.patch
Description: Binary data


Re: [WIP 3/8] algol68: front-end misc files

2025-01-02 Thread David Malcolm
On Wed, 2025-01-01 at 03:09 +0100, Jose E. Marchesi wrote:
> ---
>  gcc/algol68/Make-lang.in |  239 +
>  gcc/algol68/README   |  102 ++
>  gcc/algol68/a68-diagnostics.cc   |  450 +
>  gcc/algol68/a68-lang.cc  |  549 ++
>  gcc/algol68/a68-moids-diagnostics.cc |  271 +
>  gcc/algol68/a68-moids-misc.cc    | 1404 ++
>  gcc/algol68/a68-moids-size.cc    |  339 +++
>  gcc/algol68/a68-moids-to-string.cc   |  375 +++
>  gcc/algol68/a68-postulates.cc    |  105 ++
>  gcc/algol68/a68-tree.def |   26 +
>  gcc/algol68/a68-types.h  |  980 ++
>  gcc/algol68/a68.h    |  650 
>  gcc/algol68/a68spec.cc   |  212 
>  gcc/algol68/algol68-target.def   |   52 +
>  gcc/algol68/config-lang.in   |   31 +
>  gcc/algol68/gac-internals.texi   |  351 +++
>  gcc/algol68/gac.texi |  292 ++
>  gcc/algol68/lang-specs.h |   26 +
>  gcc/algol68/lang.opt |   93 ++
>  gcc/algol68/lang.opt.urls    |   32 +
>  20 files changed, 6579 insertions(+)
>  create mode 100644 gcc/algol68/Make-lang.in
>  create mode 100644 gcc/algol68/README
>  create mode 100644 gcc/algol68/a68-diagnostics.cc
>  create mode 100644 gcc/algol68/a68-lang.cc
>  create mode 100644 gcc/algol68/a68-moids-diagnostics.cc
>  create mode 100644 gcc/algol68/a68-moids-misc.cc
>  create mode 100644 gcc/algol68/a68-moids-size.cc
>  create mode 100644 gcc/algol68/a68-moids-to-string.cc
>  create mode 100644 gcc/algol68/a68-postulates.cc
>  create mode 100644 gcc/algol68/a68-tree.def
>  create mode 100644 gcc/algol68/a68-types.h
>  create mode 100644 gcc/algol68/a68.h
>  create mode 100644 gcc/algol68/a68spec.cc
>  create mode 100644 gcc/algol68/algol68-target.def
>  create mode 100644 gcc/algol68/config-lang.in
>  create mode 100644 gcc/algol68/gac-internals.texi
>  create mode 100644 gcc/algol68/gac.texi
>  create mode 100644 gcc/algol68/lang-specs.h
>  create mode 100644 gcc/algol68/lang.opt
>  create mode 100644 gcc/algol68/lang.opt.urls
> 
> diff --git a/gcc/algol68/Make-lang.in b/gcc/algol68/Make-lang.in
> new file mode 100644
> index 000..294d39dd205
> --- /dev/null
> +++ b/gcc/algol68/Make-lang.in
> @@ -0,0 +1,239 @@
> +# Make-lang.in -- Top level -*- makefile -*- fragment for GCC ALGOL 68
> +# frontend.
> +
> +# Copyright (C) 2025 Free Software Foundation, Inc.
> +
> +# This file is NOT part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +
> +# GCC is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.   See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# .
> +
> +# This file provides the language dependent support in the main Makefile.

The boilerplate in this file, and many others in the patch kit, has the
line "This file is NOT part of GCC."

Sorry if I'm missing something obvious here, but it certainly looks
like part of GCC to me, or, at least, it would be if the patch were
merged into our repository.

What is the intent of these lines?  Is there some kind of GCC vs
not-GCC separation intended here, and is there a high-level description
of where the line is drawn?

[...snip...]

Thanks
Dave




RE: [PATCH 2/2][libstdc++]: Adjust probabilities of hashmap loop conditions

2025-01-02 Thread Tamar Christina
Hi,

> It means that we consider the hasher is not perfect, i.e. we have several
> entries in the same bucket. Shouldn't we reward those who are spending time
> on their hasher to make it as perfect as possible?

I don’t think it makes much of a difference for a perfect hashtable as you’re 
exiting on the first iteration anyway.  So taking the branch on iteration 0 
shouldn’t be an issue.

> Said differently, does using 1 in the __builtin_expect change the produced
> figures a lot?

I expect it to be slower since the entire loop is no longer in a single fetch 
block. But I’ll run the numbers with this change.

Thanks,
Tamar

From: François Dumont 
Sent: Monday, December 30, 2024 5:08 PM
To: Jonathan Wakely 
Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd 
; libstd...@gcc.gnu.org
Subject: Re: [PATCH 2/2][libstdc++]: Adjust probabilities of hashmap loop 
conditions

Sorry to react so late on this patch.

I'm only surprised by the expected result of the added __builtin_expect,
which is 0.

It means that we consider the hasher is not perfect, i.e. we have several
entries in the same bucket. Shouldn't we reward those who are spending time
on their hasher to make it as perfect as possible?

Said differently, does using 1 in the __builtin_expect change the produced
figures a lot?

François

On Wed, Dec 18, 2024 at 5:01 PM Jonathan Wakely wrote:
On Wed, 18 Dec 2024 at 14:14, Tamar Christina wrote:
>
> > e791e52ec329277474f3218d8a44cd37ded14ac3..8101d868d0c5f7ac4f97931a
> > > ffcf71d826c88094 100644
> > > > --- a/libstdc++-v3/include/bits/hashtable.h
> > > > +++ b/libstdc++-v3/include/bits/hashtable.h
> > > > @@ -2171,7 +2171,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >   if (this->_M_equals(__k, __code, *__p))
> > > > return __prev_p;
> > > >
> > > > - if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > + if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> > > > break;
> > > >   __prev_p = __p;
> > > > }
> > > > @@ -2201,7 +2201,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > > if (this->_M_equals_tr(__k, __code, *__p))
> > > >   return __prev_p;
> > > >
> > > > -   if (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt)
> > > > +   if (__builtin_expect (!__p->_M_nxt || _M_bucket_index(*__p->_M_next()) != __bkt, 0))
> > > >   break;
> > > > __prev_p = __p;
> > > >   }
> > > > @@ -2228,7 +2228,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > >pointer_to(const_cast<__node_base&>(_M_before_begin));
> > > >   while (__loc._M_before->_M_nxt)
> > > > {
> > > > - if (this->_M_key_equals(__k, *__loc._M_node()))
> > > > + if (__builtin_expect (this->_M_key_equals(__k, *__loc._M_node()), 1))
> > > > return __loc;
> > > >   __loc._M_before = __loc._M_before->_M_nxt;
> > > > }
> > >
> > > The first two changes make sense to me: we're typically going to loop
> > > more than once when searching for an element, so the break is
> > > unlikely. The third change is a bit more surprising, because although
> > > it's likely we'll find an element *eventually* it is only likely once
> > > out of N iterations, not likely on every iteration. But if the numbers
> > > show it helps, then that just shows my intuition is probably wrong.
> > >
> > > If this is optimized for 'find' is it possible that this change
> > > pessimizes insertion (which also has to use _M_locate to find where to
> > > insert)?
> >
> > You're right, there seems to be a uniform slowdown for insertions in
> > small-sized hashtables of ~10 entries.  They're about 10-14% slower
> > with that change.
> >
> > As expected just the ones on _M_find_before_node does not cause any issues.
> >
> > Since the insertions into small hashtables don't run long enough, I've
> > also increased the number of iterations to ~1 million to even out the
> > score a bit more, but there are still 5 sizable regressions.
> >
> > I've kicked off a longer run removing the change on _M_locate and with some
> > more
> > variable size finds/inserts.
> >
> > So far it looks like the additional benefits gained on _M_locate were
> > mainly on two tests.  I'll look at the perf traces to figure out exactly
> > why, but I'll respin the patch without the _M_locate change and send it
> > tomorrow morning once the benchmarks finish.
> >
>
> Hi,
>
> Here's a new patch and new numbers showing the improvements over the previous
> one and the total improvement with the two together.  This change seems to be
> mostly beneficial on larger-sized hashmaps, with otherwise no real losses on
> Insert or Find (around ~1%, which looks like noise).
>
> which results in ~0-10% extra on top of the previous patch.
>
> In ta

[PATCH 2/2] b4-config: Add useful options

2025-01-02 Thread Jiaxun Yang
Add patchwork configuration, use check_GNU_style.py and git_email.py
to perform prepare and pre-apply checks, and disable the auto-to-cc
preflight check, as we don't have an auto-to-cc script.

This streamlines the workflow with b4, so people can use
`b4 prep --check` to check patches before sending, or when applying
patches from the list.

ChangeLog:

* .b4-config: Add pw-url, pw-project, prep-perpatch-check-cmd,
prep-pre-flight-checks and am-perpatch-check-cmd.
---
 .b4-config | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/.b4-config b/.b4-config
index 
eec2261deb6a7bec900af435937ae1d5d443c674..80e41eb1824aa62790041844113de04481e0e836
 100644
--- a/.b4-config
+++ b/.b4-config
@@ -1,4 +1,11 @@
 [b4]
 midmask = https://inbox.sourceware.org/%s
 linkmask = https://inbox.sourceware.org/%s
+pw-url = https://patchwork.sourceware.org/
+pw-project = gcc
 send-series-to = gcc-patches@gcc.gnu.org
+prep-pre-flight-checks = disable-needs-auto-to-cc
+prep-perpatch-check-cmd = ./contrib/check_GNU_style.py -
+prep-perpatch-check-cmd = ./contrib/gcc-changelog/git_email.py -q -v -
+am-perpatch-check-cmd = ./contrib/check_GNU_style.py -
+am-perpatch-check-cmd = ./contrib/gcc-changelog/git_email.py -q -v -

-- 
2.43.0



[PATCH 0/2] Improve b4 workflow

2025-01-02 Thread Jiaxun Yang
Hi all,

This series improves the b4 workflow by wiring up the code style
and changelog checking scripts in b4's automation.

Please help review and apply it.

Thanks!

Signed-off-by: Jiaxun Yang 
---
Jiaxun Yang (2):
  contrib/gcc-changelog/git_email.py: Rework the script
  b4-config: Add useful options

 .b4-config |   7 +++
 contrib/gcc-changelog/git_email.py | 125 +
 2 files changed, 93 insertions(+), 39 deletions(-)
---
base-commit: 81d4707a00a2d74a9caf2d806e5b0ebe13e1247c
change-id: 20241231-b4-workflow-2ca134652c07

Best regards,
-- 
Jiaxun Yang 



[PATCH 1/2] contrib/gcc-changelog/git_email.py: Rework the script

2025-01-02 Thread Jiaxun Yang
Rework the script to align its parameters and output with
git_check_commit.py, and to cooperate better with b4.

All changes to usage are backward compatible.

contrib/ChangeLog:

* gcc-changelog/git_email.py (main): Convert to use
argparse; accept a file from stdin; add --verbose
and --print-changelog options aligned with git_check_commit.py;
add --quiet option to assist b4; massage output messages.
---
 contrib/gcc-changelog/git_email.py | 125 +
 1 file changed, 86 insertions(+), 39 deletions(-)

diff --git a/contrib/gcc-changelog/git_email.py 
b/contrib/gcc-changelog/git_email.py
index 
9d8e44e429def97ea7079faf432d581189f5aa06..f3ead51f0127321c0fce94b166f96658338a8f99
 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -21,6 +21,8 @@
 import os
 import re
 import sys
+import argparse
+import tempfile
 from itertools import takewhile
 
 from dateutil.parser import parse
@@ -92,53 +94,98 @@ class GitEmail(GitCommit):
 super().__init__(git_info, commit_to_info_hook=None)
 
 
-def show_help():
-print("""usage: git_email.py [--help] [patch file ...]
-
-Check git ChangeLog format of a patch
-
-With zero arguments, process every patch file in the
-./patches directory.
-With one argument, process the named patch file.
-
-Patch files must be in 'git format-patch' format.""")
-sys.exit(0)
-
-
 if __name__ == '__main__':
-if len(sys.argv) == 2 and (sys.argv[1] == '-h' or sys.argv[1] == '--help'):
-show_help()
-
-if len(sys.argv) == 1:
+parser = argparse.ArgumentParser(
+description=('Check git ChangeLog format of a patch.\n'
+ 'Patch files must be in \'git format-patch\' format.'),
+formatter_class=argparse.RawTextHelpFormatter
+)
+parser.add_argument(
+'files',
+nargs='*',
+help=('Patch files to process.\n'
+  'Use "-" to read from stdin.\n'
+  'If none provided, processes all files in ./patches directory')
+)
+parser.add_argument('-p', '--print-changelog', action='store_true',
+help='Print final changelog entries')
+parser.add_argument('-q', '--quiet', action='store_true',
+help='Don\'t print "OK" and summary messages')
+parser.add_argument('-v', '--verbose', action='store_true',
+help='Print verbose information')
+args = parser.parse_args()
+
+batch_mode = False
+tmp = None
+
+if not args.files:
+# Process all files in patches directory
 allfiles = []
 for root, _dirs, files in os.walk('patches'):
 for f in files:
 full = os.path.join(root, f)
 allfiles.append(full)
 
-success = 0
-for full in sorted(allfiles):
-email = GitEmail(full)
-print(email.filename)
-if email.success:
-success += 1
-print('  OK')
-for warning in email.warnings:
-print('  WARN: %s' % warning)
-else:
-for error in email.errors:
-print('  ERR: %s' % error)
-
-print()
-print('Successfully parsed: %d/%d' % (success, len(allfiles)))
+files_to_process = sorted(allfiles)
+batch_mode = True
 else:
-email = GitEmail(sys.argv[1])
+# Handle filelist or stdin
+if args.files[0] == '-':
+tmp = tempfile.NamedTemporaryFile(mode='w+', delete=False)
+tmp.write(sys.stdin.read())
+tmp.flush()
+tmp.close()
+files_to_process = [tmp.name]
+else:
+files_to_process = args.files
+
+success = 0
+fail = 0
+total = len(files_to_process)
+batch_mode = batch_mode or total > 1
+
+if total == 0:
+print('No files to process', file=sys.stderr)
+parser.print_help()
+sys.exit(1)
+
+for full in files_to_process:
+email = GitEmail(full)
+
+res = 'OK' if email.success else 'FAILED'
+have_message = not email.success or (email.warnings and args.verbose)
+if not args.quiet or have_message:
+filename = '-' if tmp else email.filename
+print('Checking %s: %s' % (filename, res))
+
 if email.success:
-print('OK')
-email.print_output()
-email.print_warnings()
+success += 1
+if args.verbose:
+for warning in email.warnings:
+print('WARN: %s' % warning)
+if args.print_changelog:
+email.print_output()
 else:
+fail += 1
 if not email.info.lines:
-print('Error: patch contains no parsed lines', file=sys.stderr)
-email.print_errors()
-sys.exit(1)
+print('ERR: patch contains no p

[PATCH 2/2] RISC-V:Add testcases for signed .SAT_ADD IMM form 1 with IMM = -1.

2025-01-02 Thread Li Xu
From: xuli 

This patch adds testcases for form 1, as shown below:

T __attribute__((noinline))  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{\
  T sum = (UT)x + (UT)IMM; \
  return (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}

Passed the rv64gcv regression test.

Signed-off-by: Li Xu 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_s_add_imm-2.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-3.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-4.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i64.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-1-i8.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-2.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-3.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-4.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i64.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-run-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm-run-1-i8.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-2-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i16.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-3-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i32.c: ...here.
* gcc.target/riscv/sat/sat_s_add_imm-1-1.c: Move to...
* gcc.target/riscv/sat/sat_s_add_imm_type_check-1-i8.c: ...here.
---
 ...at_s_add_imm-2.c => sat_s_add_imm-1-i16.c} | 27 ++-
 ...at_s_add_imm-3.c => sat_s_add_imm-1-i32.c} | 26 +-
 ...at_s_add_imm-4.c => sat_s_add_imm-1-i64.c} | 22 ++-
 ...sat_s_add_imm-1.c => sat_s_add_imm-1-i8.c} | 22 ++-
 ..._imm-run-2.c => sat_s_add_imm-run-1-i16.c} |  6 +
 ..._imm-run-3.c => sat_s_add_imm-run-1-i32.c} |  6 +
 ..._imm-run-4.c => sat_s_add_imm-run-1-i64.c} |  6 +
 ...d_imm-run-1.c => sat_s_add_imm-run-1-i8.c} |  6 +
 ...2-1.c => sat_s_add_imm_type_check-1-i16.c} |  0
 ...3-1.c => sat_s_add_imm_type_check-1-i32.c} |  0
 ...-1-1.c => sat_s_add_imm_type_check-1-i8.c} |  0
 11 files changed, 117 insertions(+), 4 deletions(-)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-2.c => 
sat_s_add_imm-1-i16.c} (53%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-3.c => 
sat_s_add_imm-1-i32.c} (53%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-4.c => 
sat_s_add_imm-1-i64.c} (55%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-1.c => 
sat_s_add_imm-1-i8.c} (57%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-2.c => 
sat_s_add_imm-run-1-i16.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-3.c => 
sat_s_add_imm-run-1-i32.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-4.c => 
sat_s_add_imm-run-1-i64.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-run-1.c => 
sat_s_add_imm-run-1-i8.c} (84%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-2-1.c => 
sat_s_add_imm_type_check-1-i16.c} (100%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-3-1.c => 
sat_s_add_imm_type_check-1-i32.c} (100%)
 rename gcc/testsuite/gcc.target/riscv/sat/{sat_s_add_imm-1-1.c => 
sat_s_add_imm_type_check-1-i8.c} (100%)

diff --git a/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c 
b/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
similarity index 53%
rename from gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c
rename to gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
index 3878286d207..2e23af5d86b 100644
--- a/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-2.c
+++ b/gcc/testsuite/gcc.target/riscv/sat/sat_s_add_imm-1-i16.c
@@ -29,4 +29,29 @@
 */
 DEF_SAT_S_ADD_IMM_FMT_1(0, int16_t, uint16_t, -7, INT16_MIN, INT16_MAX)
 
-/* { dg-final { scan-tree-dump-times ".SAT_ADD " 1 "optimized" } } */
+/*
+** sat_s_add_imm_int16_t_fmt_1_1:
+** addi\s+[atx][0-9]+,\s*a0,\s*-1
+** not\s+[atx][0-9]+,\s*a0
+** xor\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*15
+** xori\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** and\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*1
+** srai\s+a0,\s*a0,\s*63
+** li\s+[atx][0-9]+,\s*32768
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** xor\

[PATCH 1/2] Match:Support IMM=-1 for signed scalar SAT_ADD IMM form1

2025-01-02 Thread Li Xu
From: xuli 

This patch would like to support .SAT_ADD when IMM=-1.

Form1:
T __attribute__((noinline))  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x) \
{\
  T sum = (UT)x + (UT)IMM; \
  return (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}

Take below form1 as example:
DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -1, INT8_MIN, INT8_MAX)

Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  unsigned char x.0_1;
  unsigned char _2;
  unsigned char _3;
  int8_t iftmp.1_4;
  signed char _8;
  unsigned char _9;
  signed char _10;

   [local count: 1073741824]:
  x.0_1 = (unsigned char) x_5(D);
  _3 = -x.0_1;
  _10 = (signed char) _3;
  _8 = x_5(D) & _10;
  if (_8 < 0)
goto ; [1.40%]
  else
goto ; [98.60%]

   [local count: 434070867]:
  _2 = x.0_1 + 255;

   [local count: 1073741824]:
  # _9 = PHI <_2(3), 128(2)>
  iftmp.1_4 = (int8_t) _9;
  return iftmp.1_4;

}

After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t _4;

   [local count: 1073741824]:
  gimple_call <.SAT_ADD, _4, x_5(D), 255> [tail call]
  gimple_return <_4>

}

The below test suites are passed for this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Li Xu 

gcc/ChangeLog:

* match.pd: Add signed scalar SAT_ADD IMM form1 with IMM=-1 matching.
* tree-ssa-math-opts.cc (match_unsigned_saturation_add): Adapt function 
name.
(match_saturation_add_with_assign): Match signed and unsigned SAT_ADD 
with assign.
(math_opts_dom_walker::after_dom_children): Match imm=-1 signed SAT_ADD 
with NOP_EXPR case.

---
 gcc/match.pd  | 19 ++-
 gcc/tree-ssa-math-opts.cc | 30 +-
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 98411af3940..a07dbb808d2 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3403,7 +3403,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bit_xor:c @0 INTEGER_CST@3)) integer_zerop)
 (signed_integer_sat_val @0)
 @2)
-  (if (wi::bit_and (wi::to_wide (@1), wi::to_wide (@3)) == 0
+  (if (wi::bit_and (wi::to_wide (@1), wi::to_wide (@3)) == 0)))
+
+(match (signed_integer_sat_add @0 @1)
+  /* T SUM = (T)((UT)X + (UT)-1);
+ SAT_S_ADD = (X ^ -1) < 0 ? SUM : (X ^ SUM) >= 0 ? SUM
+ : (x < 0) ? MIN : MAX  */
+  (convert (cond^ (lt (bit_and:c @0 (nop_convert (negate (nop_convert @0
+ integer_zerop)
+INTEGER_CST@2
+(plus (nop_convert @0) integer_all_onesp@1)))
+   (with
+{
+ unsigned precision = TYPE_PRECISION (type);
+ wide_int c1 = wi::to_wide (@1);
+ wide_int c2 = wi::to_wide (@2);
+ wide_int sum = wi::add (c1, c2);
+}
+(if (wi::eq_p (sum, wi::max_value (precision, SIGNED)))
 
 /* Saturation sub for signed integer.  */
 (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 292eb852f2d..f6a1bea2002 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4064,15 +4064,34 @@ build_saturation_binary_arith_call_and_insert (gsi,
  *   _10 = -_9;
  *   _12 = _7 | _10;
  *   =>
- *   _12 = .SAT_ADD (_4, _6);  */
+ *   _12 = .SAT_ADD (_4, _6);
+ *
+ * Try to match IMM=-1 saturation signed add with assign.
+ *  [local count: 1073741824]:
+ * x.0_1 = (unsigned char) x_5(D);
+ * _3 = -x.0_1;
+ * _10 = (signed char) _3;
+ * _8 = x_5(D) & _10;
+ * if (_8 < 0)
+ *   goto ; [1.40%]
+ * else
+ *   goto ; [98.60%]
+ *  [local count: 434070867]:
+ * _2 = x.0_1 + 255;
+ *  [local count: 1073741824]:
+ * # _9 = PHI <_2(3), 128(2)>
+ * _4 = (int8_t) _9;
+ *   =>
+ * _4 = .SAT_ADD (x_5, -1); */
 
 static void
-match_unsigned_saturation_add (gimple_stmt_iterator *gsi, gassign *stmt)
+match_saturation_add_with_assign (gimple_stmt_iterator *gsi, gassign *stmt)
 {
   tree ops[2];
   tree lhs = gimple_assign_lhs (stmt);
 
-  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
+  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
+  || gimple_signed_integer_sat_add (lhs, ops, NULL))
 build_saturation_binary_arith_call_and_replace (gsi, IFN_SAT_ADD, lhs,
ops[0], ops[1]);
 }
@@ -6363,7 +6382,7 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
  break;
 
case PLUS_EXPR:
- match_unsigned_saturation_add (&gsi, as_a (stmt));
+ match_saturation_add_with_assign (&gsi, as_a (stmt));
  match_unsigned_saturation_sub (&g

Re: [PATCH] RISC-V: Add testcases for unsigned imm vec SAT_SUB form2~4

2025-01-02 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2025-01-02 16:02
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli
Subject: [PATCH] RISC-V: Add testcases for unsigned imm vec SAT_SUB form2~4
From: xuli 
 
Form2:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = in[i] >= (T)IMM ? in[i] - (T)IMM : 0;  \
}
 
Form3:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = (T)IMM > in[i] ? (T)IMM - in[i] : 0;   \
}
 
Form4:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = in[i] > (T)IMM ? in[i] - (T)IMM : 0;   \
}
 
Passed the rv64gcv full regression test.
 
Signed-off-by: Li Xu 
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: add unsigned imm vec 
sat_sub form2~4.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: add data for vec sat_sub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u8.c: New test.
 
---
.../riscv/rvv/autovec/sat/vec_sat_arith.h |  54 
.../riscv/rvv/autovec/sat/vec_sat_data.h  | 256 --
.../rvv/autovec/sat/vec_sat_u_sub_imm-2-u16.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-2-u32.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-2-u64.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-2-u8.c  |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-3-u16.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-3-u32.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-3-u64.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-3-u8.c  |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-4-u16.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-4-u32.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-4-u64.c |   9 +
.../rvv/autovec/sat/vec_sat_u_sub_imm-4-u8.c  |   9 +
.../autovec/sat/vec_sat_u_sub_imm-run-2-u16.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-2-u32.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-2-u64.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-2-u8.c  |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-3-u16.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-3-u32.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-3-u64.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-3-u8.c  |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-4-u16.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-4-u32.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-4-u64.c |  28 ++
.../autovec/sat/vec_sat_u_sub_imm-run-4-u8.c  |  28 ++
26 files changed, 738 insertions(+), 16 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u16.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u32.c
create mode 100644 
gcc/testsu

Re: [PATCH] Prefer scalar_int_mode if the size - 1 is equal to UNITS_PER_WORD.

2025-01-02 Thread Tsung Chun Lin
Add CC patchworks...@rivosinc.com

Thanks,
Jim

Robin Dapp  於 2025年1月2日 週四 下午3:56寫道:
>
> > Add an extra test case that we do not create a vector store but a scalar 
> > store.
>
> LGTM.  I haven't seen the patch in the patchworks list or the CI.  Would you
> mind sending the same version again CC'ing patchworks...@rivosinc.com so we're
> sure?  Or did you do that already at some point and I missed it?
>
> --
> Regards
>  Robin
>


0001-Prefer-scalar_int_mode-if-the-size-of-QI-vector-is-e.patch
Description: Binary data


[PATCH] RISC-V: Add testcases for unsigned imm vec SAT_SUB form2~4

2025-01-02 Thread Li Xu
From: xuli 

Form2:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = in[i] >= (T)IMM ? in[i] - (T)IMM : 0;  \
}

Form3:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = (T)IMM > in[i] ? (T)IMM - in[i] : 0;   \
}

Form4:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit)  \
{   \
  unsigned i;   \
  for (i = 0; i < limit; i++)   \
out[i] = in[i] > (T)IMM ? in[i] - (T)IMM : 0;   \
}
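
The three forms above are scalar spellings of the same operation: unsigned saturating subtraction, which clamps at zero instead of wrapping. A standalone sketch of that semantics, checked against a branchless equivalent (function names here are illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for forms 2 and 4 above: x - IMM clamped at zero.
   (Form 3 is the mirrored IMM - x case.)  */
static uint8_t
sat_sub_u8 (uint8_t x, uint8_t imm)
{
  return x >= imm ? x - imm : 0;
}

/* Branchless equivalent of the .SAT_SUB semantics: let the subtraction
   wrap modulo 256, then mask the result to zero when it underflowed.  */
static uint8_t
sat_sub_u8_branchless (uint8_t x, uint8_t imm)
{
  uint8_t d = x - imm;               /* wraps when x < imm */
  return d & (uint8_t) -(x >= imm);  /* all-ones mask iff no underflow */
}
```

Forms 2 and 4 differ only in using `>=` versus `>`; at `x == IMM` both branches yield zero, so the results coincide.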

Passed the rv64gcv full regression test.

Signed-off-by: Li Xu 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add unsigned imm 
vec sat_sub form2~4.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: Add data for vec 
sat_sub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u16.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u32.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u64.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u8.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u16.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u32.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u64.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u8.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u16.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u32.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u64.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u8.c: New 
test.

---
 .../riscv/rvv/autovec/sat/vec_sat_arith.h |  54 
 .../riscv/rvv/autovec/sat/vec_sat_data.h  | 256 --
 .../rvv/autovec/sat/vec_sat_u_sub_imm-2-u16.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-2-u32.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-2-u64.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-2-u8.c  |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-3-u16.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-3-u32.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-3-u64.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-3-u8.c  |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-4-u16.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-4-u32.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-4-u64.c |   9 +
 .../rvv/autovec/sat/vec_sat_u_sub_imm-4-u8.c  |   9 +
 .../autovec/sat/vec_sat_u_sub_imm-run-2-u16.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-2-u32.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-2-u64.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-2-u8.c  |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-3-u16.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-3-u32.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-3-u64.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-3-u8.c  |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-4-u16.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-4-u32.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-4-u64.c |  28 ++
 .../autovec/sat/vec_sat_u_sub_imm-run-4-u8.c  |  28 ++
 26 files changed, 738 insertions(+), 16 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u32.

[PATCH 2/2] RISC-V: Add testcases for signed vector SAT_ADD IMM form 1

2025-01-02 Thread Li Xu
From: xuli 

This patch adds testcases for form 1, as shown below:

void __attribute__((noinline))   \
vec_sat_s_add_imm_##T##_fmt_1##_##INDEX (T *out, T *op_1, unsigned limit) \
{\
  unsigned i;\
  for (i = 0; i < limit; i++)\
{\
  T x = op_1[i]; \
  T sum = (UT)x + (UT)IMM;   \
  out[i] = (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}\
}
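
The conditional chain above is a branch-free way of writing signed saturating addition. A scalar model of it, checked against a plain widen-and-clamp reference (names are illustrative; the conversion back to int8_t relies on GCC's two's-complement wrap-around behavior):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the macro body above: add with unsigned wrap-around,
   then repair the result if the signed addition overflowed.  */
static int8_t
sat_add_i8_branchless (int8_t x, int8_t imm)
{
  int8_t sum = (int8_t) ((uint8_t) x + (uint8_t) imm); /* wraps mod 256 */
  return (x ^ imm) < 0 ? sum            /* opposite signs: cannot overflow */
       : (sum ^ x) >= 0 ? sum           /* sign preserved: no overflow */
       : x < 0 ? INT8_MIN : INT8_MAX;   /* overflow: clamp toward x's sign */
}

/* Straightforward reference: widen, add, clamp.  */
static int8_t
sat_add_i8_reference (int8_t x, int8_t imm)
{
  int s = (int) x + (int) imm;
  return (int8_t) (s < INT8_MIN ? INT8_MIN : s > INT8_MAX ? INT8_MAX : s);
}
```

An exhaustive comparison over all int8_t inputs confirms the two agree, which is what lets the middle end rewrite the chain as a single .SAT_ADD.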

Passed the rv64gcv regression test.

Signed-off-by: Li Xu 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add signed vec 
SAT_ADD IMM form1.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: Add sat_s_add_imm 
data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i16.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i32.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i64.c: New 
test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i8.c: New 
test.
* 
gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i16.c: New test.
* 
gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i8.c: 
New test.

---
 .../riscv/rvv/autovec/sat/vec_sat_arith.h |  25 ++
 .../riscv/rvv/autovec/sat/vec_sat_data.h  | 240 ++
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c |  10 +
 .../rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c  |  10 +
 .../autovec/sat/vec_sat_s_add_imm-run-1-i16.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i32.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i64.c |  28 ++
 .../autovec/sat/vec_sat_s_add_imm-run-1-i8.c  |  28 ++
 .../sat/vec_sat_s_add_imm_type_check-1-i16.c  |   9 +
 .../sat/vec_sat_s_add_imm_type_check-1-i32.c  |   9 +
 .../sat/vec_sat_s_add_imm_type_check-1-i8.c   |  10 +
 13 files changed, 445 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm-run-1-i8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_s_add_imm_type_check-1-i8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h
index 7db892cc2e9..ffdccd79b7a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h
@@ -314,6 +314,31 @@ vec_sat_s_add_##T##_fmt_4 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_S_ADD_FMT_4_WRAP(T, out, op_1, op_2, N) \
   RUN_VEC_SAT_S_ADD_FMT_4(T, out, op_1, op_2, N)
 
+#define DEF_VEC_SAT_S_ADD_IMM_FMT_1(INDEX, T, UT, IMM, MIN, MAX) \
+void __attribute__((noinline))   \
+vec_sat_s_add_imm_##T##_fmt_1##_##INDEX (T *out, T *op_1, unsigned limit) \
+{\
+  unsigned i; 

[PATCH 1/2] Match:Support signed vector SAT_ADD IMM form 1

2025-01-02 Thread Li Xu
From: xuli 

This patch would like to support vector SAT_ADD when one of the
operands is a signed immediate.

void __attribute__((noinline))   \
vec_sat_s_add_imm_##T##_fmt_1##_##INDEX (T *out, T *op_1, unsigned limit) \
{\
  unsigned i;\
  for (i = 0; i < limit; i++)\
{\
  T x = op_1[i]; \
  T sum = (UT)x + (UT)IMM;   \
  out[i] = (x ^ IMM) < 0 \
? sum\
: (sum ^ x) >= 0 \
  ? sum  \
  : x < 0 ? MIN : MAX;   \
}\
}

Take below form1 as example:
DEF_VEC_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, 9, INT8_MIN, INT8_MAX)

Before this patch:
__attribute__((noinline))
void vec_sat_s_add_imm_int8_t_fmt_1_0 (int8_t * restrict out, int8_t * restrict 
op_1, unsigned int limit)
{
  vector([16,16]) signed char * vectp_out.28;
  vector([16,16]) signed char vect_iftmp.27;
  vector([16,16])  mask__28.26;
  vector([16,16])  mask__29.25;
  vector([16,16])  mask__19.19;
  vector([16,16])  mask__31.18;
  vector([16,16]) signed char vect__6.17;
  vector([16,16]) signed char vect__5.16;
  vector([16,16]) signed char vect_sum_15.15;
  vector([16,16]) unsigned char vect__4.14;
  vector([16,16]) unsigned char vect_x.13;
  vector([16,16]) signed char vect_x_14.12;
  vector([16,16]) signed char * vectp_op_1.10;
  vector([16,16])  _78;
  vector([16,16]) unsigned char _79;
  vector([16,16]) unsigned char _80;
  unsigned long _92;
  unsigned long ivtmp_93;
  unsigned long ivtmp_94;
  unsigned long _95;

   [local count: 118111598]:
  if (limit_12(D) != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119322]:
  _92 = (unsigned long) limit_12(D);

   [local count: 955630226]:
  # vectp_op_1.10_62 = PHI 
  # vectp_out.28_89 = PHI 
  # ivtmp_93 = PHI 
  _95 = .SELECT_VL (ivtmp_93, POLY_INT_CST [16, 16]);
  vect_x_14.12_64 = .MASK_LEN_LOAD (vectp_op_1.10_62, 8B, { -1, ... }, _95, 0);
  vect_x.13_65 = VIEW_CONVERT_EXPR(vect_x_14.12_64);
  vect__4.14_67 = vect_x.13_65 + { 9, ... };
  vect_sum_15.15_68 = VIEW_CONVERT_EXPR(vect__4.14_67);
  vect__5.16_70 = vect_x_14.12_64 ^ { 9, ... };
  vect__6.17_71 = vect_x_14.12_64 ^ vect_sum_15.15_68;
  mask__31.18_73 = vect__5.16_70 >= { 0, ... };
  mask__19.19_75 = vect_x_14.12_64 < { 0, ... };
  mask__29.25_85 = vect__6.17_71 < { 0, ... };
  mask__28.26_86 = mask__31.18_73 & mask__29.25_85;
  _78 = ~mask__28.26_86;
  _79 = .VCOND_MASK (mask__19.19_75, { 128, ... }, { 127, ... });
  _80 = .COND_ADD (_78, vect_x.13_65, { 9, ... }, _79);
  vect_iftmp.27_87 = VIEW_CONVERT_EXPR(_80);
  .MASK_LEN_STORE (vectp_out.28_89, 8B, { -1, ... }, _95, 0, vect_iftmp.27_87);
  vectp_op_1.10_63 = vectp_op_1.10_62 + _95;
  vectp_out.28_90 = vectp_out.28_89 + _95;
  ivtmp_94 = ivtmp_93 - _95;
  if (ivtmp_94 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 118111600]:
  return;

}

After this patch:
__attribute__((noinline))
void vec_sat_s_add_imm_int8_t_fmt_1_0 (int8_t * restrict out, int8_t * restrict 
op_1, unsigned int limit)
{
  vector([16,16]) signed char * vectp_out.12;
  vector([16,16]) signed char vect_patt_10.11;
  vector([16,16]) signed char vect_x_14.10;
  vector([16,16]) signed char D.2852;
  vector([16,16]) signed char * vectp_op_1.8;
  vector([16,16]) signed char _73(D);
  unsigned long _80;
  unsigned long ivtmp_81;
  unsigned long ivtmp_82;
  unsigned long _83;

   [local count: 118111598]:
  if (limit_12(D) != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119322]:
  _80 = (unsigned long) limit_12(D);

   [local count: 955630226]:
  # vectp_op_1.8_71 = PHI 
  # vectp_out.12_77 = PHI 
  # ivtmp_81 = PHI 
  _83 = .SELECT_VL (ivtmp_81, POLY_INT_CST [16, 16]);
  vect_x_14.10_74 = .MASK_LEN_LOAD (vectp_op_1.8_71, 8B, { -1, ... }, _73(D), 
_83, 0);
  vect_patt_10.11_75 = .SAT_ADD (vect_x_14.10_74, { 9, ... });
  .MASK_LEN_STORE (vectp_out.12_77, 8B, { -1, ... }, _83, 0, 
vect_patt_10.11_75);
  vectp_op_1.8_72 = vectp_op_1.8_71 + _83;
  vectp_out.12_78 = vectp_out.12_77 + _83;
  ivtmp_82 = ivtmp_81 - _83;
  if (ivtmp_82 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 118111600]:
  return;

}

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

Signed-off-by: Li Xu 
gcc/ChangeLog:

* match.pd: Add signed vector SAT_ADD IMM form 1 matching.

---
 gcc/match.

[Patch] [GCN] install.texi: Refer to Newlib 4.5.0 instead of certain git commits

2025-01-02 Thread Tobias Burnus

Hi all and happy new year,

I think it makes sense to refer to Newlib 4.5.0 instead of referring
to two specific git commits.

Comments before I commit it?


Actually, due to GCC 15's -std=gnu23, 4.5.0 is recommended
in general as it features:

  - fixes for building with gcc-15

besides

  - fixes to powf
  ...

cf. https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/NEWS;hb=HEAD

See also the announcement at
https://sourceware.org/pipermail/newlib/2025/021431.html and, for manual
downloads, https://sourceware.org/ftp/newlib/index.html

Tobias
[GCN] install.texi: Refer to Newlib 4.5.0 instead of certain git commits

gcc/ChangeLog:

	* doc/install.texi (amdgcn-x-amdhsa): Refer to Newlib 4.5.0 for
	the I/O locking fixes.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index fcecfb69c0c..db40b8cede2 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4000,9 +4000,8 @@ Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
-Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commits
-7dd4eb1db and ed50a50b9 (2024-04-04, post-4.4.0) fix device console output
-for GFX10 and GFX11 devices).
+Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and 4.5.0 fixes
+the device console output for GFX10 and GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use


[PATCH] testsuite: torture: add LLVM testcase for DSE vs. -ftrivial-auto-var-init=

2025-01-02 Thread Sam James
This testcase came up in a recent LLVM bug report [0] for DSE vs
-ftrivial-auto-var-init=. Add it to our testsuite given that area
could do with better coverage.

[0] https://github.com/llvm/llvm-project/issues/119646

gcc/testsuite/ChangeLog:

* gcc.dg/torture/dse-trivial-auto-var-init.c: New test.

Co-authored-by: Andrew Pinski 
---
OK?

 .../gcc.dg/torture/dse-trivial-auto-var-init.c  | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/dse-trivial-auto-var-init.c

diff --git a/gcc/testsuite/gcc.dg/torture/dse-trivial-auto-var-init.c 
b/gcc/testsuite/gcc.dg/torture/dse-trivial-auto-var-init.c
new file mode 100644
index ..5a3d4c4e3ecb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/dse-trivial-auto-var-init.c
@@ -0,0 +1,17 @@
+/* Testcase for LLVM bug: https://github.com/llvm/llvm-project/issues/119646 */
+/* { dg-do run } */
+/* { dg-additional-options "-ftrivial-auto-var-init=zero" } */
+
+int b = 208;
+[[gnu::noinline]]
+void f(int *e, int a) {
+  *e = !!b;
+  if (a)
+__builtin_trap();
+}
+int main(void) {
+  b = 0;
+  f(&b, 0);
+  if (b != 0)
+__builtin_trap();
+}

base-commit: 99d5ef700619c28904846399a6f6692af4c56b1b
-- 
2.47.1



[PATCH] LoongArch: combine related slli operations

2025-01-02 Thread Zhou Zhao
If an SImode register is left-shifted twice in succession, combine the
related instructions into one.
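
As an illustration of the intended combination, two successive left shifts of a 32-bit value, such as `(x << 2) * 8`, reduce to a single 32-bit shift by the summed amount, computed modulo 2^32 (a sketch with hypothetical helper names; unsigned arithmetic keeps the wrap-around well defined in C):

```c
#include <assert.h>

/* Two successive shifts, as written in the testcase: (x << 2) * 8.  */
static int
shl_twice (int x)
{
  return (int) (((unsigned) x << 2) << 3);
}

/* The combined form the pattern should produce: a single slli.w by 5.  */
static int
shl_once (int x)
{
  return (int) ((unsigned) x << 5);
}
```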

gcc/ChangeLog:

* config/loongarch/loongarch.md (extsv_ashlsi3):
  New template.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/slli-1.c: New test.
---
 gcc/config/loongarch/loongarch.md   | 13 +
 gcc/testsuite/gcc.target/loongarch/slli-1.c | 10 ++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/slli-1.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 19a22a93de3..1c62e5c02a1 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3048,6 +3048,19 @@ (define_insn "si3_extend"
   [(set_attr "type" "shift")
(set_attr "mode" "SI")])
 
+(define_insn "extsv_ashlsi3"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (ashift:DI
+  (sign_extract:DI (match_operand:DI 1 "register_operand" "r")
+   (match_operand:SI 2 "const_uimm5_operand")
+   (const_int 0))
+  (match_operand:SI 3 "const_uimm5_operand")))]
+  "TARGET_64BIT
+&&(INTVAL (operands[2]) + INTVAL (operands[3])) == 0x20"
+  "slli.w\t%0,%1,%3"
+  [(set_attr "type" "shift")
+   (set_attr "mode" "SI")])
+
 (define_insn "*rotr3"
   [(set (match_operand:GPR 0 "register_operand" "=r,r")
(rotatert:GPR (match_operand:GPR 1 "register_operand" "r,r")
diff --git a/gcc/testsuite/gcc.target/loongarch/slli-1.c 
b/gcc/testsuite/gcc.target/loongarch/slli-1.c
new file mode 100644
index 000..891d6457b12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/slli-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int x)
+{
+  return (x << 2) * 8;
+}
+
+/* { dg-final { scan-assembler-times "slli\.\[dw\]" 1} } */
-- 
2.20.1



Re: [PATCH] forwprop: Handle RAW_DATA_CST in check_ctz_array

2025-01-02 Thread Richard Biener



> Am 02.01.2025 um 10:36 schrieb Jakub Jelinek :
> 
> Hi!
> 
> In order to stress test RAW_DATA_CST handling, I've tested trunk gcc with
> r15-6339 reapplied and a hack where I've changed
>  const unsigned int raw_data_min_len = 128;
> to
>  const unsigned int raw_data_min_len = 2;
> in cp_lexer_new_main and 64 to 4 several times in c_parser_initval
> and c_maybe_optimize_large_byte_initializer, so that RAW_DATA_CST doesn't
> trigger just on very large initializers, but even quite small ones.
> 
> One of the regressions (will work on the others next) was that pr90838.c
> testcase regressed, check_ctz_array needs to handle RAW_DATA_CST, otherwise
> on larger initializers or if those come from #embed just won't trigger.
> The new testcase shows when it doesn't trigger anymore (regression from 14).
> 
> The patch just handles RAW_DATA_CST in the CONSTRUCTOR_ELTS the same as is
> it was a series of INTEGER_CSTs.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2025-01-02  Jakub Jelinek  
> 
>* tree-ssa-forwprop.cc (check_ctz_array): Handle also RAW_DATA_CST
>in the CONSTRUCTOR_ELTS.
> 
>* gcc.dg/pr90838-2.c: New test.
> 
> --- gcc/tree-ssa-forwprop.cc.jj2024-12-28 00:12:11.185146287 +0100
> +++ gcc/tree-ssa-forwprop.cc2024-12-31 12:45:33.512434253 +0100
> @@ -2269,7 +2269,7 @@ check_ctz_array (tree ctor, unsigned HOS
> HOST_WIDE_INT &zero_val, unsigned shift, unsigned bits)
> {
>   tree elt, idx;
> -  unsigned HOST_WIDE_INT i, mask;
> +  unsigned HOST_WIDE_INT i, mask, raw_idx = 0;
>   unsigned matched = 0;
> 
>   mask = ((HOST_WIDE_INT_1U << (bits - shift)) - 1) << shift;
> @@ -2278,13 +2278,34 @@ check_ctz_array (tree ctor, unsigned HOS
> 
>   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), i, idx, elt)
> {
> -  if (TREE_CODE (idx) != INTEGER_CST || TREE_CODE (elt) != INTEGER_CST)
> +  if (TREE_CODE (idx) != INTEGER_CST)
>return false;
> -  if (i > bits * 2)
> +  if (TREE_CODE (elt) != INTEGER_CST && TREE_CODE (elt) != RAW_DATA_CST)
>return false;
> 
>   unsigned HOST_WIDE_INT index = tree_to_shwi (idx);
> -  HOST_WIDE_INT val = tree_to_shwi (elt);
> +  HOST_WIDE_INT val;
> +
> +  if (TREE_CODE (elt) == INTEGER_CST)
> +val = tree_to_shwi (elt);
> +  else
> +{
> +  if (raw_idx == (unsigned) RAW_DATA_LENGTH (elt))
> +{
> +  raw_idx = 0;
> +  continue;
> +}
> +  if (TYPE_UNSIGNED (TREE_TYPE (elt)))
> +val = RAW_DATA_UCHAR_ELT (elt, raw_idx);
> +  else
> +val = RAW_DATA_SCHAR_ELT (elt, raw_idx);
> +  index += raw_idx;
> +  raw_idx++;
> +  i--;
> +}
> +
> +  if (index > bits * 2)
> +return false;
> 
>   if (index == 0)
>{
> --- gcc/testsuite/gcc.dg/pr90838-2.c.jj2024-12-31 12:50:10.548568029 +0100
> +++ gcc/testsuite/gcc.dg/pr90838-2.c2024-12-31 12:52:41.944455198 +0100
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-forwprop2-details" } */
> +/* { dg-additional-options "-mbmi" { target { { i?86-*-* x86_64-*-* } && { ! 
> { ia32 } } } } } */
> +/* { dg-additional-options "-march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-additional-options "-march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-require-effective-target int32plus } */
> +
> +static const unsigned long long magic = 0x03f08c5392f756cdULL;
> +
> +static const char table[128] = {
> + 0,  1, 12,  2, 13, 22, 17,  3,
> +14, 33, 23, 36, 18, 58, 28,  4,
> +62, 15, 34, 26, 24, 48, 50, 37,
> +19, 55, 59, 52, 29, 44, 39,  5,
> +63, 11, 21, 16, 32, 35, 57, 27,
> +61, 25, 47, 49, 54, 51, 43, 38,
> +10, 20, 31, 56, 60, 46, 53, 42,
> + 9, 30, 45, 41,  8, 40,  7,  6,
> + 1,  2,  3,  4,  5,  6,  7,  8,
> + 9, 10, 11, 12, 13, 14, 15, 16,
> +17, 18, 19, 20, 21, 22, 23, 24,
> +25, 26, 27, 28, 29, 30, 31, 32,
> +33, 34, 35, 36, 37, 38, 39, 40,
> +41, 42, 43, 44, 45, 46, 47, 48,
> +49, 50, 51, 52, 53, 54, 55, 56,
> +57, 58, 59, 60, 61, 62, 63, 64
> +};
> +
> +int ctz4 (unsigned long x)
> +{
> +  unsigned long lsb = x & -x;
> +  return table[(lsb * magic) >> 58];
> +}
> +
> +/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target { { i?86-*-* 
> x86_64-*-* } && { ! { ia32 } } } } } } */
> +/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target aarch64*-*-* } 
> } } */
> +/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target { rv64 } } } } 
> */
> +/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target { rv32 } } } } 
> */
> +/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target { 
> loongarch64*-*-* } } } } */
> 
>Jakub
> 

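For context, the pattern that check_ctz_array recognizes is the classic count-trailing-zeros-by-multiplication idiom: isolating the lowest set bit and multiplying by a de Bruijn-style constant places a unique 6-bit index in the top bits, which a lookup table maps back to the trailing-zero count. A standalone sketch using the magic constant and the first 64 table entries from the testcase above:

```c
#include <assert.h>

/* Magic constant and table taken from the pr90838 testcase.  */
static const unsigned long long magic = 0x03f08c5392f756cdULL;

static const char table[64] = {
   0,  1, 12,  2, 13, 22, 17,  3,
  14, 33, 23, 36, 18, 58, 28,  4,
  62, 15, 34, 26, 24, 48, 50, 37,
  19, 55, 59, 52, 29, 44, 39,  5,
  63, 11, 21, 16, 32, 35, 57, 27,
  61, 25, 47, 49, 54, 51, 43, 38,
  10, 20, 31, 56, 60, 46, 53, 42,
   9, 30, 45, 41,  8, 40,  7,  6,
};

/* Count trailing zeros of a nonzero x; this is the shape that forwprop
   rewrites into a single .CTZ call once the table is verified.  */
static int
ctz_via_table (unsigned long long x)
{
  unsigned long long lsb = x & -x;   /* isolate the lowest set bit */
  return table[(lsb * magic) >> 58]; /* unique 6-bit index -> count */
}
```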

[PATCH] testsuite: libitm: Adjust how libitm.c++ passes link flags

2025-01-02 Thread mmalcomson
From: Matthew Malcomson 

For the `gcc` and `g++` tools we often pass -B/path/to/object/dir in via
`TEST_ALWAYS_FLAGS` (see e.g. asan.exp where this is set).
In libitm.c++/c++.exp we pass that -B flag via the `tool_flags` argument
to `dg-runtest`.

Passing as the `tool_flags` argument means that these flags get added to
the name of the test.  This means that if one were to compare the
testsuite results between runs made in different build directories
libitm.c++ gives a reasonable amount of NA->PASS and PASS->NA
differences even though the same tests passed each time.

This patch follows the example set in other parts of the testsuite like
asan_init and passes the -B arguments to the compiler via a global
variable called `TEST_ALWAYS_FLAGS`.  For this DejaGNU "tool" we had to
newly initialise that variable in libitm_init and add a check against
that variable in libitm_target_compile.  I thought about adding the
relevant flags we need for C++ into `ALWAYS_CFLAGS` but decided against
it since the name didn't match what we would be using it for.

Testing done to bootstrap & regtest on AArch64.  Manually observed that
the testsuite diff between two different build directories no longer
exists.

N.b. since I pass the new flags in the `ldflags` option of the DejaGNU
options while the previous code always passed this -B flag, the compile
test throwdown.C no longer gets compiled with this -B flag.  I believe
that is not a problem.

libitm/ChangeLog:

* testsuite/libitm.c++/c++.exp: Use TEST_ALWAYS_FLAGS instead of
passing arguments to dg-runtest.
* testsuite/lib/libitm.exp (libitm_init): Initialise
TEST_ALWAYS_FLAGS.
(libitm_target_compile): Take flags from TEST_ALWAYS_FLAGS.

Signed-off-by: Matthew Malcomson 
---
 libitm/testsuite/lib/libitm.exp | 8 
 libitm/testsuite/libitm.c++/c++.exp | 7 ---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
index ac390d6d0dd..42a5aac4b0b 100644
--- a/libitm/testsuite/lib/libitm.exp
+++ b/libitm/testsuite/lib/libitm.exp
@@ -77,6 +77,7 @@ proc libitm_init { args } {
 global blddir
 global gluefile wrap_flags
 global ALWAYS_CFLAGS
+global TEST_ALWAYS_FLAGS
 global CFLAGS
 global TOOL_EXECUTABLE TOOL_OPTIONS
 global GCC_UNDER_TEST
@@ -145,6 +146,9 @@ proc libitm_init { args } {
}
 }
 
+# This is set in order to give libitm.c++/c++.exp a nicely named flag to set
+# when adding C++ options.
+set TEST_ALWAYS_FLAGS ""
 set ALWAYS_CFLAGS ""
 if { $blddir != "" } {
lappend ALWAYS_CFLAGS "additional_flags=-B${blddir}/"
@@ -191,6 +195,7 @@ proc libitm_target_compile { source dest type options } {
 global libitm_compile_options
 global gluefile wrap_flags
 global ALWAYS_CFLAGS
+global TEST_ALWAYS_FLAGS
 global GCC_UNDER_TEST
 global lang_test_file
 global lang_library_path
@@ -217,6 +222,9 @@ proc libitm_target_compile { source dest type options } {
 if [info exists ALWAYS_CFLAGS] {
set options [concat "$ALWAYS_CFLAGS" $options]
 }
+if [info exists TEST_ALWAYS_FLAGS] {
+   set options [concat "$TEST_ALWAYS_FLAGS" $options]
+}
 
 set options [dg-additional-files-options $options $source $dest $type]
 
diff --git a/libitm/testsuite/libitm.c++/c++.exp 
b/libitm/testsuite/libitm.c++/c++.exp
index ab278f2cb33..d501e7e8170 100644
--- a/libitm/testsuite/libitm.c++/c++.exp
+++ b/libitm/testsuite/libitm.c++/c++.exp
@@ -56,10 +56,10 @@ if { $lang_test_file_found } {
 # Gather a list of all tests.
 set tests [lsort [glob -nocomplain $srcdir/$subdir/*.C]]
 
-set stdcxxadder ""
+set saved_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS
 if { $blddir != "" } {
set ld_library_path 
"$always_ld_library_path:${blddir}/${lang_library_path}"
-   set stdcxxadder "-B ${blddir}/${lang_library_path}"
+   set TEST_ALWAYS_FLAGS "$TEST_ALWAYS_FLAGS 
ldflags=-B${blddir}/${lang_library_path}"
 } else {
set ld_library_path "$always_ld_library_path"
 }
@@ -74,7 +74,8 @@ if { $lang_test_file_found } {
 }
 
 # Main loop.
-dg-runtest $tests $stdcxxadder $libstdcxx_includes
+dg-runtest $tests "" $libstdcxx_includes
+set TEST_ALWAYS_FLAGS $saved_TEST_ALWAYS_FLAGS
 }
 
 # All done.
-- 
2.43.0



[PATCH] aarch64: Detect word-level modification in early-ra [PR118184]

2025-01-02 Thread Richard Sandiford
REGMODE_NATURAL_SIZE is set to 64 bits for everything except
VLA SVE modes.  This means that it's possible to modify (say)
the highpart of a TI pseudo or a V2DI pseudo independently
of the lowpart.  Modifying such highparts requires a reload
if the highpart ends up in the upper 64 bits of an FPR,
since RTL semantics do not allow the highpart of a single
hard register to be modified independently of the lowpart.

early-ra missed a check for this case, which meant that it
effectively treated an assignment to (subreg:DI (reg:TI R) 0)
as an assignment to the whole of R.

Tested on aarch64-linux-gnu & pushed to trunk.  I'll backport
to GCC 14 after a grace period.

Richard


gcc/
PR target/118184
* config/aarch64/aarch64-early-ra.cc (allocno_assignment_is_rmw):
New function.
(early_ra::record_insn_defs): Mark the live range information as
untrustworthy if an assignment would change part of an allocno
but preserve the rest.

gcc/testsuite/
* gcc.dg/torture/pr118184.c: New test.
---
 gcc/config/aarch64/aarch64-early-ra.cc  | 51 -
 gcc/testsuite/gcc.dg/torture/pr118184.c | 36 +
 2 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118184.c

diff --git a/gcc/config/aarch64/aarch64-early-ra.cc 
b/gcc/config/aarch64/aarch64-early-ra.cc
index 660a47195d2..479fe56b4d8 100644
--- a/gcc/config/aarch64/aarch64-early-ra.cc
+++ b/gcc/config/aarch64/aarch64-early-ra.cc
@@ -2033,6 +2033,43 @@ early_ra::record_artificial_refs (unsigned int flags)
   m_current_point += 1;
 }
 
+// Return true if:
+//
+// - X is a SUBREG, in which case it is a SUBREG of some REG Y
+//
+// - one 64-bit word of Y can be modified while preserving all other words
+//
+// - X refers to no more than one 64-bit word of Y
+//
+// - assigning FPRs to Y would put more than one 64-bit word in each FPR
+//
+// For example, this is true of:
+//
+// - (subreg:DI (reg:TI R) 0) and
+// - (subreg:DI (reg:TI R) 8)
+//
+// but is not true of:
+//
+// - (subreg:V2SI (reg:V2x2SI R) 0) or
+// - (subreg:V2SI (reg:V2x2SI R) 8).
+static bool
+allocno_assignment_is_rmw (rtx x)
+{
+  if (partial_subreg_p (x))
+{
+  auto outer_mode = GET_MODE (x);
+  auto inner_mode = GET_MODE (SUBREG_REG (x));
+  if (known_eq (REGMODE_NATURAL_SIZE (inner_mode), 0U + UNITS_PER_WORD)
+ && known_lt (GET_MODE_SIZE (outer_mode), UNITS_PER_VREG))
+   {
+ auto nregs = targetm.hard_regno_nregs (V0_REGNUM, inner_mode);
+ if (maybe_ne (nregs * UNITS_PER_WORD, GET_MODE_SIZE (inner_mode)))
+   return true;
+   }
+}
+  return false;
+}
+
 // Called as part of a backwards walk over a block.  Model the definitions
 // in INSN, excluding partial call clobbers.
 void
@@ -2045,9 +2082,21 @@ early_ra::record_insn_defs (rtx_insn *insn)
   record_fpr_def (DF_REF_REGNO (ref));
 else
   {
-   auto range = get_allocno_subgroup (DF_REF_REG (ref));
+   rtx reg = DF_REF_REG (ref);
+   auto range = get_allocno_subgroup (reg);
for (auto &allocno : range.allocnos ())
  {
+   // Make sure that assigning to the DF_REF_REG clobbers the
+   // whole of this allocno, not just some of it.
+   if (allocno_assignment_is_rmw (reg))
+ {
+   record_live_range_failure ([&](){
+ fprintf (dump_file, "read-modify-write of allocno %d",
+  allocno.id);
+   });
+   break;
+ }
+
// If the destination is unused, record a momentary blip
// in its live range.
if (!bitmap_bit_p (m_live_allocnos, allocno.id))
diff --git a/gcc/testsuite/gcc.dg/torture/pr118184.c 
b/gcc/testsuite/gcc.dg/torture/pr118184.c
new file mode 100644
index 000..20f567af11f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118184.c
@@ -0,0 +1,36 @@
+/* { dg-do run { target { longdouble128 && lp64 } } } */
+
+union u1
+{
+  long double ld;
+  unsigned long l[2];
+};
+
+[[gnu::noipa]]
+unsigned long m()
+{
+  return 1000;
+}
+
+[[gnu::noinline]]
+long double f(void)
+{
+  union u1 u;
+  u.ld = __builtin_nanf128("");
+  u.l[0] = m();
+  return u.ld;
+}
+
+int main()
+{
+   union u1 u;
+   u.ld = f();
+   union u1 u2;
+   u2.ld = __builtin_nanf128("");
+   u2.l[0] = m();
+   if (u.l[0] != u2.l[0])
+ __builtin_abort();
+   if (u.l[1] != u2.l[1])
+ __builtin_abort();
+   return 0;
+}
-- 
2.25.1



Re: [committed] Use u'' instead of '' in libgdiagnostics/conf.py

2025-01-02 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> libgdiagnostics/conf.py breaks update-copyright.py --this-year,
> which only accepts copyright year in u'' literals in python files,
> not in ''.
>
> u'' strings is what e.g. libgccjit conf.py uses.
> Tested by building libgdiagnostics docs without/with this patch, the
> difference is just the expected addition of -2025 in tons of spots,
> nothing else.

It'd be good to move all the python scripts over to python 3, so that
this is no longer necessary.  That's obviously separate work though...

Richard

> Committed to trunk as obvious.
>
> 2025-01-02  Jakub Jelinek  
>
>   * doc/libgdiagnostics/conf.py: Use u'' instead of '' in
>   project and copyright initialization.
>
> --- gcc/doc/libgdiagnostics/conf.py
> +++ gcc/doc/libgdiagnostics/conf.py
> @@ -6,8 +6,8 @@
>  # -- Project information 
> -
>  # 
> https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
>  
> -project = 'libgdiagnostics'
> -copyright = '2024 Free Software Foundation, Inc.'
> +project = u'libgdiagnostics'
> +copyright = u'2024-2025 Free Software Foundation, Inc.'
>  
>  # -- General configuration 
> ---
>  # 
> https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
>
>
>   Jakub


Re: [PATCH] testsuite: libitm: Adjust how libitm.c++ passes link flags

2025-01-02 Thread Richard Sandiford
 writes:
> From: Matthew Malcomson 
>
> For the `gcc` and `g++` tools we often pass -B/path/to/object/dir in via
> `TEST_ALWAYS_FLAGS` (see e.g. asan.exp where this is set).
> In libitm.c++/c++.exp we pass that -B flag via the `tool_flags` argument
> to `dg-runtest`.
>
> Passing as the `tool_flags` argument means that these flags get added to
> the name of the test.  This means that if one were to compare the
> testsuite results between runs made in different build directories
> libitm.c++ gives a reasonable amount of NA->PASS and PASS->NA
> differences even though the same tests passed each time.
>
> This patch follows the example set in other parts of the testsuite like
> asan_init and passes the -B arguments to the compiler via a global
> variable called `TEST_ALWAYS_FLAGS`.  For this DejaGNU "tool" we had to
> newly initialise that variable in libitm_init and add a check against
> that variable in libitm_target_compile.  I thought about adding the
> relevant flags we need for C++ into `ALWAYS_CFLAGS` but decided against
> it since the name didn't match what we would be using it for.
>
> Testing done to bootstrap & regtest on AArch64.  Manually observed that
> the testsuite diff between two different build directories no longer
> exists.
>
> N.b. since I pass the new flags in the `ldflags` option of the DejaGNU
> options while the previous code always passed this -B flag, the compile
> test throwdown.C no longer gets compiled with this -B flag.  I believe
> that is not a problem.
>
> libitm/ChangeLog:
>
>   * testsuite/libitm.c++/c++.exp: Use TEST_ALWAYS_FLAGS instead of
>   passing arguments to dg-runtest.
>   * testsuite/lib/libitm.exp (libitm_init): Initialise
>   TEST_ALWAYS_FLAGS.
>   (libitm_target_compile): Take flags from TEST_ALWAYS_FLAGS.
>
> Signed-off-by: Matthew Malcomson 
> ---
>  libitm/testsuite/lib/libitm.exp | 8 
>  libitm/testsuite/libitm.c++/c++.exp | 7 ---
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
> index ac390d6d0dd..42a5aac4b0b 100644
> --- a/libitm/testsuite/lib/libitm.exp
> +++ b/libitm/testsuite/lib/libitm.exp
> @@ -77,6 +77,7 @@ proc libitm_init { args } {
>  global blddir
>  global gluefile wrap_flags
>  global ALWAYS_CFLAGS
> +global TEST_ALWAYS_FLAGS
>  global CFLAGS
>  global TOOL_EXECUTABLE TOOL_OPTIONS
>  global GCC_UNDER_TEST
> @@ -145,6 +146,9 @@ proc libitm_init { args } {
>   }
>  }
>  
> +# This set in order to give libitm.c++/c++.exp a nicely named flag to set
> +# when adding C++ options.
> +set TEST_ALWAYS_FLAGS ""

This looked odd at first glance.  By unconditionally writing "" to the
variable, it seems to subvert the save and restore done in c++.exp.

How about instead copying the behaviour of asan_init and asan_finish,
so that libitm_init and libitm_finish do the save and restore?  Or perhaps
a slight variation: after saving, libitm_init can set TEST_ALWAYS_FLAGS
to "" if TEST_ALWAYS_FLAGS was previously unset.

c++.exp would then not need to save and restore the flags itself, and
could still assume that TEST_ALWAYS_FLAGS is always set.

Thanks,
Richard

>  set ALWAYS_CFLAGS ""
>  if { $blddir != "" } {
>   lappend ALWAYS_CFLAGS "additional_flags=-B${blddir}/"
> @@ -191,6 +195,7 @@ proc libitm_target_compile { source dest type options } {
>  global libitm_compile_options
>  global gluefile wrap_flags
>  global ALWAYS_CFLAGS
> +global TEST_ALWAYS_FLAGS
>  global GCC_UNDER_TEST
>  global lang_test_file
>  global lang_library_path
> @@ -217,6 +222,9 @@ proc libitm_target_compile { source dest type options } {
>  if [info exists ALWAYS_CFLAGS] {
>   set options [concat "$ALWAYS_CFLAGS" $options]
>  }
> +if [info exists TEST_ALWAYS_FLAGS] {
> + set options [concat "$TEST_ALWAYS_FLAGS" $options]
> +}
>  
>  set options [dg-additional-files-options $options $source $dest $type]
>  
> diff --git a/libitm/testsuite/libitm.c++/c++.exp 
> b/libitm/testsuite/libitm.c++/c++.exp
> index ab278f2cb33..d501e7e8170 100644
> --- a/libitm/testsuite/libitm.c++/c++.exp
> +++ b/libitm/testsuite/libitm.c++/c++.exp
> @@ -56,10 +56,10 @@ if { $lang_test_file_found } {
>  # Gather a list of all tests.
>  set tests [lsort [glob -nocomplain $srcdir/$subdir/*.C]]
>  
> -set stdcxxadder ""
> +set saved_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS
>  if { $blddir != "" } {
>   set ld_library_path 
> "$always_ld_library_path:${blddir}/${lang_library_path}"
> - set stdcxxadder "-B ${blddir}/${lang_library_path}"
> + set TEST_ALWAYS_FLAGS "$TEST_ALWAYS_FLAGS 
> ldflags=-B${blddir}/${lang_library_path}"
>  } else {
>   set ld_library_path "$always_ld_library_path"
>  }
> @@ -74,7 +74,8 @@ if { $lang_test_file_found } {
>  }
>  
>  # Main loop.
> -dg-runtest $tests $stdcxxadder $libstdcxx_includes
> +dg-

[PATCH] Fortran: Cray pointer comparison wrongly optimized away [PR106692]

2025-01-02 Thread Harald Anlauf
Dear all,

this patch addresses overeager optimization of Cray pointers when
used in comparisons.  Cray pointers are non-standard, and odd in the
sense that they were introduced before modern Fortran pointers.
Comparisons with e.g. a "NULL" pointer are actually comparisons
with integer zero, which means that although Cray pointers are
references, they can actually be "NULL" to mimic a disassociated
pointer.
The only solution I could find was treating them locally as volatile
when used in a comparison.  If someone has a better solution, please
share!

As this is a local solution, and real-world legacy code using Cray
pointers would likely never use such a test in a vectorizable loop,
I expect negligible (performance and code-size) impact.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

This PR is marked as a regression (since gcc-7), is this OK for
a (limited?) backport?

Thanks,
Harald

From 2043df2056e451d7a2f48d3da9cd560eccd2dd51 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 2 Jan 2025 20:22:23 +0100
Subject: [PATCH] Fortran: Cray pointer comparison wrongly optimized away
 [PR106692]

	PR fortran/106692

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_expr_op): Inhibit excessive optimization
	of Cray pointers by treating them as volatile in comparisons.

gcc/testsuite/ChangeLog:

	* gfortran.dg/cray_pointers_13.f90: New test.
---
 gcc/fortran/trans-expr.cc | 13 +
 .../gfortran.dg/cray_pointers_13.f90  | 51 +++
 2 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/cray_pointers_13.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index f73e04bfd1d..bc24105ce32 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -4150,6 +4150,19 @@ gfc_conv_expr_op (gfc_se * se, gfc_expr * expr)

   if (lop)
 {
+  // Inhibit overeager optimization of Cray pointer comparisons (PR106692).
+  if (expr->value.op.op1->expr_type == EXPR_VARIABLE
+	  && expr->value.op.op1->ts.type == BT_INTEGER
+	  && expr->value.op.op1->symtree
+	  && expr->value.op.op1->symtree->n.sym->attr.cray_pointer)
+	TREE_THIS_VOLATILE (lse.expr) = 1;
+
+  if (expr->value.op.op2->expr_type == EXPR_VARIABLE
+	  && expr->value.op.op2->ts.type == BT_INTEGER
+	  && expr->value.op.op2->symtree
+	  && expr->value.op.op2->symtree->n.sym->attr.cray_pointer)
+	TREE_THIS_VOLATILE (rse.expr) = 1;
+
   /* The result of logical ops is always logical_type_node.  */
   tmp = fold_build2_loc (input_location, code, logical_type_node,
 			 lse.expr, rse.expr);
diff --git a/gcc/testsuite/gfortran.dg/cray_pointers_13.f90 b/gcc/testsuite/gfortran.dg/cray_pointers_13.f90
new file mode 100644
index 000..766d24546ab
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/cray_pointers_13.f90
@@ -0,0 +1,51 @@
+! { dg-do run }
+! { dg-additional-options "-fcray-pointer" }
+!
+! PR fortran/106692 - Cray pointer comparison wrongly optimized away
+!
+! Contributed by Marek Polacek
+
+program test
+  call test_cray()
+  call test_cray2()
+end
+
+subroutine test_cray()
+  pointer(ptrzz1 , zz1)
+  ptrzz1=0
+  if (ptrzz1 .ne. 0) then
+print *, "test_cray: ptrzz1=", ptrzz1
+stop 1
+  else
+call shape_cray(zz1)
+  end if
+end
+
+subroutine shape_cray(zz1)
+  pointer(ptrzz , zz)
+  ptrzz=loc(zz1)
+  if (ptrzz .ne. 0) then
+print *, "shape_cray: ptrzz=", ptrzz
+stop 3
+  end if
+end
+
+subroutine test_cray2()
+  pointer(ptrzz1 , zz1)
+  ptrzz1=0
+  if (0 == ptrzz1) then
+call shape_cray2(zz1)
+  else
+print *, "test_cray2: ptrzz1=", ptrzz1
+stop 2
+  end if
+end
+
+subroutine shape_cray2(zz1)
+  pointer(ptrzz , zz)
+  ptrzz=loc(zz1)
+  if (.not. (0 == ptrzz)) then
+print *, "shape_cray2: ptrzz=", ptrzz
+stop 4
+  end if
+end
--
2.43.0



Re: [WIP 3/8] algol68: front-end misc files

2025-01-02 Thread Jose E. Marchesi

> On Wed, 2025-01-01 at 03:09 +0100, Jose E. Marchesi wrote:
>> ---
>>  gcc/algol68/Make-lang.in |  239 +
>>  gcc/algol68/README   |  102 ++
>>  gcc/algol68/a68-diagnostics.cc   |  450 +
>>  gcc/algol68/a68-lang.cc  |  549 ++
>>  gcc/algol68/a68-moids-diagnostics.cc |  271 +
>>  gcc/algol68/a68-moids-misc.cc    | 1404
>> ++
>>  gcc/algol68/a68-moids-size.cc    |  339 +++
>>  gcc/algol68/a68-moids-to-string.cc   |  375 +++
>>  gcc/algol68/a68-postulates.cc    |  105 ++
>>  gcc/algol68/a68-tree.def |   26 +
>>  gcc/algol68/a68-types.h  |  980 ++
>>  gcc/algol68/a68.h    |  650 
>>  gcc/algol68/a68spec.cc   |  212 
>>  gcc/algol68/algol68-target.def   |   52 +
>>  gcc/algol68/config-lang.in   |   31 +
>>  gcc/algol68/gac-internals.texi   |  351 +++
>>  gcc/algol68/gac.texi |  292 ++
>>  gcc/algol68/lang-specs.h |   26 +
>>  gcc/algol68/lang.opt |   93 ++
>>  gcc/algol68/lang.opt.urls    |   32 +
>>  20 files changed, 6579 insertions(+)
>>  create mode 100644 gcc/algol68/Make-lang.in
>>  create mode 100644 gcc/algol68/README
>>  create mode 100644 gcc/algol68/a68-diagnostics.cc
>>  create mode 100644 gcc/algol68/a68-lang.cc
>>  create mode 100644 gcc/algol68/a68-moids-diagnostics.cc
>>  create mode 100644 gcc/algol68/a68-moids-misc.cc
>>  create mode 100644 gcc/algol68/a68-moids-size.cc
>>  create mode 100644 gcc/algol68/a68-moids-to-string.cc
>>  create mode 100644 gcc/algol68/a68-postulates.cc
>>  create mode 100644 gcc/algol68/a68-tree.def
>>  create mode 100644 gcc/algol68/a68-types.h
>>  create mode 100644 gcc/algol68/a68.h
>>  create mode 100644 gcc/algol68/a68spec.cc
>>  create mode 100644 gcc/algol68/algol68-target.def
>>  create mode 100644 gcc/algol68/config-lang.in
>>  create mode 100644 gcc/algol68/gac-internals.texi
>>  create mode 100644 gcc/algol68/gac.texi
>>  create mode 100644 gcc/algol68/lang-specs.h
>>  create mode 100644 gcc/algol68/lang.opt
>>  create mode 100644 gcc/algol68/lang.opt.urls
>> 
>> diff --git a/gcc/algol68/Make-lang.in b/gcc/algol68/Make-lang.in
>> new file mode 100644
>> index 000..294d39dd205
>> --- /dev/null
>> +++ b/gcc/algol68/Make-lang.in
>> @@ -0,0 +1,239 @@
>> +# Make-lang.in -- Top level -*- makefile -*- fragment for GCC ALGOL
>> 68
>> +# frontend.
>> +
>> +# Copyright (C) 2025 Free Software Foundation, Inc.
>> +
>> +# This file is NOT part of GCC.
>> +
>> +# GCC is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published
>> by
>> +# the Free Software Foundation; either version 3, or (at your
>> option)
>> +# any later version.
>> +
>> +# GCC is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +
>> +# You should have received a copy of the GNU General Public License
>> +# along with GCC; see the file COPYING3.  If not see
>> +# .
>> +
>> +# This file provides the language dependent support in the main
>> Makefile.
>
> The boilerplate in this file, and many others in the patch kit, has the
> line "This file is NOT part of GCC."
>
> Sorry if I'm missing something obvious here, but it certainly looks
> like part of GCC to me, or, at least, it would be if the patch were
> merged into our repository.
>
> What is the intent of these lines?  Is there some kind of GCC vs not-
> GCC separation intended here, and is there a high-level description of
> where the line is drawn?

Oh, sorry for the confusion.

In the Emacs world it is customary to have notes like

 ;; This file is NOT part of GNU Emacs.

instead of the standard

 ;; This file is part of GNU Emacs.

in files that are either third-party packages or that are not yet
incorporated in Emacs core or in ELPA (the Emacs Lisp Package Archive).
I am so used to it that it came automatically when I wrote these
headers, because the WIP is not really ready for merging yet.


Re: [PATCH] Fortran: Cray pointer comparison wrongly optimized away [PR106692]

2025-01-02 Thread Jerry D

On 1/2/25 12:04 PM, Harald Anlauf wrote:

Dear all,

this patch addresses overeager optimization of Cray pointers when
used in comparisons.  Cray pointers are non-standard, and odd in the
sense that they were introduced before modern Fortran pointers.
Comparisons with e.g. a "NULL" pointer are actually comparisons
with integer zero, which means that although Cray pointers are
references, they can actually be "NULL" to mimic a disassociated
pointer.
The only solution I could find was treating them locally as volatile
when used in a comparison.  If someone has a better solution, please
share!

As this is a local solution, and real-world legacy code using Cray
pointers would likely never use such a test in a vectorizable loop,
I expect negligible (performance and code-size) impact.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

This PR is marked as a regression (since gcc-7), is this OK for
a (limited?) backport?

Thanks,
Harald



The hack is fairly isolated and simple. The problem is a quirk from the 
past. The only way to expose an issue is to get it into the real world 
and see if anyone notices a problem.


OK for trunk and maybe 14.  If you think farther back, up to you.

Jerry


[PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-01-02 Thread Roman Kagan
When gcc is built for the x86_64-linux-musl target, stack unwinding from
within a signal handler stops at the innermost signal frame.  The reason
for this behavior is that the signal trampoline is not accompanied by
appropriate CFI directives, and the fallback path in libgcc that
recognizes it by its code sequence is only enabled for glibc versions
other than 2.0.  The latter restriction is motivated by the lack of
sys/ucontext.h in that glibc version.

Given that all relevant libcs have shipped sys/ucontext.h for over a
decade, and that other architectures already use it unconditionally,
follow suit and remove the preprocessor condition.

Signed-off-by: Roman Kagan 
---
 libgcc/config/i386/linux-unwind.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/libgcc/config/i386/linux-unwind.h 
b/libgcc/config/i386/linux-unwind.h
index fe316ee02cf2..8f37642bbf55 100644
--- a/libgcc/config/i386/linux-unwind.h
+++ b/libgcc/config/i386/linux-unwind.h
@@ -33,12 +33,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 
 #ifndef inhibit_libc
 
-/* There's no sys/ucontext.h for glibc 2.0, so no
-   signal-turned-exceptions for them.  There's also no configure-run for
-   the target, so we can't check on (e.g.) HAVE_SYS_UCONTEXT_H.  Using the
-   target libc version macro should be enough.  */
-#if defined __GLIBC__ && !(__GLIBC__ == 2 && __GLIBC_MINOR__ == 0)
-
 #include 
 #include 
 
@@ -199,5 +193,4 @@ x86_frob_update_context (struct _Unwind_Context *context,
 }
 
 #endif /* ifdef __x86_64__  */
-#endif /* not glibc 2.0 */
 #endif /* ifdef inhibit_libc  */
-- 
2.47.1



[PATCH] hurd: Add OPTION_GLIBC_P and OPTION_GLIBC

2025-01-02 Thread Samuel Thibault
From: Svante Signell 

GNU/Hurd uses glibc just like GNU/Linux.

This is needed for gcc to notice that glibc supports split stack in
finish_options.

gcc/ChangeLog:
* gcc/config/gnu.h (OPTION_GLIBC_P, OPTION_GLIBC): Define.

Patch from Svante Signell for PR go/104290.
---
 gcc/config/gnu.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/gnu.h b/gcc/config/gnu.h
index e2a33baf040..4e921e0d51e 100644
--- a/gcc/config/gnu.h
+++ b/gcc/config/gnu.h
@@ -19,6 +19,10 @@ You should have received a copy of the GNU General Public 
License
 along with GCC.  If not, see .
 */
 
+/* C libraries used on GNU/Hurd.  */
+#define OPTION_GLIBC_P(opts)   (DEFAULT_LIBC == LIBC_GLIBC)
+#define OPTION_GLIBC   OPTION_GLIBC_P (&global_options)
+
 #undef GNU_USER_TARGET_OS_CPP_BUILTINS
 #define GNU_USER_TARGET_OS_CPP_BUILTINS()  \
 do {   \
-- 
2.43.0



Re: [committed] Use u'' instead of '' in libgdiagnostics/conf.py

2025-01-02 Thread David Malcolm
On Thu, 2025-01-02 at 13:34 +0100, Jakub Jelinek wrote:
> On Thu, Jan 02, 2025 at 11:51:20AM +, Richard Sandiford wrote:
> > Jakub Jelinek  writes:
> > > libgdiagnostics/conf.py breaks update-copyright.py --this-year,
> > > which only accepts copyright year in u'' literals in python
> > > files,
> > > not in ''.
> > > 
> > > u'' strings is what e.g. libgccjit conf.py uses.
> > > Tested by building libgdiagnostics docs without/with this patch,
> > > the
> > > difference is just the expected addition of -2025 in tons of
> > > spots,
> > > nothing else.
> > 
> > It'd be good to move all the python scripts over to python 3, so
> > that
> > this is no longer necessary.  That's obviously separate work
> > though...
> 
> Nothing against that.
> My python-fu is very limited though.
> 
> Right now update-copyright.py has
> '|copyright = u\''
> among other parts of regexp, maybe it would be just a matter of
> adding
> '|copyright = \''
> too or replacing the u one with the latter and getting rid of u''
> strings
> plus testing what it breaks/changes.

FWIW the u'' prefix was originally added in Python 2.0, was dropped in
Python 3.0 through 3.2, and then re-added in Python 3.3 onwards to help
with 2 vs 3 compatibility; see https://peps.python.org/pep-0414/

I don't think anyone's planning to eliminate them again.
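For instance (a quick illustration, not part of any patch here), on any Python 3.3+ interpreter the two spellings produce identical str objects:

```python
# PEP 414: the u'' prefix is accepted again in Python 3.3+ and is a
# no-op -- both literals below are ordinary str objects.
s1 = u'2024-2025 Free Software Foundation, Inc.'
s2 = '2024-2025 Free Software Foundation, Inc.'
assert s1 == s2
assert type(s1) is str and type(s2) is str
```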

It's probably simplest to extend update-copyright.py to support the non
u-prefixed strings as well as u-prefixed strings, given that both are
valid in Python 3.2 onwards.
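A minimal sketch of such a combined pattern (illustrative only; the actual regexps and surrounding machinery in update-copyright.py differ):

```python
import re

# Hypothetical fragment: accept both the u''-prefixed and the plain ''
# spelling of the copyright assignment in Sphinx conf.py files.
copyright_re = re.compile(r"copyright = u?'(?P<years>[0-9]{4}(?:-[0-9]{4})?)")

for line in ("copyright = u'2024-2025 Free Software Foundation, Inc.'",
             "copyright = '2024 Free Software Foundation, Inc.'"):
    match = copyright_re.search(line)
    assert match is not None  # both spellings are recognized
```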

Dave



Re: [committed] Use u'' instead of '' in libgdiagnostics/conf.py

2025-01-02 Thread David Malcolm
On Thu, 2025-01-02 at 10:39 -0500, David Malcolm wrote:
> On Thu, 2025-01-02 at 13:34 +0100, Jakub Jelinek wrote:
> > On Thu, Jan 02, 2025 at 11:51:20AM +, Richard Sandiford wrote:
> > > Jakub Jelinek  writes:
> > > > libgdiagnostics/conf.py breaks update-copyright.py --this-year,
> > > > which only accepts copyright year in u'' literals in python
> > > > files,
> > > > not in ''.
> > > > 
> > > > u'' strings is what e.g. libgccjit conf.py uses.
> > > > Tested by building libgdiagnostics docs without/with this
> > > > patch,
> > > > the
> > > > difference is just the expected addition of -2025 in tons of
> > > > spots,
> > > > nothing else.
> > > 
> > > It'd be good to move all the python scripts over to python 3, so
> > > that
> > > this is no longer necessary.  That's obviously separate work
> > > though...
> > 
> > Nothing against that.
> > My python-fu is very limited though.
> > 
> > Right now update-copyright.py has
> > '|copyright = u\''
> > among other parts of regexp, maybe it would be just a matter of
> > adding
> > '|copyright = \''
> > too or replacing the u one with the latter and getting rid of u''
> > strings
> > plus testing what it breaks/changes.
> 
> FWIW the u'' prefix was originally added in Python 2.0, was dropped
> in
> Python 3.0 through 3.2, and then re-added in Python 3.3 onwards to
> help
> with 2 vs 3 compatibility; see https://peps.python.org/pep-0414/
> 
> I don't think anyone's planning to eliminate them again.
> 
> It's probably simplest to extend update-copyright.py to support the
> non
> u-prefixed strings as well as u-prefixed strings, given that both are
> valid in Python 3.2 onwards.

gah, 3.3, I meant to say.



Re: [PATCH] c: special-case some "bool" errors with C23 (v2) [PR117629]

2025-01-02 Thread David Malcolm
On Thu, 2025-01-02 at 18:33 +, Joseph Myers wrote:
> On Thu, 19 Dec 2024, David Malcolm wrote:
> 
> > Here's an updated version of the patch.
> > 
> > Changed in v2:
> > - distinguish between "bool" and "_Bool" when determining
> >   standard version
> > - more test coverage
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > OK for trunk?
> 
> OK. 

Thanks; pushed as r15-6507-g321983033d621e.

>  (I'm guessing the other new keywords that weren't previously
> reserved 
> (alignas, alignof, constexpr, nullptr, static_assert, thread_local, 
> typeof, typeof_unqual) are sufficiently rare as identifiers that they
> aren't worth trying to produce better diagnostics for.)

That was my thinking too.

Sam, did you see anything significant here in your testing?

Thanks
Dave



Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Richard Sandiford
Tamar Christina  writes:
>> >> So I think ideally, we should try to detect whether the indices come
>> >> directly from memory or are the result of arithmetic.  In the former case,
>> >> we should do the loads adjustment above.  In the latter case, we should
>> >> keep the vec_to_scalar accounting unchanged.
>> >
>> > I can do this but...
>> >
>> >>
>> >> Of course, these umovs are likely to be more throughput-limited than we
>> >> model, but that's a separate pre-existing problem...
>> >
>> > I agree with the above, the reason I just updated loads is as you already 
>> > said
>> > that the umov accounting as general operations don't account for the 
>> > bottleneck.
>> > In general umovs are more throughput limited than loads and the number of
>> general
>> > ops we can execute would in the example above misrepresent the throughput 
>> > as
>> it
>> > still thinks it can execute all transfers + all scalar loads in one cycle. 
>> >  As the number
>> of VX
>> > increases modelling them as general ops incorrectly favors the emulated 
>> > gather.
>> See e.g.
>> > Cortex-X925.
>> >
>> > By still modelling them as loads it more accurately models that the data 
>> > loads
>> have to
>> > wait for the indexes.
>> >
>> > The problem with modelling them as general ops is that when compared to the
>> IFN for
>> > SVE they end up being cheaper.  For instance the umov case above is faster 
>> > using
>> an
>> > actual SVE gather.
>> >
>> > So if we really want to be accurate we have to model vec transfers as 
>> > otherwise it
>> still
>> > models the index transfers as effectively free.
>> 
>> Yeah, agree that we eventually need to model transfers properly.
>> 
>> But I think my point still stands that modelling loads instead of
>> general ops won't help in cases where memory doesn't dominate.
>> Modelling UMOVs as general ops does give us something in that case,
>> even if it's not perfect.
>> 
>> How about, as a compromise, just removing the early return?  That way
>> we won't "regress" in the counting of general ops for the case of
>> arithmetic indices, but will still get the benefit of the load
>> heuristic.
>
> I'm ok with this.  An alternative solution here might be doing what i386 does 
> and
> scale the operation by vector subparts.  I assume the goal there was to model 
> the
> latency of doing the individual transfers in a dependency chain.  But if we 
> increase
> the number of general ops instead when the source isn't a memory then we
> simulate that it's data bound but also account for the throughput limitation 
> somewhat.

Scaling by subparts feels like double-counting (well, squaring) in this
case, since the provided count of 4 already reflects the number of subparts.

It looks like we do still vectorise for Advanced SIMD with -mtune=neoverse-v2
(rather than -mcpu=neoverse-v2, so with SVE disabled) and without any tuning
option.  Is that the right call, or do we need to tweak the latency
costs too?

>> >> For the scatter store case:
>> >>
>> >> float
>> >> s4115 (int *ip)
>> >> {
>> >>   for (int i = 0; i < LEN_1D; i++)
>> >> {
>> >>   b[ip[i]] = a[i] + 1;
>> >> }
>> >> }
>> >>
>> >> the vectoriser (unhelpfully) costs both the index-to-scalars and
>> >> data-to-scalars as vec_to_scalar, meaning that we'll double-count
>> >> the extra loads.
>> >>
>> >
>> > I think that's more accurate though.
>> >
>> > This example is load Q -> umov -> store.
>> >
>> > This is a 3 insn dependency chain, where modelling the umov as load
>> > more accurately depicts the dependency on the preceding load.
>> 
>> For the above we generate:
>> 
>> .L2:
>> ldr q30, [x7, x1]
>> add x3, x0, x1
>> ldrsw   x6, [x0, x1]
>> add x1, x1, 16
>> ldp w5, w4, [x3, 4]
>> add x5, x2, w5, sxtw 2
>> add x4, x2, w4, sxtw 2
>> faddv30.4s, v30.4s, v31.4s
>> ldr w3, [x3, 12]
>> add x3, x2, w3, sxtw 2
>> str s30, [x2, x6, lsl 2]
>> st1 {v30.s}[1], [x5]
>> st1 {v30.s}[2], [x4]
>> st1 {v30.s}[3], [x3]
>> cmp x1, x8
>> bne .L2
>> 
>> i.e. we use separate address arithmetic and avoid UMOVs.  Counting
>> two loads and one store for each element of the scatter store seems
>> like overkill for that.
>
> Hmm agreed..
>
> How about for stores we increase the load counts by count / 2?
>
> This would account for the fact that we know we have indexed stores
> and so the data-to-scalar operation is free?

Yeah, sounds good.  We should probably divide count itself by 2,
then apply the new count to both the load heuristic and the general ops,
to avoid double-counting in both.  (The V pipe usage for stores is
modelled as part of the scalar_store itself.)  But like you say,
we should probably drop the - 1 from the load adjustment for stores,
because that - 1 would also be applied twice.

Thanks,
Richard


Re: [COMMITTED] Fortran: Grammar/markup fixes in intrinsics documentation

2025-01-02 Thread Maciej W. Rozycki
On Tue, 31 Dec 2024, Sandra Loosemore wrote:

> diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
> index d11d37761d9..b47180126ca 100644
> --- a/gcc/fortran/intrinsic.texi
> +++ b/gcc/fortran/intrinsic.texi
> @@ -1577,7 +1577,7 @@ if @var{Y} is present, @var{X} shall be REAL.
>  @item @emph{Return value}:
>  The return value is of the same type and kind as @var{X}.
>  If @var{Y} is present, the result is identical to @code{ATAN2(Y,X)}.
> -Otherwise, it the arcus tangent of @var{X}, where the real part of
> +Otherwise, it the arctangent of @var{X}, where the real part of
  ^^
s/it the/it is the/ presumably?

  Maciej


Re: [PATCH] gcc/configure: Fix check for assembler section merging support on Arm

2025-01-02 Thread Thiago Jung Bauermann
Hello,

Thiago Jung Bauermann  writes:

> This problem was noticed because the recent binutils commit
> d5cbf916be4a ("gas/ELF: also reject merge entity size being zero") caused
> gas to be stricter about mergeable sections without an entity size:
>
> configure:27013: checking assembler for section merging support
> configure:27022: /path/to/as   --fatal-warnings -o conftest.o conftest.s >&5
> conftest.s: Assembler messages:
> conftest.s:1: Warning: invalid merge / string entity size
> conftest.s: Error: 1 warning, treating warnings as errors
> configure:27025: $? = 1
> configure: failed program was
> .section .rodata.str, "aMS", @progbits, 1
> configure:27036: result: no
>
> In previous versions of gas the conftest.s program above was accepted
> and configure detected support for section merging.

I just posted a patch for gas implementing a suggestion by Alan Modra
that makes it accept the syntax above:

https://inbox.sourceware.org/binutils/20250103032831.622617-1-thiago.bauerm...@linaro.org/

However, I think it's still a good idea to commit this patch.

--
Thiago


Re: [WIP 1/8] algol68: top-level, include/ and config/ changes

2025-01-02 Thread Jose E. Marchesi


> "Jose E. Marchesi"  writes:
>
>> This patch contains the changes to files in the GCC top-level
>> directory to introduce the Algol 68 front-end.
>> ---
>>  MAINTAINERS  |2 +
>>  Makefile.def |3 +
>>  Makefile.in  | 1341 +-
>>  Makefile.tpl |   14 +
>>  SECURITY.txt |1 +
>>  config/acx.m4|6 +
>>  configure|  296 +-
>>  configure.ac |   65 ++-
>>  include/dwarf2.h |5 +-
>>  9 files changed, 1706 insertions(+), 27 deletions(-)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 0c571bde8bc..827e53c0cdc 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -171,6 +171,7 @@ objective-c/c++ Mike Stump  
>> 
>>  objective-c/c++ Iain Sandoe 
>>  RustArthur Cohen
>>  RustPhilip Herron   
>> 
>> +Algol 68Jose E. Marchesi
>>  
>>  Various Maintainers
>>  
>> @@ -179,6 +180,7 @@ libcpp  Per Bothner 
>> 
>>  libcpp  All C and C++ front end maintainers
>>  libcpp  David Malcolm   
>>  fp-bit  Ian Lance Taylor
>> +libgac  Jose E. Marchesi
>>  libgcc  Ian Lance Taylor
>>  libgo   Ian Lance Taylor
>>  libgomp Jakub Jelinek   
>> diff --git a/Makefile.def b/Makefile.def
>> index 19954e7d731..dcac62a8a98 100644
>> --- a/Makefile.def
>> +++ b/Makefile.def
>> @@ -205,6 +205,7 @@ target_modules = { module= zlib; bootstrap=true; };
>>  target_modules = { module= rda; };
>>  target_modules = { module= libada; };
>>  target_modules = { module= libgm2; lib_path=.libs; };
>> +target_modules = { module= libgac; bootstrap=true; lib_path=.libs; };
>>  target_modules = { module= libgomp; bootstrap= true; lib_path=.libs; };
>>  target_modules = { module= libitm; lib_path=.libs; };
>>  target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
>> @@ -727,6 +728,8 @@ languages = { language=d;
>> gcc-check-target=check-d;
>>  lib-check-target=check-target-libphobos; };
>>  languages = { language=jit; gcc-check-target=check-jit; };
>>  languages = { language=rust;gcc-check-target=check-rust; };
>> +languages = { language=algol68; gcc-check-target=check-algol68;
>> +lib-check-target=check-target-libgac; };
>>  
>>  // Toplevel bootstrap
>>  bootstrap_stage = { id=1 ; };
>> diff --git a/Makefile.tpl b/Makefile.tpl
>> index da38dca697a..6ab2ef86349 100644
>> --- a/Makefile.tpl
>> +++ b/Makefile.tpl
>> @@ -279,6 +279,11 @@ POSTSTAGE1_HOST_EXPORTS = \
>>  CC_FOR_BUILD="$$CC"; export CC_FOR_BUILD; \
>>  $(POSTSTAGE1_CXX_EXPORT) \
>>  $(LTO_EXPORTS) \
>> +GAC="$$r/$(HOST_SUBDIR)/prev-gcc/gac$(exeext) 
>> -B$$r/$(HOST_SUBDIR)/prev-gcc/ \
>> +  -B$(build_tooldir)/bin/ $(GACFLAGS_FOR_TARGET) \
>> +  -B$$r/prev-$(TARGET_SUBDIR)/libgac/.libs"; \
>> +export GAC; \
>> +GAC_FOR_BUILD="$$GAC"; export GAC_FOR_BUILD; \
>>  GDC="$$r/$(HOST_SUBDIR)/prev-gcc/gdc$(exeext) 
>> -B$$r/$(HOST_SUBDIR)/prev-gcc/ \
>>-B$(build_tooldir)/bin/ $(GDCFLAGS_FOR_TARGET) \
>>-B$$r/prev-$(TARGET_SUBDIR)/libphobos/libdruntime/gcc \
>> @@ -311,6 +316,7 @@ BASE_TARGET_EXPORTS = \
>>  CPPFLAGS="$(CPPFLAGS_FOR_TARGET)"; export CPPFLAGS; \
>>  CXXFLAGS="$(CXXFLAGS_FOR_TARGET)"; export CXXFLAGS; \
>>  GFORTRAN="$(GFORTRAN_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; 
>> export GFORTRAN; \
>> +GAC="$(GAC_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export GAC; \
>>  GOC="$(GOC_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export GOC; \
>>  GDC="$(GDC_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export GDC; \
>>  GM2="$(GM2_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export GM2; \
>> @@ -379,6 +385,7 @@ CXX_FOR_BUILD = @CXX_FOR_BUILD@
>>  DLLTOOL_FOR_BUILD = @DLLTOOL_FOR_BUILD@
>>  DSYMUTIL_FOR_BUILD = @DSYMUTIL_FOR_BUILD@
>>  GFORTRAN_FOR_BUILD = @GFORTRAN_FOR_BUILD@
>> +GAC_FOR_BUILD = @GAC_FOR_BUILD@
>>  GOC_FOR_BUILD = @GOC_FOR_BUILD@
>>  GDC_FOR_BUILD = @GDC_FOR_BUILD@
>>  GM2_FOR_BUILD = @GM2_FOR_BUILD@
>> @@ -441,6 +448,7 @@ STRIP = @STRIP@
>>  WINDRES = @WINDRES@
>>  WINDMC = @WINDMC@
>>  
>> +GAC = @GAC@
>>  GDC = @GDC@
>>  GNATBIND = @GNATBIND@
>>  GNATMAKE = @GNATMAKE@
>> @@ -451,6 +459,7 @@ LIBCFLAGS = $(CFLAGS)
>>  CXXFLAGS = @CXXFLAGS@
>>  LIBCXXFLAGS = $(CXXFLAGS) -fno-implicit-templates
>>  GOCFLAGS = $(CFLAGS)
>> +GACFLAGS = @GACFLAGS@
>>  GDCFLAGS = @GDCFLAGS@
>>  GM2FLAGS = $(CFLAGS)
>>  
>> @@ -598,6 +607,7 @@ CXX_FOR_TARGET=$(STAGE_CC_WRAPPER) @CXX_FOR_TARGET@
>>  RAW_CXX_FOR_TARGET=$(STAGE_CC_WRAPPER) @RAW_CXX_FOR_TARGET@
>>  GFORTRAN_FOR_TARGET=$(STAGE_CC_WRAPPER) @GFORTRAN_FOR_TARGET@
>>  GOC_FOR_TARGET=$(STAGE_CC_WRAPPER) @GOC_FOR_TARGET@
>> +

Re: [COMMITTED] Fortran: Grammar/markup fixes in intrinsics documentation

2025-01-02 Thread Sandra Loosemore

On 1/2/25 20:22, Maciej W. Rozycki wrote:

On Tue, 31 Dec 2024, Sandra Loosemore wrote:


diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index d11d37761d9..b47180126ca 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -1577,7 +1577,7 @@ if @var{Y} is present, @var{X} shall be REAL.
  @item @emph{Return value}:
  The return value is of the same type and kind as @var{X}.
  If @var{Y} is present, the result is identical to @code{ATAN2(Y,X)}.
-Otherwise, it the arcus tangent of @var{X}, where the real part of
+Otherwise, it the arctangent of @var{X}, where the real part of

   ^^
s/it the/it is the/ presumably?


Gah, how did I miss that?  :-(  (Maybe it's time for a brain 
transplant...)  I've fixed it now with the attached obvious patch, anyway.


-Sandra

From 15a74614c87aa8c4da069ec5442b0f72c1af0465 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Fri, 3 Jan 2025 04:02:44 +
Subject: [PATCH] Fortran: Fix typo in ATAN documentation.

gcc/fortran/ChangeLog
	* intrinsic.texi (ATAN): Add missing verb.
---
 gcc/fortran/intrinsic.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index bb6be0c387c..7c7e4c9372b 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -1577,7 +1577,7 @@ if @var{Y} is present, @var{X} shall be REAL.
 @item @emph{Return value}:
 The return value is of the same type and kind as @var{X}.
 If @var{Y} is present, the result is identical to @code{ATAN2(Y,X)}.
-Otherwise, it the arctangent of @var{X}, where the real part of
+Otherwise, it is the arctangent of @var{X}, where the real part of
 the result is in radians and lies in the range
 @math{-\pi/2 \leq \Re \atan(x) \leq \pi/2}.
 
-- 
2.34.1



[PATCH] forwprop: Handle RAW_DATA_CST in check_ctz_array

2025-01-02 Thread Jakub Jelinek
Hi!

In order to stress test RAW_DATA_CST handling, I've tested trunk gcc with
r15-6339 reapplied and a hack where I've changed
  const unsigned int raw_data_min_len = 128;
to
  const unsigned int raw_data_min_len = 2;
in cp_lexer_new_main and 64 to 4 several times in c_parser_initval
and c_maybe_optimize_large_byte_initializer, so that RAW_DATA_CST doesn't
trigger just on very large initializers, but even on quite small ones.

One of the regressions (I will work on the others next) was that the pr90838.c
testcase regressed: check_ctz_array needs to handle RAW_DATA_CST, otherwise
on larger initializers, or if those come from #embed, it just won't trigger.
The new testcase shows where it no longer triggers (a regression from GCC 14).

The patch just handles RAW_DATA_CST in the CONSTRUCTOR_ELTS the same as if
it were a series of INTEGER_CSTs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-02  Jakub Jelinek  

* tree-ssa-forwprop.cc (check_ctz_array): Handle also RAW_DATA_CST
in the CONSTRUCTOR_ELTS.

* gcc.dg/pr90838-2.c: New test.

--- gcc/tree-ssa-forwprop.cc.jj 2024-12-28 00:12:11.185146287 +0100
+++ gcc/tree-ssa-forwprop.cc2024-12-31 12:45:33.512434253 +0100
@@ -2269,7 +2269,7 @@ check_ctz_array (tree ctor, unsigned HOS
 HOST_WIDE_INT &zero_val, unsigned shift, unsigned bits)
 {
   tree elt, idx;
-  unsigned HOST_WIDE_INT i, mask;
+  unsigned HOST_WIDE_INT i, mask, raw_idx = 0;
   unsigned matched = 0;
 
   mask = ((HOST_WIDE_INT_1U << (bits - shift)) - 1) << shift;
@@ -2278,13 +2278,34 @@ check_ctz_array (tree ctor, unsigned HOS
 
   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), i, idx, elt)
 {
-  if (TREE_CODE (idx) != INTEGER_CST || TREE_CODE (elt) != INTEGER_CST)
+  if (TREE_CODE (idx) != INTEGER_CST)
return false;
-  if (i > bits * 2)
+  if (TREE_CODE (elt) != INTEGER_CST && TREE_CODE (elt) != RAW_DATA_CST)
return false;
 
   unsigned HOST_WIDE_INT index = tree_to_shwi (idx);
-  HOST_WIDE_INT val = tree_to_shwi (elt);
+  HOST_WIDE_INT val;
+
+  if (TREE_CODE (elt) == INTEGER_CST)
+   val = tree_to_shwi (elt);
+  else
+   {
+ if (raw_idx == (unsigned) RAW_DATA_LENGTH (elt))
+   {
+ raw_idx = 0;
+ continue;
+   }
+ if (TYPE_UNSIGNED (TREE_TYPE (elt)))
+   val = RAW_DATA_UCHAR_ELT (elt, raw_idx);
+ else
+   val = RAW_DATA_SCHAR_ELT (elt, raw_idx);
+ index += raw_idx;
+ raw_idx++;
+ i--;
+   }
+
+  if (index > bits * 2)
+   return false;
 
   if (index == 0)
{
--- gcc/testsuite/gcc.dg/pr90838-2.c.jj 2024-12-31 12:50:10.548568029 +0100
+++ gcc/testsuite/gcc.dg/pr90838-2.c2024-12-31 12:52:41.944455198 +0100
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop2-details" } */
+/* { dg-additional-options "-mbmi" { target { { i?86-*-* x86_64-*-* } && { ! { 
ia32 } } } } } */
+/* { dg-additional-options "-march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-additional-options "-march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-require-effective-target int32plus } */
+
+static const unsigned long long magic = 0x03f08c5392f756cdULL;
+
+static const char table[128] = {
+ 0,  1, 12,  2, 13, 22, 17,  3,
+14, 33, 23, 36, 18, 58, 28,  4,
+62, 15, 34, 26, 24, 48, 50, 37,
+19, 55, 59, 52, 29, 44, 39,  5,
+63, 11, 21, 16, 32, 35, 57, 27,
+61, 25, 47, 49, 54, 51, 43, 38,
+10, 20, 31, 56, 60, 46, 53, 42,
+ 9, 30, 45, 41,  8, 40,  7,  6,
+ 1,  2,  3,  4,  5,  6,  7,  8,
+ 9, 10, 11, 12, 13, 14, 15, 16,
+17, 18, 19, 20, 21, 22, 23, 24,
+25, 26, 27, 28, 29, 30, 31, 32,
+33, 34, 35, 36, 37, 38, 39, 40,
+41, 42, 43, 44, 45, 46, 47, 48,
+49, 50, 51, 52, 53, 54, 55, 56,
+57, 58, 59, 60, 61, 62, 63, 64
+};
+
+int ctz4 (unsigned long x)
+{
+  unsigned long lsb = x & -x;
+  return table[(lsb * magic) >> 58];
+}
+
+/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target { { i?86-*-* 
x86_64-*-* } && { ! { ia32 } } } } } } */
+/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target aarch64*-*-* } } 
} */
+/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target { rv64 } } } } */
+/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target { rv32 } } } } */
+/* { dg-final { scan-tree-dump {= \.CTZ} "forwprop2" { target { 
loongarch64*-*-* } } } } */

Jakub



[committed] Tweak update-copyright.py script

2025-01-02 Thread Jakub Jelinek
Hi!

When running update-copyright.py --this-year, I've encountered various
failures; this patch works around those.

Committed as obvious.

For gen-evolution.awk, gen-cxxapi-file.py and uname2c.h I've dealt with
copyright year updates manually later on.

Note, I've also rotated ChangeLogs with yearly cadence and committed manual
as well as scripted copyright year updates.  See
https://gcc.gnu.org/r15-6495
https://gcc.gnu.org/r15-6496
https://gcc.gnu.org/r15-6497
https://gcc.gnu.org/r15-6500
https://gcc.gnu.org/r15-6501
for details.

2025-01-02  Jakub Jelinek  

* update-copyright.py (GCCFilter): Ignore gen-evolution.awk and
gen-cxxapi-file.py.
(TestsuiteFilter): Ignore spec-example-4.sarif.
(LibCppFilter): Ignore uname2c.h.

--- contrib/update-copyright.py
+++ contrib/update-copyright.py
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 #
-# Copyright (C) 2013-2024 Free Software Foundation, Inc.
+# Copyright (C) 2013-2025 Free Software Foundation, Inc.
 #
 # This script is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -560,6 +560,8 @@ class GCCFilter (GenericFilter):
 
 # Weird ways to compose copyright year
 'GmcOptions.cc',
+'gen-evolution.awk',
+'gen-cxxapi-file.py',
 ])
 
 self.skip_dirs |= set ([
@@ -587,6 +589,11 @@ class TestsuiteFilter (GenericFilter):
 def __init__ (self):
 GenericFilter.__init__ (self)
 
+self.skip_files |= set ([
+# Weird ways to compose copyright year
+'spec-example-4.sarif',
+])
+
 self.skip_extensions |= set ([
 # Don't change the tests, which could be owned by anyone.
 '.c',
@@ -620,6 +627,12 @@ class LibCppFilter (GenericFilter):
 def __init__ (self):
 GenericFilter.__init__ (self)
 
+self.skip_files |= set ([
+# Generated file with the generated strings sometimes
+# matching the regexps.
+'uname2c.h',
+])
+
 self.skip_extensions |= set ([
 # Maintained by the translation project.
 '.po',


Jakub



[committed] Use u'' instead of '' in libgdiagnostics/conf.py

2025-01-02 Thread Jakub Jelinek
Hi!

libgdiagnostics/conf.py breaks update-copyright.py --this-year,
which only accepts copyright year in u'' literals in python files,
not in ''.

u'' strings are what e.g. the libgccjit conf.py uses.
Tested by building libgdiagnostics docs without/with this patch, the
difference is just the expected addition of -2025 in tons of spots,
nothing else.

Committed to trunk as obvious.

2025-01-02  Jakub Jelinek  

* doc/libgdiagnostics/conf.py: Use u'' instead of '' in
project and copyright initialization.

--- gcc/doc/libgdiagnostics/conf.py
+++ gcc/doc/libgdiagnostics/conf.py
@@ -6,8 +6,8 @@
 # -- Project information -
 # 
https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
 
-project = 'libgdiagnostics'
-copyright = '2024 Free Software Foundation, Inc.'
+project = u'libgdiagnostics'
+copyright = u'2024-2025 Free Software Foundation, Inc.'
 
 # -- General configuration ---
 # 
https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration


Jakub



[PATCH] match.pd: Fold pattern of round semantics.

2025-01-02 Thread Zhou Zhao
This patch implements 4 rules for the semantics of the round function in match.pd under
-funsafe-math-optimizations:
1) (x-floor(x)) < (ceil(x)-x) ? floor(x) : ceil(x) -> floor(x+0.5)
2) (x-floor(x)) >= (ceil(x)-x) ? ceil(x) : floor(x) -> floor(x+0.5)
3) (ceil(x)-x) > (x-floor(x)) ? floor(x) : ceil(x) -> floor(x+0.5)
4) (ceil(x)-x) <= (x-floor(x)) ? ceil(x) : floor(x) -> floor(x+0.5)

The patch implements the floor(x+0.5) operation to replace the semantics of
the round(x) function.
The patch was regtested on aarch64-linux-gnu and x86_64-linux-gnu; SPEC
2017 and SPEC 2006 were run:
As for SPEC 2017, 538.imagick_r benchmark performance increased by 3%+
in base test of ratio mode.
As for SPEC 2006, while the transform does not seem to be triggered, we
also see no non-noise impact on performance.
OK for mainline?

gcc/ChangeLog:

* match.pd: Add new pattern for round.

gcc/testsuite/ChangeLog:

* gcc.dg/fold-round-1.c: New test.
---
 gcc/match.pd| 27 ++
 gcc/testsuite/gcc.dg/fold-round-1.c | 56 +
 2 files changed, 83 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/fold-round-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 83eca8b2e0a..7b22b7913ac 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -777,6 +777,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (rdiv @0 (negate @1))
  (rdiv (negate @0) @1))
 
+(if (flag_unsafe_math_optimizations)
+/* convert semantics of round(x) function to floor(x+0.5) */
+/* (x-floor(x)) < (ceil(x)-x) ? floor(x) : ceil(x) --> floor(x+0.5) */
+/* (x-floor(x)) >= (ceil(x)-x) ? ceil(x) : floor(x) --> floor(x+0.5) */
+/* (ceil(x)-x) > (x-floor(x)) ? floor(x) : ceil(x) --> floor(x+0.5) */
+/* (ceil(x)-x) <= (x-floor(x)) ? ceil(x) : floor(x) --> floor(x+0.5) */
+(for op (lt ge)
+ bt (FLOOR CEIL)
+ bf (CEIL FLOOR)
+ floor (FLOOR FLOOR)
+ ceil (CEIL CEIL)
+ (simplify
+  (cond (op (minus:s SSA_NAME@0 (floor SSA_NAME@0))
+   (minus:s (ceil SSA_NAME@0) SSA_NAME@0))
+   (bt SSA_NAME@0) (bf SSA_NAME@0))
+  (floor (plus @0 { build_real (type, dconsthalf); }
+(for op (gt le)
+ bt (FLOOR CEIL)
+ bf (CEIL FLOOR)
+ floor (FLOOR FLOOR)
+ ceil (CEIL CEIL)
+ (simplify
+  (cond (op (minus:s (ceil SSA_NAME@0) SSA_NAME@0)
+   (minus:s SSA_NAME@0 (floor SSA_NAME@0)))
+   (bt SSA_NAME@0) (bf SSA_NAME@0))
+  (floor (plus @0 { build_real (type, dconsthalf); })
+
 (if (flag_unsafe_math_optimizations)
  /* Simplify (C / x op 0.0) to x op 0.0 for C != 0, C != Inf/Nan.
 Since C / x may underflow to zero, do this only for unsafe math.  */
diff --git a/gcc/testsuite/gcc.dg/fold-round-1.c 
b/gcc/testsuite/gcc.dg/fold-round-1.c
new file mode 100644
index 000..845d6d2e475
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-round-1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -funsafe-math-optimizations" } */
+
+extern void link_error (void);
+
+#define TEST_ROUND(TYPE, FFLOOR, FCEIL)   \
+  void round_##FFLOOR##_1 (TYPE x)\
+  {   \
+TYPE t1 = 0;  \
+TYPE t2 = __builtin_##FFLOOR (x + 0.5);   \
+if ((x - __builtin_##FFLOOR (x)) < (__builtin_##FCEIL (x) - x))   \
+  t1 = __builtin_##FFLOOR (x);\
+else  \
+  t1 = __builtin_##FCEIL (x); \
+if (t1 != t2) \
+  link_error ();  \
+  }   \
+  void round_##FFLOOR##_2 (TYPE x)\
+  {   \
+TYPE t1 = 0;  \
+TYPE t2 = __builtin_##FFLOOR (x + 0.5);   \
+if ((__builtin_##FCEIL (x) - x) > (x - __builtin_##FFLOOR (x)))   \
+  t1 = __builtin_##FFLOOR (x);\
+else  \
+  t1 = __builtin_##FCEIL (x); \
+if (t1 != t2) \
+  link_error ();  \
+  }   \
+  void round_##FFLOOR##_3 (TYPE x)\
+  {   \
+TYPE t1 = 0;  

Re: [PATCH v5 03/10] OpenMP: Remove dead code from declare variant reimplementation

2025-01-02 Thread Tobias Burnus

Hi Sandra,

Sandra Loosemore wrote:

After reimplementing late resolution of "declare variant", the
declare_variant_alt and calls_declare_variant_alt flags on struct
cgraph_node are no longer used by anything.  For the purposes of
marking functions that need late resolution, the
has_omp_variant_constructs flag has replaced
calls_declare_variant_alt.


[…]

LGTM. — Admittedly, I have locally applied not the posted patch but 
Sandra's not-yet-posted rediffed patch, which differs only in two times 
two additional lines in a function deleted here, plus changes in the 
context lines that are untouched by the patch but will confuse 'patch'.


→ 01 (previously approved; the rediff only differs in small line-number 
shifts, which patch handles silently) — and (this) 03 are okay. However, 
I still need to finish reviewing 02, which is the complex core patch.


Tobias


Re: [PATCH] testsuite: libitm: Adjust how libitm.c++ passes link flags

2025-01-02 Thread Matthew Malcomson



On 1/2/25 12:08, Richard Sandiford wrote:

+# This set in order to give libitm.c++/c++.exp a nicely named flag to set
+# when adding C++ options.
+set TEST_ALWAYS_FLAGS ""


This looked odd at first glance.  By unconditionally writing "" to the
variable, it seems to subvert the save and restore done in c++.exp.



Yeah -- I see your point, that's not good.


How about instead copying the behaviour of asan_init and asan_finish,
so that libitm_init and libitm_finish do the save and restore?  Or perhaps
a slight variation: after saving, libitm_init can set TEST_ALWAYS_FLAGS
to "" if TEST_ALWAYS_FLAGS was previously unset.

c++.exp would then not need to save and restore the flags itself, and
could still assume that TEST_ALWAYS_FLAGS is always set.



Have made the suggested change -- mentioning the extra little bit of 
complexity that this introduced ...


Since libitm is a "tool" in the DejaGNU sense (while asan is not), 
libitm_finish gets called twice for each libitm_init call.


The `runtest` procedure in DejaGNU's `runtest.exp` calls `${tool}_init`, 
executes the c.exp or c++.exp test runner and then calls 
`${tool}_finish`, while in each of the test runners we also call 
`dg-finish` (as required by the dg.exp API) which calls `${tool}_finish` 
directly.


This means using `libitm_finish` needs an extra bit in global state to 
check whether we have already reset things.

- Has been set in libitm_init and was unset at start
  => saved_TEST_ALWAYS_FLAGS is unset.
- Has been set in libitm_init and was set at start
  => saved_TEST_ALWAYS_FLAGS is set.
- Has already been reset => some other flag.

Have attached the adjusted patch to this email.
From fbce3b25e8ccad80697f1596f566b268fff71221 Mon Sep 17 00:00:00 2001
From: Matthew Malcomson 
Date: Wed, 11 Dec 2024 11:03:55 +
Subject: [PATCH] testsuite: libitm: Adjust how libitm.c++ passes link flags

For the `gcc` and `g++` tools we often pass -B/path/to/object/dir in via
`TEST_ALWAYS_FLAGS` (see e.g. asan.exp where this is set).
In libitm.c++/c++.exp we pass that -B flag via the `tool_flags` argument
to `dg-runtest`.

Passing as the `tool_flags` argument means that these flags get added to
the name of the test.  This means that if one were to compare the
testsuite results between runs made in different build directories
libitm.c++ gives a reasonable amount of NA->PASS and PASS->NA
differences even though the same tests passed each time.

This patch follows the example set in other parts of the testsuite like
asan_init and passes the -B arguments to the compiler via a global
variable called `TEST_ALWAYS_FLAGS`.  For this DejaGNU "tool" we had to
newly initialise that variable in libitm_init and add a check against
that variable in libitm_target_compile.  I thought about adding the
relevant flags we need for C++ into `ALWAYS_CFLAGS` but decided against
it since the name didn't match what we would be using it for.

We save the global `TEST_ALWAYS_FLAGS` in `libitm_init` and ensure
it's initialised.  We then reset this in `libitm_finish`.  Since
`libitm_finish` gets called twice (once from `dg-finish` and once from
the `runtest` procedure) we have to manage state to tell whether
TEST_ALWAYS_FLAGS has already been reset.

Testing done to bootstrap & regtest on AArch64.  Manually observed that
the testsuite diff between two different build directories no longer
exists.

N.b. since I pass the new flags in the `ldflags` option of the DejaGNU
options while the previous code always passed this -B flag, the compile
test throwdown.C no longer gets compiled with this -B flag.  I believe
that is not a problem.

libitm/ChangeLog:

	* testsuite/libitm.c++/c++.exp: Use TEST_ALWAYS_FLAGS instead of
	passing arguments to dg-runtest.
	* testsuite/lib/libitm.exp (libitm_init): Initialise
	TEST_ALWAYS_FLAGS.
	(libitm_finish): Reset TEST_ALWAYS_FLAGS.
	(libitm_target_compile): Take flags from TEST_ALWAYS_FLAGS.

Signed-off-by: Matthew Malcomson 
---
 libitm/testsuite/lib/libitm.exp | 42 +
 libitm/testsuite/libitm.c++/c++.exp |  5 ++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
index ac390d6d0dd..c5b9bb1127f 100644
--- a/libitm/testsuite/lib/libitm.exp
+++ b/libitm/testsuite/lib/libitm.exp
@@ -59,6 +59,7 @@ set dg-do-what-default run
 #
 
 set libitm_compile_options ""
+set libitm_initialised 0
 
 #
 # libitm_init
@@ -77,12 +78,15 @@ proc libitm_init { args } {
 global blddir
 global gluefile wrap_flags
 global ALWAYS_CFLAGS
+global TEST_ALWAYS_FLAGS
+global saved_TEST_ALWAYS_FLAGS
 global CFLAGS
 global TOOL_EXECUTABLE TOOL_OPTIONS
 global GCC_UNDER_TEST
 global TESTING_IN_BUILD_TREE
 global target_triplet
 global always_ld_library_path
+global libitm_initialised
 
 set blddir [lookfor_file [get_multilibs] libitm]
 
@@ -145,6 +149,13 @@ proc libitm_init { args } {
 	}
 }
 
+# This set in 

Re: [PATCH] match.pd: Fold pattern of round semantics.

2025-01-02 Thread 赵洲
Add Reviewer Richard Biener.


> -----Original Message-----
> From: "Zhou Zhao" 
> Sent: 2025-01-02 19:37:07 (Thursday)
> To: gcc-patches@gcc.gnu.org
> Cc: xry...@xry111.site, i...@xen0n.name, chengl...@loongson.cn, 
> xucheng...@loongson.cn, zhaoz...@loongson.cn
> Subject: [PATCH] match.pd: Fold pattern of round semantics.
> 
> This patch implements 4 rules for semantics of round func in match.pd under
> -funsafe-math-optimizations:
> 1) (x-floor(x)) < (ceil(x)-x) ? floor(x) : ceil(x) -> floor(x+0.5)
> 2) (x-floor(x)) >= (ceil(x)-x) ? ceil(x) : floor(x) -> floor(x+0.5)
> 3) (ceil(x)-x) > (x-floor(x)) ? floor(x) : ceil(x) -> floor(x+0.5)
> 4) (ceil(x)-x) <= (x-floor(x)) ? ceil(x) : floor(x) -> floor(x+0.5)
> 
> The patch implements floor(x+0.5) operation to replace semantics of
> round(x) function.
> The patch was regtested on aarch64-linux-gnu and x86_64-linux-gnu, SPEC
> 2017 and SPEC 2006 were run:
> As for SPEC 2017, 538.imagick_r benchmark performance increased by 3%+
> in base test of ratio mode.
> As for SPEC 2006, while the transform does not seem to be triggered, we
> also see no non-noise impact on performance.
> OK for mainline?
> 
> gcc/ChangeLog:
> 
>   * match.pd: Add new pattern for round.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/fold-round-1.c: New test.
> ---
>  gcc/match.pd| 27 ++
>  gcc/testsuite/gcc.dg/fold-round-1.c | 56 +
>  2 files changed, 83 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/fold-round-1.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 83eca8b2e0a..7b22b7913ac 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -777,6 +777,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (rdiv @0 (negate @1))
>   (rdiv (negate @0) @1))
>  
> +(if (flag_unsafe_math_optimizations)
> +/* convert semantics of round(x) function to floor(x+0.5) */
> +/* (x-floor(x)) < (ceil(x)-x) ? floor(x) : ceil(x) --> floor(x+0.5) */
> +/* (x-floor(x)) >= (ceil(x)-x) ? ceil(x) : floor(x) --> floor(x+0.5) */
> +/* (ceil(x)-x) > (x-floor(x)) ? floor(x) : ceil(x) --> floor(x+0.5) */
> +/* (ceil(x)-x) <= (x-floor(x)) ? ceil(x) : floor(x) --> floor(x+0.5) */
> +(for op (lt ge)
> + bt (FLOOR CEIL)
> + bf (CEIL FLOOR)
> + floor (FLOOR FLOOR)
> + ceil (CEIL CEIL)
> + (simplify
> +  (cond (op (minus:s SSA_NAME@0 (floor SSA_NAME@0))
> + (minus:s (ceil SSA_NAME@0) SSA_NAME@0))
> + (bt SSA_NAME@0) (bf SSA_NAME@0))
> +  (floor (plus @0 { build_real (type, dconsthalf); }
> +(for op (gt le)
> + bt (FLOOR CEIL)
> + bf (CEIL FLOOR)
> + floor (FLOOR FLOOR)
> + ceil (CEIL CEIL)
> + (simplify
> +  (cond (op (minus:s (ceil SSA_NAME@0) SSA_NAME@0)
> + (minus:s SSA_NAME@0 (floor SSA_NAME@0)))
> + (bt SSA_NAME@0) (bf SSA_NAME@0))
> +  (floor (plus @0 { build_real (type, dconsthalf); })
> +
>  (if (flag_unsafe_math_optimizations)
>   /* Simplify (C / x op 0.0) to x op 0.0 for C != 0, C != Inf/Nan.
>  Since C / x may underflow to zero, do this only for unsafe math.  */
> diff --git a/gcc/testsuite/gcc.dg/fold-round-1.c 
> b/gcc/testsuite/gcc.dg/fold-round-1.c
> new file mode 100644
> index 000..845d6d2e475
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/fold-round-1.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -funsafe-math-optimizations" } */
> +
> +extern void link_error (void);
> +
> +#define TEST_ROUND(TYPE, FFLOOR, FCEIL)  
>  \
> +  void round_##FFLOOR##_1 (TYPE x)   
>  \
> +  {  
>  \
> +TYPE t1 = 0; 
>  \
> +TYPE t2 = __builtin_##FFLOOR (x + 0.5);  
>  \
> +if ((x - __builtin_##FFLOOR (x)) < (__builtin_##FCEIL (x) - x))  
>  \
> +  t1 = __builtin_##FFLOOR (x);   
>  \
> +else 
>  \
> +  t1 = __builtin_##FCEIL (x);
>  \
> +if (t1 != t2)
>  \
> +  link_error (); 
>  \
> +  }  
>  \
> +  void round_##FFLOOR##_2 (TYPE x)   
>  \
> +  {  
>  \
> +TYPE t1 = 0; 
>  \
> +TYPE t2 = __builtin_##FFLOOR (x + 0.5);  
>  \
> +if ((__builtin_##FCEIL (x) - x) > (x - __builtin_##FFLOOR (x)))  
>  \
> +  t1 = __builtin_##FFLOOR (x);   
>  \
> +else   

Re: [PATCH] Respect -fprofile-prefix-map for getcwd in .gcno files

2025-01-02 Thread Richard Biener
On Wed, Jan 1, 2025 at 1:44 AM Fangrui Song  wrote:
>
> so that
> `gcc -c a.cc --coverage -fprofile-prefix-map=$PWD=.`
> does not emit $PWD in the generated a.gcno file.

This looks OK to me.  Please leave a few days for others to comment though.

Thanks,
Richard.

> PR gcov-profile/96092
> ---
>  gcc/coverage.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/coverage.cc b/gcc/coverage.cc
> index 1ed55fed547..c6e9aced6fe 100644
> --- a/gcc/coverage.cc
> +++ b/gcc/coverage.cc
> @@ -1341,7 +1341,7 @@ coverage_init (const char *filename)
>   gcov_write_unsigned (bbg_file_stamp);
>   /* Use an arbitrary checksum */
>   gcov_write_unsigned (0);
> - gcov_write_string (getpwd ());
> + gcov_write_string (remap_profile_filename (getpwd ()));
>
>   /* Do not support has_unexecuted_blocks for Ada.  */
>   gcov_write_unsigned (strcmp (lang_hooks.name, "GNU Ada") != 0);
> --
> 2.47.1
>


Re: [committed] Use u'' instead of '' in libgdiagnostics/conf.py

2025-01-02 Thread Jakub Jelinek
On Thu, Jan 02, 2025 at 11:51:20AM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > libgdiagnostics/conf.py breaks update-copyright.py --this-year,
> > which only accepts copyright year in u'' literals in python files,
> > not in ''.
> >
> > u'' strings is what e.g. libgccjit conf.py uses.
> > Tested by building libgdiagnostics docs without/with this patch, the
> > difference is just the expected addition of -2025 in tons of spots,
> > nothing else.
> 
> It'd be good to move all the python scripts over to python 3, so that
> this is no longer necessary.  That's obviously separate work though...

Nothing against that.
My python-fu is very limited though.

Right now update-copyright.py has
'|copyright = u\''
among other parts of regexp, maybe it would be just a matter of adding
'|copyright = \''
too or replacing the u one with the latter and getting rid of u'' strings
plus testing what it breaks/changes.

Jakub



[PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Tamar Christina
Hi All,

When a target does not support gathers and scatters the vectorizer tries to
emulate these using scalar loads/stores and a reconstruction of vectors from
scalars.

The loads are still marked with VMAT_GATHER_SCATTER to indicate that they are
gather/scatters, however the vectorizer also asks the target to cost the
instruction that generates the indexes for the emulated instructions.

This is done by asking the target to cost vec_to_scalar and vec_construct with
a stmt_vinfo being the VMAT_GATHER_SCATTER.

Since Adv. SIMD does not have an LD1 variant that takes an Adv. SIMD Scalar
element the operation is lowered entirely into a sequence of GPR loads to create
the x registers for the indexes.

At the moment, however, we don't cost these, and so the vectorizer thinks that
when it emulates the instructions it's much cheaper than using an actual
gather/scatter with SVE.  Consider:

#define iterations 10
#define LEN_1D 32000

float a[LEN_1D], b[LEN_1D];

float
s4115 (int *ip)
{
float sum = 0.;
for (int i = 0; i < LEN_1D; i++)
{
sum += a[i] * b[ip[i]];
}
return sum;
}

which before this patch with -mcpu= generates:

.L2:
add x3, x0, x1
ldrsw   x4, [x0, x1]
ldrsw   x6, [x3, 4]
ldpsw   x3, x5, [x3, 8]
ldr s1, [x2, x4, lsl 2]
ldr s30, [x2, x6, lsl 2]
ldr s31, [x2, x5, lsl 2]
ldr s29, [x2, x3, lsl 2]
uzp1v30.2s, v30.2s, v31.2s
ldr q31, [x7, x1]
add x1, x1, 16
uzp1v1.2s, v1.2s, v29.2s
zip1v30.4s, v1.4s, v30.4s
fmlav0.4s, v31.4s, v30.4s
cmp x1, x8
bne .L2

but during costing:

a[i_18] 1 times vector_load costs 4 in body
*_4 1 times unaligned_load (misalign -1) costs 4 in body
b[_5] 4 times vec_to_scalar costs 32 in body
b[_5] 4 times scalar_load costs 16 in body
b[_5] 1 times vec_construct costs 3 in body
_1 * _6 1 times vector_stmt costs 2 in body
_7 + sum_16 1 times scalar_to_vec costs 4 in prologue
_7 + sum_16 1 times vector_stmt costs 2 in epilogue
_7 + sum_16 1 times vec_to_scalar costs 4 in epilogue
_7 + sum_16 1 times vector_stmt costs 2 in body

Here we see that the latency for the vec_to_scalar is very high.  We know the
intermediate vector isn't usable by the target ISA and will always be elided.
However these latencies need to remain high because when costing gather/scatter
IFNs we still pass the nunits of the type along.  In other words, the vectorizer
is still costing vector gather/scatters as scalar load/stores.

Lowering the cost for the emulated gathers would result in emulation being
seemingly cheaper.  So while the emulated costs are very high, they need to be
higher than those for the IFN costing.

i.e. the vectorizer generates:

  vect__5.9_8 = MEM  [(intD.7 *)vectp_ip.7_14];
  _35 = BIT_FIELD_REF ;
  _36 = (sizetype) _35;
  _37 = _36 * 4;
  _38 = _34 + _37;
  _39 = (voidD.55 *) _38;
  # VUSE <.MEM_10(D)>
  _40 = MEM[(floatD.32 *)_39];

which after IVopts is:

  _63 = &MEM  [(int *)ip_11(D) + ivtmp.19_27 * 1];
  _47 = BIT_FIELD_REF  [(int *)_63], 32, 64>;
  _41 = BIT_FIELD_REF  [(int *)_63], 32, 32>;
  _35 = BIT_FIELD_REF  [(int *)_63], 32, 0>;
  _53 = BIT_FIELD_REF  [(int *)_63], 32, 96>;

Which we correctly lower in RTL to individual loads to avoid the repeated umov.

As such, we should cost the vec_to_scalar as GPR loads and also do so for the
throughput which we at the moment cost as:

  note:  Vector issue estimate:
  note:load operations = 6
  note:store operations = 0
  note:general operations = 6
  note:reduction latency = 2
  note:estimated min cycles per iteration = 2.00

Which means 3 loads for the GOR indexes are missing, making it seem like the
emulated loop has a much lower cycles per iter than it actually does since the
bottleneck on the load units is not modelled.

But worse, because the vectorizer costs gathers/scatters IFNs as scalar
load/stores the number of loads required for an SVE gather is always much
higher than the equivalent emulated variant.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/118188
* config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops): Adjust
throughput of emulated gather and scatters.

gcc/testsuite/ChangeLog:

PR target/118188
* gcc.target/aarch64/sve/gather_load_12.c: New test.

---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
6bb4bdf2472e62d9b066a06561da8e516f1b3c3e..cb9b155826d12b622ae0df1736e4b042d01cf56a
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -17358,6 +17358,25 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
return;
 }
 
+  /* Detect the case where we are using an emulated gather/scatter.  When a
+ target does not support gathers and scatters direc

Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> When a target does not support gathers and scatters the vectorizer tries to
> emulate these using scalar loads/stores and a reconstruction of vectors from
> scalar.
>
> The loads are still marked with VMAT_GATHER_SCATTER to indicate that they are
> gather/scatters, however the vectorizer also asks the target to cost the
> instruction that generates the indexes for the emulated instructions.
>
> This is done by asking the target to cost vec_to_scalar and vec_construct with
> a stmt_vinfo being the VMAT_GATHER_SCATTER.
>
> Since Adv. SIMD does not have an LD1 variant that takes an Adv. SIMD Scalar
> element the operation is lowered entirely into a sequence of GPR loads to 
> create
> the x registers for the indexes.
>
> At the moment, however, we don't cost these, and so the vectorizer thinks that
> when it emulates the instructions it's much cheaper than using an actual
> gather/scatter with SVE.  Consider:
>
> #define iterations 10
> #define LEN_1D 32000
>
> float a[LEN_1D], b[LEN_1D];
>
> float
> s4115 (int *ip)
> {
> float sum = 0.;
> for (int i = 0; i < LEN_1D; i++)
> {
> sum += a[i] * b[ip[i]];
> }
> return sum;
> }
>
> which before this patch with -mcpu= generates:
>
> .L2:
> add x3, x0, x1
> ldrsw   x4, [x0, x1]
> ldrsw   x6, [x3, 4]
> ldpsw   x3, x5, [x3, 8]
> ldr s1, [x2, x4, lsl 2]
> ldr s30, [x2, x6, lsl 2]
> ldr s31, [x2, x5, lsl 2]
> ldr s29, [x2, x3, lsl 2]
> uzp1    v30.2s, v30.2s, v31.2s
> ldr     q31, [x7, x1]
> add     x1, x1, 16
> uzp1    v1.2s, v1.2s, v29.2s
> zip1    v30.4s, v1.4s, v30.4s
> fmla    v0.4s, v31.4s, v30.4s
> cmp x1, x8
> bne .L2
>
> but during costing:
>
> a[i_18] 1 times vector_load costs 4 in body
> *_4 1 times unaligned_load (misalign -1) costs 4 in body
> b[_5] 4 times vec_to_scalar costs 32 in body
> b[_5] 4 times scalar_load costs 16 in body
> b[_5] 1 times vec_construct costs 3 in body
> _1 * _6 1 times vector_stmt costs 2 in body
> _7 + sum_16 1 times scalar_to_vec costs 4 in prologue
> _7 + sum_16 1 times vector_stmt costs 2 in epilogue
> _7 + sum_16 1 times vec_to_scalar costs 4 in epilogue
> _7 + sum_16 1 times vector_stmt costs 2 in body
>
> Here we see that the latency for the vec_to_scalar is very high.  We know the
> intermediate vector isn't usable by the target ISA and will always be elided.
> However these latencies need to remain high because when costing 
> gather/scatters
> IFNs we still pass the nunits of the type along.  In other words, the 
> vectorizer
> is still costing vector gather/scatters as scalar load/stores.
>
> Lowering the cost for the emulated gathers would result in emulation being
> seemingly cheaper.  So while the emulated costs are very high, they need to be
> higher than those for the IFN costing.
>
> i.e. the vectorizer generates:
>
>   vect__5.9_8 = MEM  [(intD.7 *)vectp_ip.7_14];
>   _35 = BIT_FIELD_REF ;
>   _36 = (sizetype) _35;
>   _37 = _36 * 4;
>   _38 = _34 + _37;
>   _39 = (voidD.55 *) _38;
>   # VUSE <.MEM_10(D)>
>   _40 = MEM[(floatD.32 *)_39];
>
> which after IVopts is:
>
>   _63 = &MEM  [(int *)ip_11(D) + ivtmp.19_27 * 1];
>   _47 = BIT_FIELD_REF  [(int *)_63], 32, 64>;
>   _41 = BIT_FIELD_REF  [(int *)_63], 32, 32>;
>   _35 = BIT_FIELD_REF  [(int *)_63], 32, 0>;
>   _53 = BIT_FIELD_REF  [(int *)_63], 32, 96>;
>
> Which we correctly lower in RTL to individual loads to avoid the repeated 
> umov.
>
> As such, we should cost the vec_to_scalar as GPR loads and also do so for the
> throughput which we at the moment cost as:
>
>   note:  Vector issue estimate:
>   note:load operations = 6
>   note:store operations = 0
>   note:general operations = 6
>   note:reduction latency = 2
>   note:estimated min cycles per iteration = 2.00
>
> Which means 3 loads for the GOR indexes are missing, making it seem like the
> emulated loop has a much lower cycles per iter than it actually does since the
> bottleneck on the load units is not modelled.

Yeah, currently the memory operations for an emulated 4-element
load/store would be:

- 1 vector load for the indices
- 4 loads for the gather load, or 4 stores for the scatter store

(and then +1 for the a[i] access in this case).

Therefore...

> But worse, because the vectorizer costs gathers/scatters IFNs as scalar
> load/stores the number of loads required for an SVE gather is always much
> higher than the equivalent emulated variant.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/118188
>   * config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops): Adjust
>   throughput of emulated gather and scatters.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/118188
>   * gcc.target/aarch64/sve/gather_load_12.c: New tes

Re: [PATCH]AArch64: Implement four and eight chunk VLA concats [PR118272]

2025-01-02 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> The following testcase
>
>   #pragma GCC target ("+sve")
>   extern char __attribute__ ((simd, const)) fn3 (int, short);
>   void test_fn3 (float *a, float *b, double *c, int n)
>   {
> for (int i = 0; i < n; ++i)
>   a[i] = fn3 (b[i], c[i]);
>   }
>
> at -Ofast ICEs because my previous patch only added support for combining 2
> partial SVE vectors into a bigger vector.  However, there can also be 4-
> and 8-piece subvectors.
>
> This patch fixes this by implementing the missing expansions.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/96342
>   PR middle-end/118272
>   * config/aarch64/aarch64-sve.md (vec_init<mode><Vquad>,
>   vec_initvnx16qivnx2qi): New.
>   * config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init_subvector):
>   Rewrite to support any arbitrary combinations.
>   * config/aarch64/iterators.md (SVE_NO2E): Update to use SVE_NO4E.
>   (SVE_NO4E, Vquad): New.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/96342
>   PR middle-end/118272
>   * gcc.target/aarch64/vect-simd-clone-3.c: New test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> 6659bb4fcab34699f22ff883825de1cd67108203..35f55bfacfc3238a8a7aa69015f36ba32981af59
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -2839,6 +2839,7 @@ (define_expand "vec_init"
>}
>  )
>  
> +;; Vector constructor combining two half vectors { a, b }
>  (define_expand "vec_init<mode><Vhalf>"
>[(match_operand:SVE_NO2E 0 "register_operand")
> (match_operand 1 "")]
> @@ -2849,6 +2850,28 @@ (define_expand "vec_init"
>}
>  )
>  
> +;; Vector constructor combining four quad vectors { a, b, c, d }
> +(define_expand "vec_init<mode><Vquad>"
> +  [(match_operand:SVE_NO4E 0 "register_operand")
> +   (match_operand 1 "")]
> +  "TARGET_SVE"
> +  {
> +aarch64_sve_expand_vector_init_subvector (operands[0], operands[1]);
> +DONE;
> +  }
> +)
> +
> +;; Vector constructor combining eight vectors { a, b, c, d, ... }
> +(define_expand "vec_initvnx16qivnx2qi"
> +  [(match_operand:VNx16QI 0 "register_operand")
> +   (match_operand 1 "")]
> +  "TARGET_SVE"
> +  {
> +aarch64_sve_expand_vector_init_subvector (operands[0], operands[1]);
> +DONE;
> +  }
> +)
> +
>  ;; Shift an SVE vector left and insert a scalar into element 0.
>  (define_insn "vec_shl_insert_<mode>"
>[(set (match_operand:SVE_FULL 0 "register_operand")
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> cb9b155826d12b622ae0df1736e4b042d01cf56a..e062cc00d1a548290377382c98ea8f3bb9310513
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -24898,18 +24898,51 @@ aarch64_sve_expand_vector_init_subvector (rtx 
> target, rtx vals)
>machine_mode mode = GET_MODE (target);
>int nelts = XVECLEN (vals, 0);
>  
> -  gcc_assert (nelts == 2);
> +  gcc_assert (nelts % 2 == 0);
>  
> -  rtx arg0 = XVECEXP (vals, 0, 0);
> -  rtx arg1 = XVECEXP (vals, 0, 1);
> -
> -  /* If we have two elements and are concatting vector.  */
> -  machine_mode elem_mode = GET_MODE (arg0);
> +  /* We must be concatenating vectors.  */
> +  machine_mode elem_mode = GET_MODE (XVECEXP (vals, 0, 0));
>gcc_assert (VECTOR_MODE_P (elem_mode));
>  
> -  arg0 = force_reg (elem_mode, arg0);
> -  arg1 = force_reg (elem_mode, arg1);
> -  emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
> +  auto_vec<rtx> worklist;
> +  machine_mode wider_mode
> += related_vector_mode (elem_mode, GET_MODE_INNER (elem_mode),
> +GET_MODE_NUNITS (elem_mode) * 2).require ();
> +
> +  /* First create the wider first level pack,  which also allows us to force
> + the values to registers and put the elements in a more convenient
> + data structure.  */
> +
> +  for (int i = 0; i < nelts; i+=2)
> +{
> +  rtx arg0 = XVECEXP (vals, 0, i);
> +  rtx arg1 = XVECEXP (vals, 0, i + 1);
> +  arg0 = force_reg (elem_mode, arg0);
> +  arg1 = force_reg (elem_mode, arg1);
> +  rtx tmp = gen_reg_rtx (wider_mode);
> +  emit_insn (gen_aarch64_pack_partial (wider_mode, tmp, arg0, arg1));
> +  worklist.safe_push (tmp);
> +}
> +
> +  /* Keep widening pairwise to have maximum throughput.  */
> +  while (worklist.length () > 1)
> +{
> +  rtx arg0 = worklist.pop ();
> +  rtx arg1 = worklist.pop ();
> +  gcc_assert (GET_MODE (arg0) == GET_MODE (arg1));
> +
> +  wider_mode
> + = related_vector_mode (wider_mode, GET_MODE_INNER (wider_mode),
> +GET_MODE_NUNITS (wider_mode) * 2).require ();
> +
> +  rtx tmp = gen_reg_rtx (wider_mode);
> +  emit_insn (gen_aarch64_pack_partial (wider_mode, tmp, arg0, arg1));
> +  worklist.safe_push (tmp);

It looks like this might not work VNx16QI->VNx2QI, since we'd start with
4 VN

RE: [PATCH]AArch64: Implement four and eight chunk VLA concats [PR118272]

2025-01-02 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, January 2, 2025 5:19 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH]AArch64: Implement four and eight chunk VLA concats
> [PR118272]
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > The following testcase
> >
> >   #pragma GCC target ("+sve")
> >   extern char __attribute__ ((simd, const)) fn3 (int, short);
> >   void test_fn3 (float *a, float *b, double *c, int n)
> >   {
> > for (int i = 0; i < n; ++i)
> >   a[i] = fn3 (b[i], c[i]);
> >   }
> >
> > at -Ofast ICEs because my previous patch only added support for combining 2
> > partial SVE vectors into a bigger vector.  However, there can also be 4-
> > and 8-piece subvectors.
> >
> > This patch fixes this by implementing the missing expansions.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/96342
> > PR middle-end/118272
> > * config/aarch64/aarch64-sve.md (vec_init<mode><Vquad>,
> > vec_initvnx16qivnx2qi): New.
> > * config/aarch64/aarch64.cc
> (aarch64_sve_expand_vector_init_subvector):
> > Rewrite to support any arbitrary combinations.
> > * config/aarch64/iterators.md (SVE_NO2E): Update to use SVE_NO4E.
> > (SVE_NO4E, Vquad): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/96342
> > PR middle-end/118272
> > * gcc.target/aarch64/vect-simd-clone-3.c: New test.
> >
> > ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-
> sve.md
> > index
> 6659bb4fcab34699f22ff883825de1cd67108203..35f55bfacfc3238a8a7aa69015
> f36ba32981af59 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -2839,6 +2839,7 @@ (define_expand "vec_init"
> >}
> >  )
> >
> > +;; Vector constructor combining two half vectors { a, b }
> >  (define_expand "vec_init<mode><Vhalf>"
> >[(match_operand:SVE_NO2E 0 "register_operand")
> > (match_operand 1 "")]
> > @@ -2849,6 +2850,28 @@ (define_expand "vec_init"
> >}
> >  )
> >
> > +;; Vector constructor combining four quad vectors { a, b, c, d }
> > +(define_expand "vec_init<mode><Vquad>"
> > +  [(match_operand:SVE_NO4E 0 "register_operand")
> > +   (match_operand 1 "")]
> > +  "TARGET_SVE"
> > +  {
> > +aarch64_sve_expand_vector_init_subvector (operands[0], operands[1]);
> > +DONE;
> > +  }
> > +)
> > +
> > +;; Vector constructor combining eight vectors { a, b, c, d, ... }
> > +(define_expand "vec_initvnx16qivnx2qi"
> > +  [(match_operand:VNx16QI 0 "register_operand")
> > +   (match_operand 1 "")]
> > +  "TARGET_SVE"
> > +  {
> > +aarch64_sve_expand_vector_init_subvector (operands[0], operands[1]);
> > +DONE;
> > +  }
> > +)
> > +
> >  ;; Shift an SVE vector left and insert a scalar into element 0.
> >  (define_insn "vec_shl_insert_<mode>"
> >[(set (match_operand:SVE_FULL 0 "register_operand")
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index
> cb9b155826d12b622ae0df1736e4b042d01cf56a..e062cc00d1a548290377382
> c98ea8f3bb9310513 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -24898,18 +24898,51 @@ aarch64_sve_expand_vector_init_subvector (rtx
> target, rtx vals)
> >machine_mode mode = GET_MODE (target);
> >int nelts = XVECLEN (vals, 0);
> >
> > -  gcc_assert (nelts == 2);
> > +  gcc_assert (nelts % 2 == 0);
> >
> > -  rtx arg0 = XVECEXP (vals, 0, 0);
> > -  rtx arg1 = XVECEXP (vals, 0, 1);
> > -
> > -  /* If we have two elements and are concatting vector.  */
> > -  machine_mode elem_mode = GET_MODE (arg0);
> > +  /* We must be concatenating vectors.  */
> > +  machine_mode elem_mode = GET_MODE (XVECEXP (vals, 0, 0));
> >gcc_assert (VECTOR_MODE_P (elem_mode));
> >
> > -  arg0 = force_reg (elem_mode, arg0);
> > -  arg1 = force_reg (elem_mode, arg1);
> > -  emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
> > +  auto_vec<rtx> worklist;
> > +  machine_mode wider_mode
> > += related_vector_mode (elem_mode, GET_MODE_INNER (elem_mode),
> > +  GET_MODE_NUNITS (elem_mode) * 2).require ();
> > +
> > +  /* First create the wider first level pack,  which also allows us to 
> > force
> > + the values to registers and put the elements in a more convenient
> > + data structure.  */
> > +
> > +  for (int i = 0; i < nelts; i+=2)
> > +{
> > +  rtx arg0 = XVECEXP (vals, 0, i);
> > +  rtx arg1 = XVECEXP (vals, 0, i + 1);
> > +  arg0 = force_reg (elem_mode, arg0);
> > +  arg1 = force_reg (elem_mode, arg1);
> > +  rtx tmp = gen_reg_rtx (wider_mode);
> > +  emit_insn (gen_aarch64_pack_partial (wider_mode, tmp, arg0, arg1));
> > +  worklist.safe_push (tmp);
> > +}
> > +
> > +  /* Keep widening pairwise to have maximum throughput.  */
> > +  while (worklist.length () > 1)
> > +{
> > +  rtx arg0 =

[pushed] Use _Float128 in test for PR118184

2025-01-02 Thread Richard Sandiford
The test was failing on x86 because longdouble128 only checks sizeof,
rather than a full 128-bit payload.  Using _Float128 is more portable
and still exposes the original bug.

Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious.

Richard


gcc/testsuite/
PR target/118184
* gcc.dg/torture/pr118184.c: Use _Float128 instead of long double.
---
 gcc/testsuite/gcc.dg/torture/pr118184.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr118184.c 
b/gcc/testsuite/gcc.dg/torture/pr118184.c
index 20f567af11f..5933e2a1222 100644
--- a/gcc/testsuite/gcc.dg/torture/pr118184.c
+++ b/gcc/testsuite/gcc.dg/torture/pr118184.c
@@ -1,8 +1,8 @@
-/* { dg-do run { target { longdouble128 && lp64 } } } */
+/* { dg-do run { target { float128 && lp64 } } } */
 
 union u1
 {
-  long double ld;
+  _Float128 ld;
   unsigned long l[2];
 };
 
@@ -13,7 +13,7 @@ unsigned long m()
 }
 
 [[gnu::noinline]]
-long double f(void)
+_Float128 f(void)
 {
   union u1 u;
   u.ld = __builtin_nanf128("");
-- 
2.25.1



Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Richard Sandiford
Tamar Christina  writes:
>> > [...]
>> > #define iterations 10
>> > #define LEN_1D 32000
>> >
>> > float a[LEN_1D], b[LEN_1D];
>> >
>> > float
>> > s4115 (int *ip)
>> > {
>> > float sum = 0.;
>> > for (int i = 0; i < LEN_1D; i++)
>> > {
>> > sum += a[i] * b[ip[i]];
>> > }
>> > return sum;
>> > }
>> > [...]
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index
>> 6bb4bdf2472e62d9b066a06561da8e516f1b3c3e..cb9b155826d12b622ae0df1
>> 736e4b042d01cf56a 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -17358,6 +17358,25 @@ aarch64_vector_costs::count_ops (unsigned int
>> count, vect_cost_for_stmt kind,
>> >return;
>> >  }
>> >
>> > +  /* Detect the case where we are using an emulated gather/scatter.  When 
>> > a
>> > + target does not support gathers and scatters directly the vectorizer
>> > + emulates these by constructing an index vector and then issuing an
>> > + extraction for every lane in the vector.  This is subsequently 
>> > lowered
>> > + by veclower into a series of loads which creates the scalar indexes 
>> > for
>> > + the subsequent loads.  After the final loads are done it issues a
>> > + vec_construct to recreate the vector from the scalar.  For costing 
>> > when
>> > + we see a vec_to_scalar on a stmt with VMAT_GATHER_SCATTER we are
>> dealing
>> > + with an emulated instruction and should adjust costing properly.  */
>> > +  if (kind == vec_to_scalar
>> > +  && (m_vec_flags & VEC_ADVSIMD)
>> > +  && vect_mem_access_type (stmt_info, node) == VMAT_GATHER_SCATTER)
>> > +{
>> > +  if (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type)
>> > +  ops->loads += count - 1;
>> > +  else
>> > +  ops->stores += count - 1;
>> 
>> ...since the aim is to replace:
>> 
>> - 1 vector load for the indices
>> 
>> with
>> 
>> - 4 vector loads for the indices
>> 
>> I think this should be changing ops->loads even for scatter stores.
>> 
>
> Ah yes that's true... because the indexes for the stores are loads themselves.
>
>> But this assumes that the gather load indices come directly from memory.
>> That isn't always the case.  If we have:
>> 
>> float
>> s4115 (int *ip)
>> {
>> float sum = 0.;
>> for (int i = 0; i < LEN_1D; i++)
>>   {
>> sum += a[i] * b[ip[i] + 1];
>>   }
>> return sum;
>> }
>> 
>> then we'll have a vector load, a vector add, and then the 4 umovs
>> that, as you said above, were elided by post-vectoriser optimisations
>> in your b[ip[i]] example:
>> 
>> .L2:
>> ldr q31, [x0, x1]
>> ldr q29, [x6, x1]
>> add x1, x1, 16
>> add v31.4s, v31.4s, v26.4s
>> umov    w5, v31.s[1]
>> umov    w4, v31.s[3]
>> umov    w3, v31.s[2]
>> fmov    w8, s31
>> ldr s30, [x2, w5, sxtw 2]
>> ldr s27, [x2, w4, sxtw 2]
>> ldr s31, [x2, w8, sxtw 2]
>> ldr s28, [x2, w3, sxtw 2]
>> uzp1    v30.2s, v30.2s, v27.2s
>> uzp1    v31.2s, v31.2s, v28.2s
>> zip1    v31.4s, v31.4s, v30.4s
>> fmla    v0.4s, v29.4s, v31.4s
>> cmp x1, x7
>> bne .L2
>> 
>> These umovs are currently modelled:
>> 
>>   note:load operations = 6
>>   note:store operations = 0
>>   note:general operations = 7
>>   note:reduction latency = 2
>> 
>> although I'm guessing it should be:
>> 
>>   note:load operations = 6
>>   note:store operations = 0
>>   note:general operations = 9
>>   note:reduction latency = 2
>> 
>> instead.  The underaccounting comes from vec_construct, which counts 1
>> rather than 3 operations for moving the scalars back to a vector.
>> 
>> This patch would remove the umov accounting to give:
>> 
>>   note:load operations = 9
>>   note:store operations = 0
>>   note:general operations = 3
>>   note:reduction latency = 2
>> 
>> Counting loads rather than general ops wouldn't matter for tight loops
>> like these in which memory dominates anyway, since counting loads is
>> then pessimistically correct.  But in a loop that is compute bound,
>> it's probably more important to get this right.
>> 
>> So I think ideally, we should try to detect whether the indices come
>> directly from memory or are the result of arithmetic.  In the former case,
>> we should do the loads adjustment above.  In the latter case, we should
>> keep the vec_to_scalar accounting unchanged.
>
> I can do this but...
>
>> 
>> Of course, these umovs are likely to be more throughput-limited than we
>> model, but that's a separate pre-existing problem...
>
> I agree with the above, the reason I just updated loads is as you already said
> that the umov accounting as general operations don't account for the 
> bottleneck.
> In general umovs are more throughput limited than loads and the number of 
> general
> ops we can execute would in the 

RE: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, January 2, 2025 4:52 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters
> [PR118188]
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > When a target does not support gathers and scatters the vectorizer tries to
> > emulate these using scalar loads/stores and a reconstruction of vectors from
> > scalar.
> >
> > The loads are still marked with VMAT_GATHER_SCATTER to indicate that they 
> > are
> > gather/scatters, however the vectorizer also asks the target to cost the
> > instruction that generates the indexes for the emulated instructions.
> >
> > This is done by asking the target to cost vec_to_scalar and vec_construct 
> > with
> > a stmt_vinfo being the VMAT_GATHER_SCATTER.
> >
> > Since Adv. SIMD does not have an LD1 variant that takes an Adv. SIMD Scalar
> > element the operation is lowered entirely into a sequence of GPR loads to 
> > create
> > the x registers for the indexes.
> >
> > At the moment, however, we don't cost these, and so the vectorizer thinks
> > that when it emulates the instructions it's much cheaper than using an
> > actual
> > gather/scatter with SVE.  Consider:
> >
> > #define iterations 10
> > #define LEN_1D 32000
> >
> > float a[LEN_1D], b[LEN_1D];
> >
> > float
> > s4115 (int *ip)
> > {
> > float sum = 0.;
> > for (int i = 0; i < LEN_1D; i++)
> > {
> > sum += a[i] * b[ip[i]];
> > }
> > return sum;
> > }
> >
> > which before this patch with -mcpu= generates:
> >
> > .L2:
> > add x3, x0, x1
> > ldrsw   x4, [x0, x1]
> > ldrsw   x6, [x3, 4]
> > ldpsw   x3, x5, [x3, 8]
> > ldr s1, [x2, x4, lsl 2]
> > ldr s30, [x2, x6, lsl 2]
> > ldr s31, [x2, x5, lsl 2]
> > ldr s29, [x2, x3, lsl 2]
> > uzp1    v30.2s, v30.2s, v31.2s
> > ldr     q31, [x7, x1]
> > add     x1, x1, 16
> > uzp1    v1.2s, v1.2s, v29.2s
> > zip1    v30.4s, v1.4s, v30.4s
> > fmla    v0.4s, v31.4s, v30.4s
> > cmp x1, x8
> > bne .L2
> >
> > but during costing:
> >
> > a[i_18] 1 times vector_load costs 4 in body
> > *_4 1 times unaligned_load (misalign -1) costs 4 in body
> > b[_5] 4 times vec_to_scalar costs 32 in body
> > b[_5] 4 times scalar_load costs 16 in body
> > b[_5] 1 times vec_construct costs 3 in body
> > _1 * _6 1 times vector_stmt costs 2 in body
> > _7 + sum_16 1 times scalar_to_vec costs 4 in prologue
> > _7 + sum_16 1 times vector_stmt costs 2 in epilogue
> > _7 + sum_16 1 times vec_to_scalar costs 4 in epilogue
> > _7 + sum_16 1 times vector_stmt costs 2 in body
> >
> > Here we see that the latency for the vec_to_scalar is very high.  We know 
> > the
> > intermediate vector isn't usable by the target ISA and will always be 
> > elided.
> > However these latencies need to remain high because when costing
> gather/scatters
> > IFNs we still pass the nunits of the type along.  In other words, the 
> > vectorizer
> > is still costing vector gather/scatters as scalar load/stores.
> >
> > Lowering the cost for the emulated gathers would result in emulation being
> > seemingly cheaper.  So while the emulated costs are very high, they need to 
> > be
> > higher than those for the IFN costing.
> >
> > i.e. the vectorizer generates:
> >
> >   vect__5.9_8 = MEM  [(intD.7 *)vectp_ip.7_14];
> >   _35 = BIT_FIELD_REF ;
> >   _36 = (sizetype) _35;
> >   _37 = _36 * 4;
> >   _38 = _34 + _37;
> >   _39 = (voidD.55 *) _38;
> >   # VUSE <.MEM_10(D)>
> >   _40 = MEM[(floatD.32 *)_39];
> >
> > which after IVopts is:
> >
> >   _63 = &MEM  [(int *)ip_11(D) + ivtmp.19_27 * 1];
> >   _47 = BIT_FIELD_REF  [(int *)_63], 32, 64>;
> >   _41 = BIT_FIELD_REF  [(int *)_63], 32, 32>;
> >   _35 = BIT_FIELD_REF  [(int *)_63], 32, 0>;
> >   _53 = BIT_FIELD_REF  [(int *)_63], 32, 96>;
> >
> > Which we correctly lower in RTL to individual loads to avoid the repeated 
> > umov.
> >
> > As such, we should cost the vec_to_scalar as GPR loads and also do so for 
> > the
> > throughput which we at the moment cost as:
> >
> >   note:  Vector issue estimate:
> >   note:load operations = 6
> >   note:store operations = 0
> >   note:general operations = 6
> >   note:reduction latency = 2
> >   note:estimated min cycles per iteration = 2.00
> >
> > Which means 3 loads for the GOR indexes are missing, making it seem like the
> > emulated loop has a much lower cycles per iter than it actually does since 
> > the
> > bottleneck on the load units is not modelled.
> 
> Yeah, currently the memory operations for an emulated 4-element
> load/store would be:
> 
> - 1 vector load for the indices
> - 4 loads for the gather load, or 4 stores for the scatter store
> 
> (and then +1 for the a[i] access in this case).
> 
> Therefore...
> 
> > But wo

[Patch] OpenMP/C++: Store location in cp_parser_omp_var_list for kind=0

2025-01-02 Thread Tobias Burnus

This came up in the context of dispatch patching, where the location
was very confusing (pointing to the first letter of the first variable).

The OpenMP variable-list parser has two modes, one is used to directly
parse a clause, in which case a tree node is created with the proper
location. And another one is used to just parse the list of variables.
As those are usually DECL_P, they don't have an expression location,
causing diagnostic issues.

In the parser functions, that's indicated by kind == 0, but all callers
use OMP_CLAUSE_ERROR (with value 0) for this. This patch now adds a
couple of comments to make clear that 0 and OMP_CLAUSE_ERROR are the
same, which hopefully helps when reading the code.

Examples:
foo.C:4:38: error: ‘threadprivate’ ‘int fff()’ is not file, namespace or block 
scope variable
4 | #pragma omp threadprivate(fff, myvar)
  |  ^

New:
4 | #pragma omp threadprivate(fff, myvar)
  |   ^~~

Or even:
foo.C:3:59: error: ‘’ is specified more than once
3 |adjust_args(nothing: ) 
adjust_args(need_device_ptr:,,)
  |   ^~~~

New:
3 |adjust_args(nothing: ) 
adjust_args(need_device_ptr:,,)
  | ^~~~

The new implementation stores the location in the TREE_VALUE part
of the tree list; as it cannot sensibly be stored there directly, we
allocate memory just for this short-term purpose, using build_empty_stmt,
which seems to be a reasonable compromise.

Any comments before I commit it?

Tobias

PS: I have not updated the testcases to check for the column numbers, but
could do so, if deemed sensible.
OpenMP/C++: Store location in cp_parser_omp_var_list for kind=0

cp_parser_omp_var_list and cp_parser_omp_var_list_no_open have a special
modus: kind = 0 alias kind = OMP_CLAUSE_ERROR, which returns a simple tree
list; however, for a decl, no location is associated with that variable,
yielding confusing error locations. With this patch, also for kind=0,
a reasonable error location is stored, albeit with creating a tree node
(build_empty_stmt), which is otherwise not used.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_var_list_no_open,
	cp_parser_omp_var_list): For kind=0 (= OMP_CLAUSE_ERROR),
	store also the expression location in the tree list.
	(cp_parser_oacc_data_clause_deviceptr,
	cp_finish_omp_declare_variant): Use that location instead or
	input_location/the before-parsing location.
	* semantics.cc (finish_omp_threadprivate): Likewise.

 gcc/cp/parser.cc| 27 +--
 gcc/cp/semantics.cc | 15 ---
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 3ec9e414e62..0ee0e5b33e1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -38698,8 +38698,9 @@ check_no_duplicate_clause (tree clauses, enum omp_clause_code code,
If KIND is nonzero, create the appropriate node and install the decl
in OMP_CLAUSE_DECL and add the node to the head of the list.
 
-   If KIND is zero, create a TREE_LIST with the decl in TREE_PURPOSE;
-   return the list created.
+   If KIND is zero (= OMP_CLAUSE_ERROR), create a TREE_LIST with the decl
+   in TREE_PURPOSE and the location in TREE_VALUE (accessible using
+   EXPR_LOCATION); return the list created.
 
COLON can be NULL if only closing parenthesis should end the list,
or pointer to bool which will receive false if the list is terminated
@@ -38836,7 +38837,7 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
 	  goto build_clause;
 	}
   token = cp_lexer_peek_token (parser->lexer);
-  if (kind != 0
+  if (kind != 0  /* kind != OMP_CLAUSE_ERROR */
 	  && cp_parser_is_keyword (token, RID_THIS))
 	{
 	  decl = finish_this_expr ();
@@ -38894,7 +38895,7 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
 	decl = process_outer_var_ref (decl, tf_warning_or_error);
   if (decl == error_mark_node)
 	;
-  else if (kind != 0)
+  else if (kind != 0)  /* kind != OMP_CLAUSE_ERROR */
 	{
 	  switch (kind)
 	{
@@ -39040,8 +39041,8 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
 	  OMP_CLAUSE_CHAIN (u) = list;
 	  list = u;
 	}
-  else
-	list = tree_cons (decl, NULL_TREE, list);
+  else  /* kind == 0 alias kind == OMP_CLAUSE_ERROR */
+	list = tree_cons (decl, build_empty_stmt (token->location), list);
 
 get_comma:
   if (cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA))
@@ -39088,17 +39089,17 @@ cp_parser_omp_var_list (cp_parser *parser, enum omp_clause_code kind, tree list,
 {
   if (parser->lexer->in_omp_decl_attribute)
 {
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
   if (kind)
 	{
-	  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 	  t

RE: [PATCH]AArch64: Fix costing of emulated gathers/scatters [PR118188]

2025-01-02 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, January 2, 2025 5:54 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH]AArch64: Fix costing of emulated gathers/scatters
> [PR118188]
> 
> Tamar Christina  writes:
> >> > [...]
> >> > #define iterations 10
> >> > #define LEN_1D 32000
> >> >
> >> > float a[LEN_1D], b[LEN_1D];
> >> >
> >> > float
> >> > s4115 (int *ip)
> >> > {
> >> > float sum = 0.;
> >> > for (int i = 0; i < LEN_1D; i++)
> >> > {
> >> > sum += a[i] * b[ip[i]];
> >> > }
> >> > return sum;
> >> > }
> >> > [...]
> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
> >> > b/gcc/config/aarch64/aarch64.cc
> >> > index
> >>
> 6bb4bdf2472e62d9b066a06561da8e516f1b3c3e..cb9b155826d12b622ae0df1
> >> 736e4b042d01cf56a 100644
> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> > @@ -17358,6 +17358,25 @@ aarch64_vector_costs::count_ops (unsigned
> int
> >> count, vect_cost_for_stmt kind,
> >> >  return;
> >> >  }
> >> >
> >> > +  /* Detect the case where we are using an emulated gather/scatter.  
> >> > When a
> >> > + target does not support gathers and scatters directly the 
> >> > vectorizer
> >> > + emulates these by constructing an index vector and then issuing an
> >> > + extraction for every lane in the vector.  This is subsequently 
> >> > lowered
> >> > + by veclower into a series of loads which creates the scalar 
> >> > indexes for
> >> > + the subsequent loads.  After the final loads are done it issues a
> >> > + vec_construct to recreate the vector from the scalar.  For costing 
> >> > when
> >> > + we see a vec_to_scalar on a stmt with VMAT_GATHER_SCATTER we are
> >> dealing
> >> > + with an emulated instruction and should adjust costing properly.  
> >> > */
> >> > +  if (kind == vec_to_scalar
> >> > +  && (m_vec_flags & VEC_ADVSIMD)
> >> > +  && vect_mem_access_type (stmt_info, node) ==
> VMAT_GATHER_SCATTER)
> >> > +{
> >> > +  if (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type)
> >> > +ops->loads += count - 1;
> >> > +  else
> >> > +ops->stores += count - 1;
> >>
> >> ...since the aim is to replace:
> >>
> >> - 1 vector load for the indices
> >>
> >> with
> >>
> >> - 4 vector loads for the indices
> >>
> >> I think this should be changing ops->loads even for scatter stores.
> >>
> >
> > Ah yes that's true... because the indexes for the stores are loads 
> > themselves.
> >
> >> But this assumes that the gather load indices come directly from memory.
> >> That isn't always the case.  If we have:
> >>
> >> float
> >> s4115 (int *ip)
> >> {
> >> float sum = 0.;
> >> for (int i = 0; i < LEN_1D; i++)
> >>   {
> >> sum += a[i] * b[ip[i] + 1];
> >>   }
> >> return sum;
> >> }
> >>
> >> then we'll have a vector load, a vector add, and then the 4 umovs
> >> that, as you said above, were elided by post-vectoriser optimisations
> >> in your b[ip[i]] example:
> >>
> >> .L2:
> >> ldr q31, [x0, x1]
> >> ldr q29, [x6, x1]
> >> add x1, x1, 16
> >> add v31.4s, v31.4s, v26.4s
> >> umov w5, v31.s[1]
> >> umov w4, v31.s[3]
> >> umov w3, v31.s[2]
> >> fmov w8, s31
> >> ldr s30, [x2, w5, sxtw 2]
> >> ldr s27, [x2, w4, sxtw 2]
> >> ldr s31, [x2, w8, sxtw 2]
> >> ldr s28, [x2, w3, sxtw 2]
> >> uzp1 v30.2s, v30.2s, v27.2s
> >> uzp1 v31.2s, v31.2s, v28.2s
> >> zip1 v31.4s, v31.4s, v30.4s
> >> fmla v0.4s, v29.4s, v31.4s
> >> cmp x1, x7
> >> bne .L2
> >>
> >> These umovs are currently modelled:
> >>
> >>   note: load operations = 6
> >>   note: store operations = 0
> >>   note: general operations = 7
> >>   note: reduction latency = 2
> >>
> >> although I'm guessing it should be:
> >>
> >>   note: load operations = 6
> >>   note: store operations = 0
> >>   note: general operations = 9
> >>   note: reduction latency = 2
> >>
> >> instead.  The underaccounting comes from vec_construct, which counts 1
> >> rather than 3 operations for moving the scalars back to a vector.
> >>
> >> This patch would remove the umov accounting to give:
> >>
> >>   note: load operations = 9
> >>   note: store operations = 0
> >>   note: general operations = 3
> >>   note: reduction latency = 2
> >>
> >> Counting loads rather than general ops wouldn't matter for tight loops
> >> like these in which memory dominates anyway, since counting loads is
> >> then pessimistically correct.  But in a loop that is compute bound,
> >> it's probably more important to get this right.
> >>
> >> So I think ideally, we should try to detect whether the indices come
> >> directly from memory or are the result of arithmetic.  In th

Re: [PATCH] recog: Handle some mode-changing hardreg propagations

2025-01-02 Thread Andreas Schwab
during RTL pass: reload
../../../../../libstdc++-v3/src/c++20/tzdb.cc: In function ‘std::istream& 
std::chrono::{anonymous}::operator>>(std::istream&, at_time&)’:
../../../../../libstdc++-v3/src/c++20/tzdb.cc:2080:5: internal compiler error: 
in gen_rtx_SUBREG, at emit-rtl.cc:1032
 2080 | }
  | ^
0x20570ef internal_error(char const*, ...)
../../gcc/diagnostic-global-context.cc:517
0x6bca4d fancy_abort(char const*, int, char const*)
../../gcc/diagnostic.cc:1722
0xc7faed gen_rtx_SUBREG(machine_mode, rtx_def*, poly_int<1u, unsigned long>)
../../gcc/emit-rtl.cc:1032
0xf4779f simplify_operand_subreg
../../gcc/lra-constraints.cc:1952
0xf48463 curr_insn_transform
../../gcc/lra-constraints.cc:4224
0xf4b196 lra_constraints(bool)
../../gcc/lra-constraints.cc:5503
0xf3352a lra(_IO_FILE*, int)
../../gcc/lra.cc:2449
0xee1247 do_reload
../../gcc/ira.cc:5977
0xee1247 execute
../../gcc/ira.cc:6165

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[Patch] OpenMP: Enable has_device_addr clause for 'dispatch' in Fortran

2025-01-02 Thread Tobias Burnus

Support the 'has_device_addr' clause with OpenMP's 'dispatch'
directive.

The testcase is even more questionable than the C/C++ testcase
(looking at it globally/semantically), but it tests (locally)
what it is supposed to test: namely, 'has_device_addr' does not
fulfill the 'is_device_ptr' property (warning, tree dump), which
in turn also checks that the clause actually reached the middle
end.

I intend to commit it soon after PA has committed his
"OpenMP: Fortran front-end support for dispatch + adjust_args"
patch.

* * *

As mentioned in the commit logs (C++ and as attached for Fortran),
dispatch's has_device_addr clause only becomes useful once the
'adjust_args' clause of 'declare variant' supports the
'need_device_addr' modifier. - Deferred for C++ and Fortran to
a follow-up patch (after understanding the semantic/spec vs.
current implementation/backward compat better; at least for C++,
I believe that there is a bug in the current
{has,use,is}_device_{ptr,addr} code). [As C does not have reference
types, 'need_device_addr' is invalid and, hence, rejected.]

Tobias
OpenMP: Enable has_device_addr clause for 'dispatch' in Fortran

Fortran version of commit r15-6178-g2cbb2408a830a6 for C/C++.
However, this only becomes really useful (for C++ and Fortran) once the
'need_device_addr' modifier to declare variant's 'adjust_args' clause
is supported.

 fortran/openmp.cc |3 
 testsuite/gfortran.dg/gomp/adjust-args-10.f90 |   99 ++
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 47c1ded4e44..863c96ab64a 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -5018,7 +5018,8 @@ cleanup:
| OMP_CLAUSE_INIT | OMP_CLAUSE_DESTROY | OMP_CLAUSE_USE)
 #define OMP_DISPATCH_CLAUSES   \
   (omp_mask (OMP_CLAUSE_DEVICE) | OMP_CLAUSE_DEPEND | OMP_CLAUSE_NOVARIANTS\
-   | OMP_CLAUSE_NOCONTEXT | OMP_CLAUSE_IS_DEVICE_PTR | OMP_CLAUSE_NOWAIT)
+   | OMP_CLAUSE_NOCONTEXT | OMP_CLAUSE_IS_DEVICE_PTR | OMP_CLAUSE_NOWAIT   \
+   | OMP_CLAUSE_HAS_DEVICE_ADDR)
 
 
 static match
diff --git a/gcc/testsuite/gfortran.dg/gomp/adjust-args-10.f90 b/gcc/testsuite/gfortran.dg/gomp/adjust-args-10.f90
new file mode 100644
index 000..3b649b5d7d0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/adjust-args-10.f90
@@ -0,0 +1,99 @@
+! { dg-additional-options "-fdump-tree-gimple" }
+
+! This mainly checks 'has_device_addr' without associated 'need_device_addr'
+!
+! Do diagnostic check / dump check only;
+! Note: this test should work as run-test as well.
+
+module m
+  use iso_c_binding
+  ! use omp_lib
+  implicit none (type, external)
+  interface
+integer function omp_get_default_device (); end
+integer function omp_get_num_devices (); end
+  end interface
+
+contains
+  subroutine g (x, y)
+!$omp declare variant(f) adjust_args(need_device_ptr: x, y) match(construct={dispatch})
+type(c_ptr), value :: x, y
+  end
+
+  subroutine f (cfrom, cto)
+type(c_ptr), value :: cfrom, cto
+integer, save :: cnt = 0
+cnt = cnt + 1
+if (cnt >= 3) then
+  if (omp_get_default_device () /= -1  &
+  .and. omp_get_default_device () < omp_get_num_devices ()) then
+! On offload device but not mapped
+if (.not. c_associated(cfrom)) & ! Not mapped
+  stop 1
+  else
+block
+  integer, pointer :: from(:)
+  call c_f_pointer(cfrom, from, shape=[1])
+  if (from(1) /= 5) &
+stop 2
+end block
+  end if
+  return
+end if
+
+!$omp target is_device_ptr(cfrom, cto)
+  block
+integer, pointer :: from(:), to(:)
+call c_f_pointer(cfrom, from, shape=[2])
+call c_f_pointer(cto, to, shape=[2])
+to(1) = from(1) * 10
+to(2) = from(2) * 10
+  end block
+  end
+
+  subroutine sub (a, b)
+integer, target :: a(:), b(:)
+type(c_ptr), target :: ca, cb
+
+ca = c_loc(a)
+cb = c_loc(b)
+
+! The has_device_addr is a bit questionable as the caller is not actually
+! passing a device address - but we cannot pass one because of the
+! following:
+!
+! As for 'b' need_device_ptr has been specified and 'b' is not
+! in the semantic requirement set 'is_device_ptr' (and only in 'has_device_addr')
+! "the argument is converted in the same manner that a use_device_ptr clause
+!  on a target_data construct converts its pointer"
+
+!$omp dispatch is_device_ptr(ca), has_device_addr(cb)
+  call g (ca, cb)  ! { dg-warning "'has_device_addr' for 'cb' does not imply 'is_device_ptr' required for 'need_device_ptr' \\\[-Wopenmp\\\]" }
+  end
+end
+
+program main
+  use m
+  implicit none (type, external)
+
+  integer, target :: A(2), B(2) = [123, 456], C(1) = [5]
+  integer, pointer :: p(:)
+
+  p => A
+
+  !$omp target enter data map(A, B)
+
+  ! Note: We don't add  'use_device_ad

Re: [PATCH] testsuite: libitm: Adjust how libitm.c++ passes link flags

2025-01-02 Thread Richard Sandiford
Matthew Malcomson  writes:
> On 1/2/25 12:08, Richard Sandiford wrote:
>>> +# This set in order to give libitm.c++/c++.exp a nicely named flag to 
>>> set
>>> +# when adding C++ options.
>>> +set TEST_ALWAYS_FLAGS ""
>> 
>> This looked odd at first glance.  By unconditionally writing "" to the
>> variable, it seems to subvert the save and restore done in c++.exp.
>> 
>
> Yeah -- I see your point, that's not good.
>
>> How about instead copying the behaviour of asan_init and asan_finish,
>> so that libitm_init and libitm_finish do the save and restore?  Or perhaps
>> a slight variation: after saving, libitm_init can set TEST_ALWAYS_FLAGS
>> to "" if TEST_ALWAYS_FLAGS was previously unset.
>> 
>> c++.exp would then not need to save and restore the flags itself, and
>> could still assume that TEST_ALWAYS_FLAGS is always set.
>> 
>
> Have made the suggested change -- mentioning the extra little bit of 
> complexity that this introduced ...
>
> Since libitm is a "tool" in the DejaGNU sense (while asan is not), 
> libitm_finish gets called twice for each libitm_init call.
>
> The `runtest` procedure in DejaGNU's `runtest.exp` calls `${tool}_init`, 
> executes the c.exp or c++.exp test runner and then calls 
> `${tool}_finish`, while in each of the test runners we also call 
> `dg-finish` (as required by the dg.exp API) which calls `${tool}_finish` 
> directly.
>
> This means using `libitm_finish` needs an extra bit in global state to 
> check whether we have already reset things.
> - Has been set in libitm_init and was unset at start
>=> saved_TEST_ALWAYS_FLAGS is unset.
> - Has been set in libitm_init and was set at start
>=> saved_TEST_ALWAYS_FLAGS is set.
> - Has already been reset => some other flag.

Ick.  Hadn't realised that dg-finish called *_finish while dg-init didn't
call *_init.

I suppose a way of handling this without the second variable would be to
make libitm_finish check whether TEST_ALWAYS_FLAGS is set.  If it isn't,
then it must be the second call to libitm_finish.

OK that way if you agree, or OK as-is if you think it's better.

Richard

> Have attached the adjusted patch to this email.
>
> From fbce3b25e8ccad80697f1596f566b268fff71221 Mon Sep 17 00:00:00 2001
> From: Matthew Malcomson 
> Date: Wed, 11 Dec 2024 11:03:55 +
> Subject: [PATCH] testsuite: libitm: Adjust how libitm.c++ passes link flags
>
> For the `gcc` and `g++` tools we often pass -B/path/to/object/dir in via
> `TEST_ALWAYS_FLAGS` (see e.g. asan.exp where this is set).
> In libitm.c++/c++.exp we pass that -B flag via the `tool_flags` argument
> to `dg-runtest`.
>
> Passing as the `tool_flags` argument means that these flags get added to
> the name of the test.  This means that if one were to compare the
> testsuite results between runs made in different build directories
> libitm.c++ gives a reasonable amount of NA->PASS and PASS->NA
> differences even though the same tests passed each time.
>
> This patch follows the example set in other parts of the testsuite like
> asan_init and passes the -B arguments to the compiler via a global
> variable called `TEST_ALWAYS_FLAGS`.  For this DejaGNU "tool" we had to
> newly initialise that variable in libitm_init and add a check against
> that variable in libitm_target_compile.  I thought about adding the
> relevant flags we need for C++ into `ALWAYS_CFLAGS` but decided against
> it since the name didn't match what we would be using it for.
>
> We save the global `TEST_ALWAYS_FLAGS` in `libitm_init` and ensure
> it's initialised.  We then reset this in `libitm_finish`.  Since
> `libitm_finish` gets called twice (once from `dg-finish` and once from
> the `runtest` procedure) we have to manage state to tell whether
> TEST_ALWAYS_FLAGS has already been reset.
>
> Testing done to bootstrap & regtest on AArch64.  Manually observed that
> the testsuite diff between two different build directories no longer
> exists.
>
> N.b. since I pass the new flags in the `ldflags` option of the DejaGNU
> options while the previous code always passed this -B flag, the compile
> test throwdown.C no longer gets compiled with this -B flag.  I believe
> that is not a problem.
>
> libitm/ChangeLog:
>
>   * testsuite/libitm.c++/c++.exp: Use TEST_ALWAYS_FLAGS instead of
>   passing arguments to dg-runtest.
>   * testsuite/lib/libitm.exp (libitm_init): Initialise
>   TEST_ALWAYS_FLAGS.
>   (libitm_finish): Reset TEST_ALWAYS_FLAGS.
>   (libitm_target_compile): Take flags from TEST_ALWAYS_FLAGS.
>
> Signed-off-by: Matthew Malcomson 
> ---
>  libitm/testsuite/lib/libitm.exp | 42 +
>  libitm/testsuite/libitm.c++/c++.exp |  5 ++--
>  2 files changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
> index ac390d6d0dd..c5b9bb1127f 100644
> --- a/libitm/testsuite/lib/libitm.exp
> +++ b/libitm/testsuite/lib/libitm.exp
> @@ -59,6 +59,7 @@ set dg-do-what-default run
>  #
>  
>  s

[r15-6503 Regression] FAIL: gcc.dg/torture/pr118184.c -O1 execution test on Linux/x86_64

2025-01-02 Thread haochen.jiang
On Linux/x86_64,

2b687ad95de61091105d040d6bc06cb3d44ac3d1 is the first bad commit
commit 2b687ad95de61091105d040d6bc06cb3d44ac3d1
Author: Richard Sandiford 
Date:   Thu Jan 2 11:34:52 2025 +

aarch64: Detect word-level modification in early-ra [PR118184]

caused

FAIL: gcc.dg/torture/pr118184.c   -O0  execution test
FAIL: gcc.dg/torture/pr118184.c   -O1  execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-6503/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/pr118184.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/pr118184.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


[PATCH]AArch64: Implement four and eight chunk VLA concats [PR118272]

2025-01-02 Thread Tamar Christina
Hi All,

The following testcase

  #pragma GCC target ("+sve")
  extern char __attribute__ ((simd, const)) fn3 (int, short);
  void test_fn3 (float *a, float *b, double *c, int n)
  {
for (int i = 0; i < n; ++i)
  a[i] = fn3 (b[i], c[i]);
  }

at -Ofast ICEs because my previous patch only added support for combining 2
partial SVE vectors into a bigger vector.  However, there can also be 4- and
8-piece subvectors.

This patch fixes this by implementing the missing expansions.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/96342
PR middle-end/118272
* config/aarch64/aarch64-sve.md (vec_init,
vec_initvnx16qivnx2qi): New.
* config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init_subvector):
Rewrite to support any arbitrary combinations.
* config/aarch64/iterators.md (SVE_NO2E): Update to use SVE_NO4E
(SVE_NO2E, Vquad): New.

gcc/testsuite/ChangeLog:

PR target/96342
PR middle-end/118272
* gcc.target/aarch64/vect-simd-clone-3.c: New test.

---
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 
6659bb4fcab34699f22ff883825de1cd67108203..35f55bfacfc3238a8a7aa69015f36ba32981af59
 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2839,6 +2839,7 @@ (define_expand "vec_init"
   }
 )
 
+;; Vector constructor combining two half vectors { a, b }
 (define_expand "vec_init"
   [(match_operand:SVE_NO2E 0 "register_operand")
(match_operand 1 "")]
@@ -2849,6 +2850,28 @@ (define_expand "vec_init"
   }
 )
 
+;; Vector constructor combining four quad vectors { a, b, c, d }
+(define_expand "vec_init"
+  [(match_operand:SVE_NO4E 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_SVE"
+  {
+aarch64_sve_expand_vector_init_subvector (operands[0], operands[1]);
+DONE;
+  }
+)
+
+;; Vector constructor combining eight vectors { a, b, c, d, ... }
+(define_expand "vec_initvnx16qivnx2qi"
+  [(match_operand:VNx16QI 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_SVE"
+  {
+aarch64_sve_expand_vector_init_subvector (operands[0], operands[1]);
+DONE;
+  }
+)
+
 ;; Shift an SVE vector left and insert a scalar into element 0.
 (define_insn "vec_shl_insert_"
   [(set (match_operand:SVE_FULL 0 "register_operand")
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
cb9b155826d12b622ae0df1736e4b042d01cf56a..e062cc00d1a548290377382c98ea8f3bb9310513
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24898,18 +24898,51 @@ aarch64_sve_expand_vector_init_subvector (rtx target, 
rtx vals)
   machine_mode mode = GET_MODE (target);
   int nelts = XVECLEN (vals, 0);
 
-  gcc_assert (nelts == 2);
+  gcc_assert (nelts % 2 == 0);
 
-  rtx arg0 = XVECEXP (vals, 0, 0);
-  rtx arg1 = XVECEXP (vals, 0, 1);
-
-  /* If we have two elements and are concatting vector.  */
-  machine_mode elem_mode = GET_MODE (arg0);
+  /* We have to be concatting vector.  */
+  machine_mode elem_mode = GET_MODE (XVECEXP (vals, 0, 0));
   gcc_assert (VECTOR_MODE_P (elem_mode));
 
-  arg0 = force_reg (elem_mode, arg0);
-  arg1 = force_reg (elem_mode, arg1);
-  emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1));
+  auto_vec worklist;
+  machine_mode wider_mode
+= related_vector_mode (elem_mode, GET_MODE_INNER (elem_mode),
+  GET_MODE_NUNITS (elem_mode) * 2).require ();
+
+  /* First create the wider first level pack,  which also allows us to force
+ the values to registers and put the elements in a more convenient
+ data structure.  */
+
+  for (int i = 0; i < nelts; i+=2)
+{
+  rtx arg0 = XVECEXP (vals, 0, i);
+  rtx arg1 = XVECEXP (vals, 0, i + 1);
+  arg0 = force_reg (elem_mode, arg0);
+  arg1 = force_reg (elem_mode, arg1);
+  rtx tmp = gen_reg_rtx (wider_mode);
+  emit_insn (gen_aarch64_pack_partial (wider_mode, tmp, arg0, arg1));
+  worklist.safe_push (tmp);
+}
+
+  /* Keep widening pairwise to have maximum throughput.  */
+  while (worklist.length () > 1)
+{
+  rtx arg0 = worklist.pop ();
+  rtx arg1 = worklist.pop ();
+  gcc_assert (GET_MODE (arg0) == GET_MODE (arg1));
+
+  wider_mode
+   = related_vector_mode (wider_mode, GET_MODE_INNER (wider_mode),
+  GET_MODE_NUNITS (wider_mode) * 2).require ();
+
+  rtx tmp = gen_reg_rtx (wider_mode);
+  emit_insn (gen_aarch64_pack_partial (wider_mode, tmp, arg0, arg1));
+  worklist.safe_push (tmp);
+}
+
+  gcc_assert (wider_mode == mode);
+  emit_move_insn (target, worklist.pop ());
+
   return;
 }
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
07b97547cb292e6d4dc1040173a5d5525fb268d5..60bad5c72bc667be19f8224af87ec5451b4b1a9a
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.m

[PATCH] c++: Fix up reshape_* RAW_DATA_CST handling [PR118214]

2025-01-02 Thread Jakub Jelinek
Hi!

The embed-17.C testcase is miscompiled and pr118214.C testcase used to be
miscompiled on the trunk before I've temporarily reverted the
r15-6339 C++ large initializer speed-up commit in r15-6448.
The problem is that reshape_* is only sometimes allowed to modify the given
CONSTRUCTOR in place (when reuse is true, so
first_initializer_p
&& (complain & tf_error)
&& !CP_AGGREGATE_TYPE_P (elt_type)
&& !TREE_SIDE_EFFECTS (first_initializer_p)
) and at other times is not allowed to change it.  But the RAW_DATA_CST
handling was modifying those in place always, by peeling off whatever
was needed for the processing of the current element or set of elements
and leaving the rest in the original CONSTRUCTOR_ELTS, either as
RAW_DATA_CST with adjusted RAW_DATA_POINTER/RAW_DATA_LENGTH, or turning
it into INTEGER_CST if it would be a RAW_DATA_LENGTH == 1 RAW_DATA_CST.

The following patch fixes that by adding raw_idx member into
struct reshape_iter where we for the RAW_DATA_CST current elements track
offset into the current RAW_DATA_CST (how many elements were processed
from it already) and modifying the original CONSTRUCTOR_ELTS only if reuse
is true and we used the whole RAW_DATA_CST (with zero raw_idx); which means
just modifying its type in place.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-02  Jakub Jelinek  

PR c++/118214
* decl.cc (struct reshape_iter): Add raw_idx member.
(cp_maybe_split_raw_data): Add inc_cur parameter, set *inc_cur,
don't modify original CONSTRUCTOR, use d->raw_idx to track index
into a RAW_DATA_CST d->cur->value.
(consume_init): Adjust cp_maybe_split_raw_data caller, increment
d->cur when cur_inc is true.
(reshape_init_array_1): Don't modify original CONSTRUCTOR when
handling RAW_DATA_CST d->cur->value and !reuse, instead use
d->raw_idx to track index into RAW_DATA_CST.
(reshape_single_init): Initialize iter.raw_idx.
(reshape_init_class): Adjust for introduction of d->raw_idx,
adjust cp_maybe_split_raw_data caller, do d->cur++ if inc_cur
rather than when it returns non-NULL.
(reshape_init_r): Likewise.
(reshape_init): Initialize d.raw_idx.

* g++.dg/cpp/embed-17.C: New test.
* g++.dg/cpp0x/pr118214.C: New test.

--- gcc/cp/decl.cc.jj   2024-12-27 16:03:52.622536496 +0100
+++ gcc/cp/decl.cc  2024-12-30 13:35:31.585034994 +0100
@@ -6816,11 +6816,13 @@ check_for_uninitialized_const_var (tree
 
 /* Structure holding the current initializer being processed by reshape_init.
CUR is a pointer to the current element being processed, END is a pointer
-   after the last element present in the initializer.  */
+   after the last element present in the initializer and RAW_IDX is index into
+   RAW_DATA_CST if that is CUR elt.  */
 struct reshape_iter
 {
   constructor_elt *cur;
   constructor_elt *end;
+  unsigned raw_idx;
 };
 
 static tree reshape_init_r (tree, reshape_iter *, tree, tsubst_flags_t);
@@ -6888,18 +6890,20 @@ is_direct_enum_init (tree type, tree ini
 }
 
 /* Helper function for reshape_init*.  Split first element of
-   RAW_DATA_CST and save the rest to d->cur->value.  */
+   RAW_DATA_CST or return NULL for other elements.  Set *INC_CUR
+   to true if the whole d->cur has been consumed.  */
 
 static tree
-cp_maybe_split_raw_data (reshape_iter *d)
+cp_maybe_split_raw_data (reshape_iter *d, bool *inc_cur)
 {
+  *inc_cur = true;
   if (TREE_CODE (d->cur->value) != RAW_DATA_CST)
 return NULL_TREE;
-  tree ret = *raw_data_iterator (d->cur->value, 0);
-  ++RAW_DATA_POINTER (d->cur->value);
-  --RAW_DATA_LENGTH (d->cur->value);
-  if (RAW_DATA_LENGTH (d->cur->value) == 1)
-d->cur->value = *raw_data_iterator (d->cur->value, 0);
+  tree ret = *raw_data_iterator (d->cur->value, d->raw_idx++);
+  if (d->raw_idx != (unsigned) RAW_DATA_LENGTH (d->cur->value))
+*inc_cur = false;
+  else
+d->raw_idx = 0;
   return ret;
 }
 
@@ -6911,9 +6915,11 @@ cp_maybe_split_raw_data (reshape_iter *d
 static tree
 consume_init (tree init, reshape_iter *d)
 {
-  if (tree raw_init = cp_maybe_split_raw_data (d))
-return raw_init;
-  d->cur++;
+  bool inc_cur;
+  if (tree raw_init = cp_maybe_split_raw_data (d, &inc_cur))
+init = raw_init;
+  if (inc_cur)
+d->cur++;
   return init;
 }
 
@@ -6972,10 +6978,8 @@ reshape_init_array_1 (tree elt_type, tre
 {
   tree elt_init;
   constructor_elt *old_cur = d->cur;
-  const char *old_raw_data_ptr = NULL;
-
-  if (TREE_CODE (d->cur->value) == RAW_DATA_CST)
-   old_raw_data_ptr = RAW_DATA_POINTER (d->cur->value);
+  unsigned int old_raw_idx = d->raw_idx;
+  bool old_raw_data_cst = TREE_CODE (d->cur->value) == RAW_DATA_CST;
 
   if (d->cur->index)
CONSTRUCTOR_IS_DESIGNATED_INIT (new_init) = true;
@@ -6988,25 +6992,23 @@ reshape_init_array_1 (tree elt_type, tre
 

Re: [PATCH v5 04/10] OpenMP: Robustify C front end handling of attribute-syntax pragmas

2025-01-02 Thread Tobias Burnus

Sandra Loosemore wrote:

[in https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669055.html ]

Presently, the code to handle OpenMP attribute-syntax pragmas in the C
front end assumes nothing else is messing with redirecting
parser->tokens, and makes no provision for restoring it from anything
other than parser->tokens_buf when the buffer allocated for the pragma
is exhausted.  Adding support for metadirectives will change that,
since it also needs to buffer tokens for the metadirective
alternatives, and an attribute-syntax directive can appear inside a
metadirective.
(Cf. token->in_omp_attribute_pragma handling in 
c_parser_omp_metadirective, added in part 5 of the patch series, 
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669058.html )

This patch adds a more general save/restore mechanism attached to the
parser via a pointer to a new struct omp_attribute_pragma_state.

gcc/c/ChangeLog

* c-parser.cc (struct c_parser): Change in_omp_attribute_pragma
field to be of type struct omp_attribute_pragma_state.
(struct omp_attribute_pragma_state): New.
(c_parser_skip_until_found): Use the new way to restore state
on EOF.
(c_parser_skip_to_pragma_eol): Likewise.
(c_parser_handle_statement_omp_attributes): Create an
omp_attribute_pragma_state to hold the restore state.
(omp_maybe_parse_omp_decl): Likewise.

...

--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -262,13 +262,24 @@ struct GTY(()) c_parser {
   struct omp_for_parse_data * GTY((skip)) omp_for_parse_state;
 
   /* If we're in the context of OpenMP directives written as C23
-     attributes turned into pragma, vector of tokens created from that,
-     otherwise NULL.  */
-  vec *in_omp_attribute_pragma;
+     attributes turned into pragma, the tokens field is temporarily
+     redirected.  This holds data needed to restore state afterwards.
+     It's NULL otherwise.  */
+  struct omp_attribute_pragma_state *in_omp_attribute_pragma;
 
   /* Set for omp::decl attribute parsing to the decl to which it
      appertains.  */
   tree in_omp_decl_attribute;
+
+};


Can you remove the added empty line?


+/* Holds data needed to restore the token stream to its previous state
+   after parsing an OpenMP attribute-syntax pragma.  */
+struct GTY(()) omp_attribute_pragma_state
+{
+  vec *token_vec;
+  c_token * GTY((skip)) save_tokens;
+  unsigned int save_tokens_avail;
+};

...

@@ -7187,9 +7200,13 @@ c_parser_handle_statement_omp_attributes (c_parser *parser, 
tree &attrs,
tok.flags = tokens_avail;
toks->quick_push (tok);

IMHO, 'tok.flags = tokens_avail' is no longer needed.

+  gcc_assert (!parser->in_omp_attribute_pragma);
+  parser->in_omp_attribute_pragma = ggc_alloc ();
+  parser->in_omp_attribute_pragma->token_vec = toks;
+  parser->in_omp_attribute_pragma->save_tokens = parser->tokens;
+  parser->in_omp_attribute_pragma->save_tokens_avail = tokens_avail;
parser->tokens = toks->address ();
parser->tokens_avail = tokens;
-  parser->in_omp_attribute_pragma = toks;
return true;
  }
  
@@ -27166,9 +27183,13 @@ c_maybe_parse_omp_decl (tree decl, tree d)

tok.flags = tokens_avail;
toks->quick_push (tok);


Likewise, IMHO, 'tok.flags = tokens_avail' is no longer needed.

Otherwise, LGTM.

Tobias


Re: [PATCH] c: special-case some "bool" errors with C23 (v2) [PR117629]

2025-01-02 Thread Joseph Myers
On Thu, 19 Dec 2024, David Malcolm wrote:

> Here's an updated version of the patch.
> 
> Changed in v2:
> - distinguish between "bool" and "_Bool" when determining
>   standard version
> - more test coverage
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> OK for trunk?

OK.  (I'm guessing the other new keywords that weren't previously reserved 
(alignas, alignof, constexpr, nullptr, static_assert, thread_local, 
typeof, typeof_unqual) are sufficiently rare as identifiers that they 
aren't worth trying to produce better diagnostics for.)

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] c/c++: UX improvements to 'too {few, many} arguments' errors [PR118112]

2025-01-02 Thread Joseph Myers
On Thu, 19 Dec 2024, David Malcolm wrote:

> gcc/c/ChangeLog:
>   PR c/118112
>   * c-typeck.cc (inform_declaration): Add "function_expr" param and
>   use it for cases where we couldn't show the function decl to show
>   field decls for callbacks.
>   (build_function_call_vec): Add missing auto_diagnostic_group.
>   Update for new param of inform_declaration.
>   (convert_arguments): Likewise.  For the "too many arguments" case
>   add the expected vs actual counts to the message, and if we have
>   it, add the location_t of the first surplus param as a secondary
>   location within the diagnostic.  For the "too few arguments" case,
>   determine the minimum number of arguments required and add the
>   expected vs actual counts to the message, tweaking it to "at least"
>   for variadic functions.

The C front-end changes are OK.  (More specifically, informing the user 
when C23 changes to the semantics of () are involved in "too many 
arguments" would also be good as a followup, though as discussed that 
would require propagating information about use of () further into the 
front-end.)

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [WIP 0/8] Algol 68 GCC Front-End

2025-01-02 Thread Jose E. Marchesi


> As to where to host the project, the obvious choice is perhaps a
> project at sourceware, but it would be nice if we could develop the
> front-end in a branch in the GCC git repo, to have a mailing list
> under gcc.gnu.org and to use a page in the GCC wiki to track the FE
> progress... please let me know if that is feasible.

After chatting with Frank I thought it would be interesting to use the
experimental forge at sourceware, so until the front-end is ready for
official submission it will be hosted in:

  https://forge.sourceware.org/jemarch/a68-gcc branch a68

I also went ahead and created a page in the wiki to track the
development:

  https://gcc.gnu.org/wiki/Algol68FrontEnd

Salud!


RE: [PATCH] COBOL 1/8 hdr: header files

2025-01-02 Thread Joseph Myers
On Thu, 19 Dec 2024, Robert Dubner wrote:

> At compile-time (or on the host), we also do numeric calculations.  The
> ISO specification allows for compile-time computations specified in the
> source code.  In addition, at times I put initial values for the COBOL
> variables into the run-time structures that are the COBOL variables.  In
> order to create those CONSTRUCTOR nodes we have to do those calculations
> at compile time, hence the use of __int128 and _Float128 in the host code.
> 
> In the run-time/host code, I have been using intTI_type_node for __int128,
> and unsigned_intTI_type_node for __uint128.  For floating point, I've been
> using float32_type_node, float64_type_node, and float128_type_node.
> 
> If there are recommendations as to what would work better across other
> architectures, I am all ears.

As has been noted, wide_int can be used for large integer arithmetic 
within the compiler.  For floating-point arithmetic, there are the 
interfaces in real.h to handle floating point within GCC's internal 
representation (note that some of the operations might leave a result with 
extra internal precision; see how fold-const.cc:const_binop calls 
real_arithmetic followed by real_convert to ensure correct rounding to the 
desired format, for example).

I don't know if you support decimal floating-point in your front end, but 
the real.h interfaces deal with that as well (via calling out to 
libdecnumber for arithmetic on DFP values).  Again, DFP support on the 
target (which is supported for a much more limited set of architectures - 
just aarch64 / powerpc / x86 / x86_64 / s390) does not depend on DFP 
support on the host, since libdecnumber handles all the DFP arithmetic 
required within the compiler.

-- 
Joseph S. Myers
josmy...@redhat.com