Re: 6/7 [Fortran, Patch, Coarray, PR107635] Add transfer_between_remotes

2025-02-21 Thread Rainer Orth
Hi Andre,

> thanks for testing and also for giving a solution. I was sure I would break
> something. Fixed as obvious as gcc-15-7662-g08bdc2ac98a after regtesting OK on
> x86_64-pc-linux-gnu / Fedora 41 (which gets more unstable with every update I
> receive, IMO). I opted for the __builtin_alloca version; that seemed to be the
> most portable one.

agreed, and the least intrusive one.

> Thanks again for the report and keep testing please,

Sure: usually once a day ;-)

Thanks for the quick fix.

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: 6/7 [Fortran, Patch, Coarray, PR107635] Add transfer_between_remotes

2025-02-21 Thread Andre Vehreschild
Hi Rainer,

thanks for testing and also for giving a solution. I was sure I would break
something. Fixed as obvious as gcc-15-7662-g08bdc2ac98a after regtesting OK on
x86_64-pc-linux-gnu / Fedora 41 (which gets more unstable with every update I
receive, IMO). I opted for the __builtin_alloca version; that seemed to be the
most portable one.

Thanks again for the report and keep testing please,
Andre

On Thu, 20 Feb 2025 19:26:41 +0100
Rainer Orth  wrote:

> Hi Andre,
> 
> this patch broke Solaris bootstrap:
> 
> /vol/gcc/src/hg/master/local/libgfortran/caf/single.c: In function
> ‘_gfortran_caf_transfer_between_remotes’:
> /vol/gcc/src/hg/master/local/libgfortran/caf/single.c:675:23: error:
> implicit declaration of function ‘alloca’ [-Wimplicit-function-declaration]
>   675 |   transfer_desc = alloca (desc_size);
>       |                   ^~
> /vol/gcc/src/hg/master/local/libgfortran/caf/single.c:675:23: warning:
> incompatible implicit declaration of built-in function ‘alloca’
> [-Wbuiltin-declaration-mismatch]
> /vol/gcc/src/hg/master/local/libgfortran/caf/single.c:680:20: warning:
> incompatible implicit declaration of built-in function ‘alloca’
> [-Wbuiltin-declaration-mismatch]
>   680 |   transfer_ptr = alloca (*opt_dst_charlen * src_size);
>       |                  ^~
> make[3]: *** [Makefile:4675: caf/single.lo] Error 1
> 
> Solaris needs to include <alloca.h> to get an alloca declaration.  While
> this could be handled with AC_FUNC_ALLOCA (like libcpp does) or checking
> for alloca.h and including it if found (like libssp does), I guess
> there's a simpler way: several runtime libs use __builtin_alloca
> unconditionally (libgcc, libquadmath, libstdc++-v3).
> 
>   Rainer
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From 76b2b1ee14024bba0a618ee98505550cd7c41a1d Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Fri, 21 Feb 2025 08:18:40 +0100
Subject: [PATCH] Fortran: Fix build on solaris [PR107635]

libgfortran/ChangeLog:

	PR fortran/107635
	* caf/single.c: Replace alloca with __builtin_alloca.
---
 libgfortran/caf/single.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgfortran/caf/single.c b/libgfortran/caf/single.c
index d4e081be4dd..9c1c0c1bc8c 100644
--- a/libgfortran/caf/single.c
+++ b/libgfortran/caf/single.c
@@ -672,12 +672,12 @@ _gfortran_caf_transfer_between_remotes (
   if (!scalar_transfer)
 {
   const size_t desc_size = sizeof (*transfer_desc);
-  transfer_desc = alloca (desc_size);
+  transfer_desc = __builtin_alloca (desc_size);
   memset (transfer_desc, 0, desc_size);
   transfer_ptr = transfer_desc;
 }
   else if (opt_dst_charlen)
-transfer_ptr = alloca (*opt_dst_charlen * src_size);
+transfer_ptr = __builtin_alloca (*opt_dst_charlen * src_size);
   else
 {
   buffer = NULL;
--
2.48.1



Re: [PATCH] Fortran: initialize non-saved pointers with -fcheck=pointer [PR48958]

2025-02-21 Thread Andre Vehreschild
Hi Harald,

seconding Thomas here, thanks for the patch.

- Andre

On Thu, 20 Feb 2025 21:18:01 +0100
Harald Anlauf  wrote:

> Dear all,
>
> the attached simple patch addresses a small, left-over issue in the
> above PR and was already OK'ed in the PR by Thomas.  With it we
> initialize the data component of a non-saved pointer variable when
> -fcheck=pointer is specified, so that subsequent pointer checks
> (e.g. for the SIZE intrinsic) trigger consistently and not randomly.
>
> I plan to commit within 24h unless there are comments.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Thanks,
> Harald
>


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH] Improve g++.dg/torture/pr118521.C

2025-02-21 Thread Richard Biener
Alexander pointed out the way to do a dg-bogus in an included header.

Tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/118521
* g++.dg/torture/pr118521.C: Use dg-bogus properly.
---
 gcc/testsuite/g++.dg/torture/pr118521.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/torture/pr118521.C 
b/gcc/testsuite/g++.dg/torture/pr118521.C
index 1a66aca0e8d..08836bd5f0e 100644
--- a/gcc/testsuite/g++.dg/torture/pr118521.C
+++ b/gcc/testsuite/g++.dg/torture/pr118521.C
@@ -1,7 +1,7 @@
 // { dg-do compile }
 // { dg-additional-options "-Wall" }
 
-#include <vector> // dg-bogus new_allocator.h:191 warning: writing 1 byte into a region of size 0
+#include <vector> // { dg-bogus "writing 1 byte into a region of size 0" "" { target *-*-* } 0 }
 
 void bar(std::vector);
 
-- 
2.43.0
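The key detail in the corrected directive is the trailing `0`: with line number 0 the dg-bogus check is not tied to the directive's own line, which is what is needed when the bogus warning would point into an included header. A hypothetical miniature of such a test (illustrative only, not the actual pr118521.C contents):

```c
// { dg-do compile }
// { dg-additional-options "-Wall" }

// Line number 0 makes the check match the quoted text anywhere in the
// compiler output rather than on this source line, so a warning issued
// inside new_allocator.h is still caught.
#include <vector>  // { dg-bogus "writing 1 byte into a region of size 0" "" { target *-*-* } 0 }

void bar(std::vector<int>);

void foo() { bar(std::vector<int>(2)); }
```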


Re: [PATCH v3] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]

2025-02-21 Thread Richard Sandiford
Richard Sandiford  writes:
> I think this would also simplify the evpc detection, since the requirement
> for using AND is the same for big-endian and little-endian, namely that
> index I of the result must either come from index I of the nonzero
> vector or from any element of the zero vector.  (What differs between
> big-endian and little-endian is which masks correspond to FMOV.)

Or perhaps more accurately, what differs between big-endian and
little-endian is the constant that needs to be materialised for a
given permute mask.  I think the easiest way of handling that would
be to construct an array of target_units (0xffs for bytes that come
from the nonzero vector, 0x00s for bytes that come from the zero
vector) and then get native_encode_rtx to convert that into a vector
constant.  native_encode_rtx will then do the endian correction for us.

Thanks,
Richard



Re: The COBOL front end, version 3, now in 14 easy pieces (+NIST)

2025-02-21 Thread Florian Weimer
* James K. Lowden:

> As I mentioned in other posts, IMO (IANAL) the copyright is
> unimportant and probably unenforceable.  The National Computing
> Centre no longer exists, and the document was also published by NIST
> which, as part of the US government, does not copyright its
> publications.

In the United States.  It is generally assumed that the
U.S. government can claim copyright on its works abroad.


Re: [PATCH v2] rs6000: Adding missed ISA 3.0 atomic memory operation instructions.

2025-02-21 Thread Michael Meissner
On Thu, Feb 20, 2025 at 07:41:26PM +0530, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> Changes to amo.h include the addition of the following load atomic operations:
> Compare and Swap Not Equal, Fetch and Increment Bounded, Fetch and Increment
> Equal, and Fetch and Decrement Bounded. Additionally, Store Twin is added for
> store atomic operations.
> 
> 2025-02-20 Peter Bergner 
> 
> gcc/:
>   * config/rs6000/amo.h: Add missing atomic memory operations.
>   * doc/extend.texi (PowerPC Atomic Memory Operation Functions):
> Document new functions.
> 
> gcc/testsuite/:
>   * gcc.target/powerpc/amo3.c: New test.
>   * gcc.target/powerpc/amo4.c: Likewise.
>   * gcc.target/powerpc/amo5.c: Likewise.
>   * gcc.target/powerpc/amo6.c: Likewise.
>   * gcc.target/powerpc/amo7.c: Likewise.
> 
> Co-authored-by: Jeevitha Palanisamy  

It looks reasonable to me.  Hopefully Segher will approve.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic stack allocation not supported'

2025-02-21 Thread Jose E. Marchesi


>> diff --git a/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c 
>> b/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
>> index 0406f2c3595..e549cab84ca 100644
>> --- a/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
>> +++ b/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
>> @@ -3,7 +3,8 @@
>>  int
>>  foo (int x)
>>  {
>> -  int *p = __builtin_alloca (x); /* { dg-error "support" } */
>> +  int *p = __builtin_alloca (x);
>> +  /* { dg-message {sorry, unimplemented: dynamic stack allocation not 
>> supported} {} { target *-*-* } .-1 } */
>>  
>>return p[2];
>>  }
>
> I am a bit surprised that works.  Doesn't dg-message trigger a location
> (line) check like dg-error and dg-warning do?

Ah, OK, the .-1.  Nice :)
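For readers unfamiliar with the directive syntax: `.-1` is a relative line reference, anchoring the expected diagnostic to the line above the directive, so the directive can live on its own line. A hypothetical sketch of such a test file (not the actual testsuite source):

```c
/* Illustrative DejaGnu test fragment: the ".-1" in the dg-message
   directive points the location check at the __builtin_alloca line
   directly above it.  */
int
foo (int x)
{
  int *p = __builtin_alloca (x);
  /* { dg-message "sorry, unimplemented: dynamic stack allocation not supported" "" { target *-*-* } .-1 } */
  return p[2];
}
```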


Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-21 Thread Richard Sandiford
Jan Hubicka  writes:
>> 
>> Thanks for running these.  I saw poor results for perlbench with my
>> initial aarch64 hooks because the hooks reduced the cost to zero for
>> the entry case:
>> 
>>  auto entry_cost = targetm.callee_save_cost
>>(spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
>> ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
>> allocated_callee_save_regs, existing_spills_p);
>>  /* In the event of a tie between caller-save and callee-save,
>> prefer callee-save.  We apply this to the entry cost rather
>> than the exit cost since the entry frequency must be at
>> least as high as the exit frequency.  */
>>  if (entry_cost > 0)
>>entry_cost -= 1;
>> 
>> I "fixed" that by bumping the cost to a minimum of 2, but I was
>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 1",
>> so that the cost is always greater than not using a callee save for
>> registers that don't cross a call.  WDYT?
>
> For x86 performance costs, the push cost should be memory_move_cost, which
> is 6, -2 for adjustment in the target hook and -1 for this.  So the cost
> should not be 0, I think.
>
> For size cost, I currently return 1, so we indeed get 0 after
> adjustment.
>
> I think a cost of 0 will make us pick callee save even if caller save
> is available and there are no function calls, so I guess we do not want
> that.

OK, here's an updated patch that makes that change.  The x86 parts
should be replaced by your patch.

Tested on aarch64-linux-gnu.  I also tried to test on powerpc64le-linux-gnu
(on gcc112), but I keep getting broken pipes during the test runs,
so I'm struggling to get good before/after comparisons.  It does at
least bootstrap though...

I'm going to be away next week.  If there's agreement, please feel free
to combine the patch with yours and push it while I'm away.

Thanks,
Richard


>From 164c8590ce2f0db28a1b91efc3f31e5feb4c6b9e Mon Sep 17 00:00:00 2001
From: Richard Sandiford 
Date: Fri, 7 Feb 2025 15:40:21 +
Subject: [PATCH] ira: Add new hooks for callee-save vs spills [PR117477]
To: gcc-patches@gcc.gnu.org

Following on from the discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html

this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
replaces it with two hooks: one that controls the cost of using an
extra callee-saved register and one that controls the cost of allocating
a frame for the first spill.

(The patch does not attempt to address the shrink-wrapping part of
the thread above.)

On AArch64, this is enough to fix PR117477, as verified by the new tests.
The patch does not change the SPEC2017 scores significantly.  (I saw a
slight improvement in fotonik3d and roms, but I'm not convinced that
the improvements are real.)

The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
which is a scan-dump correctness test that relies on not using
caller saves.  The decision to use caller saves looks appropriate,
and saves an instruction, so I've just added -fno-caller-saves
to the test options.

gcc/
	PR rtl-optimization/117477
	* config/aarch64/aarch64.cc (aarch64_count_saves): New function.
	(aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
	(aarch64_frame_allocation_cost): Likewise.
	(TARGET_CALLEE_SAVE_COST): Define.
	(TARGET_FRAME_ALLOCATION_COST): Likewise.
	* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
	Replace with...
	(ix86_callee_save_cost): ...this new hook.
	(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
	(TARGET_CALLEE_SAVE_COST): Define.
	* target.h (spill_cost_type, frame_cost_type): New enums.
	* target.def (callee_save_cost, frame_allocation_cost): New hooks.
	(ira_callee_saved_register_cost_scale): Delete.
	* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
	(TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
	* doc/tm.texi: Regenerate.
	* hard-reg-set.h (hard_reg_set_popcount): New function.
	* ira-color.cc (allocated_memory_p): New variable.
	(allocated_callee_save_regs): Likewise.
	(record_allocation): New function.
	(assign_hard_reg): Use targetm.frame_allocation_cost to model
	the cost of the first spill or first caller save.  Use
	targetm.callee_save_cost to model the cost of using new callee-saved
	registers.  Apply the exit rather than entry frequency to the cost
	of restoring a register or deallocating the frame.  Update the
	new variables above.
	(improve_allocation): Use record_allocation.
	(color): Initialize allocated_callee_save_regs.
	(ira_color): Initialize allocated_memory_p.
	* targhooks.h (default_callee_save_cost): Declare.
	(default_frame_allocation_cost): Likewise.
	* targhooks.cc (default_callee_save_cost): New function.
	(default_frame_allocation_cost): Likewise.

gcc/testsuite/
	PR rtl-optimization/117477
	* gcc.target/aarch64/callee_save_1.c: New test.
	* gcc.target/aarch64/

Re: The COBOL front end, version 3, now in 14 easy pieces (+NIST)

2025-02-21 Thread James K. Lowden
On Fri, 21 Feb 2025 01:17:15 + (UTC)
Joseph Myers  wrote:

> > The Makefile fetches the NIST archive from our website.   
> 
> The normal build and test process ("make" and "make check") must
> never rely on any network connectivity.  

Fair enough.  I haven't added the NIST tests and documentation for 2 reasons:

1.  $ ls -sh gcc/cobol/nist/newcob.val.Z # source code archive
4.3M gcc/cobol/nist/newcob.val.Z

2.  Page 1 of the documentation contains
CCVS 85
USER GUIDE
Version 4.2
Copyright 1993
The National Computing Centre Limited All rights reserved.

The PDF document, for reference, is 7.1 MB. 

As I mentioned in other posts, IMO (IANAL) the copyright is unimportant and 
probably unenforceable.  The National Computing Centre no longer exists, and 
the document was also published by NIST which, as part of the US government, 
does not copyright its publications.  

I didn't want to ask the FSF for their legal opinion until GCC agreed that it 
wants to accept these artifacts into its repository.  If you want me to do 
that, I will.

--jkl


Ping [PATCH 0/2] v2 Add prime path coverage to gcc/gcov

2025-02-21 Thread Jørgen Kvalsvik

Ping

On 2/12/25 16:30, Jørgen Kvalsvik wrote:

I have applied fixes for everything in the last review, plus some GNU
style fixes that I had missed previously. We have tested and used a
build with this applied for 3-4 months now and haven't run into any
issues.

Jørgen Kvalsvik (2):
   gcov: branch, conds, calls in function summaries
   Add prime path coverage to gcc/gcov

  gcc/Makefile.in|6 +-
  gcc/builtins.cc|2 +-
  gcc/collect2.cc|6 +-
  gcc/common.opt |   16 +
  gcc/doc/gcov.texi  |  187 +++
  gcc/doc/invoke.texi|   36 +
  gcc/gcc.cc |4 +-
  gcc/gcov-counter.def   |3 +
  gcc/gcov-io.h  |3 +
  gcc/gcov.cc|  535 +-
  gcc/ipa-inline.cc  |2 +-
  gcc/passes.cc  |4 +-
  gcc/path-coverage.cc   |  776 +
  gcc/prime-paths.cc | 2052 
  gcc/profile.cc |6 +-
  gcc/selftest-run-tests.cc  |1 +
  gcc/selftest.h |1 +
  gcc/testsuite/g++.dg/gcov/gcov-22.C|  170 ++
  gcc/testsuite/g++.dg/gcov/gcov-23-1.h  |9 +
  gcc/testsuite/g++.dg/gcov/gcov-23-2.h  |9 +
  gcc/testsuite/g++.dg/gcov/gcov-23.C|   30 +
  gcc/testsuite/gcc.misc-tests/gcov-29.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-30.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-31.c |   35 +
  gcc/testsuite/gcc.misc-tests/gcov-32.c |   24 +
  gcc/testsuite/gcc.misc-tests/gcov-33.c |   27 +
  gcc/testsuite/gcc.misc-tests/gcov-34.c |   29 +
  gcc/testsuite/lib/gcov.exp |  118 +-
  gcc/tree-profile.cc|   11 +-
  29 files changed, 5818 insertions(+), 22 deletions(-)
  create mode 100644 gcc/path-coverage.cc
  create mode 100644 gcc/prime-paths.cc
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-1.h
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-2.h
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23.C
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-30.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-31.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-32.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-33.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-34.c





Re: The COBOL front end, version 3, now in 14 easy pieces (+NIST)

2025-02-21 Thread Paul Koning



> On Feb 21, 2025, at 1:59 PM, Florian Weimer  wrote:
> 
> * James K. Lowden:
> 
>> As I mentioned in other posts, IMO (IANAL) the copyright is
>> unimportant and probably unenforceable.  The National Computing
>> Centre no longer exists, and the document was also published by NIST
>> which, as part of the US government, does not copyright its
>> publications.
> 
> In the United States.  It is generally assumed that the
> U.S. government can claim copyright on its works abroad.

Really?  That's puzzling.  It's my understanding that works of the United 
States are, by law, in the public domain.

paul

Re: [PATCH] Further use of mod_scope in modified_type_die

2025-02-21 Thread Tom Tromey
Tom> While working on this I found a couple of paths in modified_type_die
Tom> where "mod_scope" should be used, but is not.  I suspect these cases
Tom> are only reachable by Ada code, as in both spots (subrange types and
Tom> base types), I believe that other languages don't generally have named
Tom> types in a non-top-level scope, and in these other situations,
Tom> mod_scope will still be correct.

Tom> Ping.

Ping.

Tom


Re: The COBOL front end, version 3, now in 14 easy pieces (+NIST)

2025-02-21 Thread Florian Weimer
* Paul Koning:

>> On Feb 21, 2025, at 1:59 PM, Florian Weimer  wrote:
>> 
>> * James K. Lowden:
>> 
>>> As I mentioned in other posts, IMO (IANAL) the copyright is
>>> unimportant and probably unenforceable.  The National Computing
>>> Centre no longer exists, and the document was also published by NIST
>>> which, as part of the US government, does not copyright its
>>> publications.
>> 
>> In the United States.  It is generally assumed that the
>> U.S. government can claim copyright on its works abroad.
>
> Really?  That's puzzling.  It's my understanding that works of the
> United States are, by law, in the public domain.

I don't think the rationale has changed since 1976:

| Copyright Law Revision (House Report No. 94-1476)

| The prohibition on copyright protection for United States Government
| works is not intended to have any effect on protection of these
| works abroad. Works of the governments of most other countries are
| copyrighted. There are no valid policy reasons for denying such
| protection to United States Government works in foreign countries,
| or for precluding the Government from making licenses for the use of
| its works abroad.



Regarding the claim about copyright held by foreign governments, I
think this may have been a bit dubious, because functional works such
as laws seem to be exempted in many countries.  But I think the United
States may be unique in extending this to software written by
government employees.


Re: The COBOL front end, version 3, now in 14 easy pieces (+NIST)

2025-02-21 Thread James K. Lowden
On Fri, 21 Feb 2025 13:00:13 +0100
Richard Biener  wrote:

> > Branches in git don't have independent permissions.  If we use
> > gcc.gnu.org git, are we granted commit rights with the proviso that
> > we color inside the lines, and commit only to our own branches?
> 
> My expectation is that by contributing the COBOL frontend you are
> volunteering to be maintainers for it which grants you permission
> (as in reviewing your own and other peoples patches) in that area.
> As part of the contribution you should also get commit access to GCCs
> git repository (see https://gcc.gnu.org/gitwrite.html)

Understood.  And, yes, we're committed to continuing to maintain the COBOL 
front end.  

> > 2.  Where should the NIST archive and documentation be hosted?
> 
> I guess gcc/testsuite/cobol/nist or sth like that.

ok

> We for example have ./contrib/download_prerequisites
> so I envision a ./contrib/download_cobol_nist that would download
> the relevant parts and installs it in the appropriate places in the source
> directory so that a later make check-gcc-cobol will run the testsuite.

That sounds like a starting point.  I'll take a swing at it and send it as a 
new patch, distinct from the first 14.  

--jkl





Re: The COBOL front end, version 3, now in 14 easy pieces (+NIST)

2025-02-21 Thread Paul Koning


> On Feb 21, 2025, at 2:23 PM, Florian Weimer  wrote:
> 
> * Paul Koning:
> 
>>> On Feb 21, 2025, at 1:59 PM, Florian Weimer  wrote:
>>> 
>>> * James K. Lowden:
>>> 
 As I mentioned in other posts, IMO (IANAL) the copyright is
 unimportant and probably unenforceable.  The National Computing
 Centre no longer exists, and the document was also published by NIST
 which, as part of the US government, does not copyright its
 publications.
>>> 
>>> In the United States.  It is generally assumed that the
>>> U.S. government can claim copyright on its works abroad.
>> 
>> Really?  That's puzzling.  It's my understanding that works of the
>> United States are, by law, in the public domain.
> 
> I don't think the rationale has changed since 1976:
> 
> | Copyright Law Revision (House Report No. 94-1476)
> 
> | The prohibition on copyright protection for United States Government
> | works is not intended to have any effect on protection of these
> | works abroad. Works of the governments of most other countries are
> | copyrighted. There are no valid policy reasons for denying such
> | protection to United States Government works in foreign countries,
> | or for precluding the Government from making licenses for the use of
> | its works abroad.
> 
> 

Interesting, thanks

> Regarding the claim about copyright held by foreign governments, I
> think this may have been a bit dubuous because functional works such
> as laws seem to be exempted in many countries.  But I think the United
> States may be unique in extended this to software written by
> government employees.

It applies to all "works".  For example, this is why photos taken by military 
personnel that show up in newspapers are credited to the photographer but don't 
mention copyright, because there isn't any.  Similarly photos from NASA.  Or 
topographic maps.  (Those are an example of "copyright held by foreign 
governments" -- I remember "crown copyright" markings on Ordnance Survey maps 
in the UK.)

paul

Re: [PATCH v2] rs6000: Adding missed ISA 3.0 atomic memory operation instructions.

2025-02-21 Thread Peter Bergner
On 2/20/25 8:11 AM, jeevitha wrote:
> The following patch has been bootstrapped and regtested on powerpc64le-linux.

A note for the future:
  1) When sending a V2, V3, ... patch, please send it as its own email
 thread, meaning don't hit reply to a previous patch version email,
 since that tends to put them into the same email thread.  I know Segher's
 MUA sorts those differently and your submission might not be seen.
  2) Please add "Changes from V1", etc. lines to your V2, V3, etc. patches,
 so we can see what changed from the previous patch(es), which makes
 reviewing your V2, V3, etc. patches easier.  Something like:

Changes from V2:
 * Added blah blah
 * Removed blah blah
Changes from V1:
 * Modified blah blah
 * Rebase to current trunk.
 * etc.




> Changes to amo.h include the addition of the following load atomic operations:
> Compare and Swap Not Equal, Fetch and Increment Bounded, Fetch and Increment
> Equal, and Fetch and Decrement Bounded. Additionally, Store Twin is added for
> store atomic operations.

I mentioned offline I had some questions about the V2 patch, but after looking
closer at the patch, it was a misunderstanding on my part of what part of the
patch was doing.  ...meaning, the code changes in the patch LGTM, modulo some
whitespace issues:


>   * doc/extend.texi (PowerPC Atomic Memory Operation Functions):
> Document new functions.

There should be a tab in front of "Document new functions." line, rather than
8 consecutive spaces.


>   * gcc.target/powerpc/amo6.c: Likewise.
>   * gcc.target/powerpc/amo7.c: Likewise.

Both of these test cases have multiple 8 consecutive spaces that should
be converted to tabs too.

Otherwise, LGTM.  We just need Segher's approval now.


Peter




BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic stack allocation not supported'

2025-02-21 Thread Thomas Schwinge
Hi!

José, any objection to me pushing the attached
"BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic stack allocation not 
supported'"?
(Why this is useful, you'll understand later today.)

I've tested BPF as follows:

$ [...]/configure --target=bpf-none
$ make -j12 all-gcc
$ make check-gcc-c RUNTESTFLAGS=bpf.exp

The before vs. after 'diff' of 'gcc/testsuite/gcc/gcc.log' looks as
expected:

[...]
 Executing on host: [...]/xgcc -B[...]/ 
[...]/gcc.target/bpf/diag-alloca-1.c-fdiagnostics-plain-output-ansi 
-pedantic-errors -S -o diag-alloca-1.s(timeout = 300)
 spawn -ignore SIGHUP [...]/xgcc -B[...]/ 
[...]/gcc.target/bpf/diag-alloca-1.c -fdiagnostics-plain-output -ansi 
-pedantic-errors -S -o diag-alloca-1.s
 [...]/gcc.target/bpf/diag-alloca-1.c: In function 'foo':
-[...]/gcc.target/bpf/diag-alloca-1.c:6:12: error: BPF does not support 
dynamic stack allocation
+[...]/gcc.target/bpf/diag-alloca-1.c:6:12: sorry, unimplemented: dynamic 
stack allocation not supported
 compiler exited with status 1
-PASS: gcc.target/bpf/diag-alloca-1.c  (test for errors, line 6)
+PASS: gcc.target/bpf/diag-alloca-1.c  at line 7 (test for warnings, line 6)
 PASS: gcc.target/bpf/diag-alloca-1.c (test for excess errors)
 Executing on host: [...]/xgcc -B[...]/ 
[...]/gcc.target/bpf/diag-alloca-2.c-fdiagnostics-plain-output   -std=gnu89 
-S -o diag-alloca-2.s(timeout = 300)
 spawn -ignore SIGHUP [...]/xgcc -B[...]/ 
[...]/gcc.target/bpf/diag-alloca-2.c -fdiagnostics-plain-output -std=gnu89 -S 
-o diag-alloca-2.s
 [...]/gcc.target/bpf/diag-alloca-2.c: In function 'foo':
-[...]/gcc.target/bpf/diag-alloca-2.c:7:7: error: BPF does not support 
dynamic stack allocation
+[...]/gcc.target/bpf/diag-alloca-2.c:7:7: sorry, unimplemented: dynamic 
stack allocation not supported
 compiler exited with status 1
-PASS: gcc.target/bpf/diag-alloca-2.c  (test for errors, line 7)
+PASS: gcc.target/bpf/diag-alloca-2.c  at line 8 (test for warnings, line 7)
 PASS: gcc.target/bpf/diag-alloca-2.c (test for excess errors)
[...]

(No further testing done for BPF.)


Grüße
 Thomas


>From 8ebac3064696c6d42041e398ebb5a622498523cc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 21 Feb 2025 11:21:08 +0100
Subject: [PATCH] BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic
 stack allocation not supported'

... instead of BPF: 'error: BPF does not support dynamic stack allocation', and
nvptx: 'sorry, unimplemented: target cannot support alloca'.

	gcc/
	* config/bpf/bpf.md (define_expand "allocate_stack"): Emit
	'sorry, unimplemented: dynamic stack allocation not supported'.
	* config/nvptx/nvptx.md (define_expand "allocate_stack")
	[!TARGET_SOFT_STACK && !(TARGET_PTX_7_3 && TARGET_SM52)]: Likewise.
	gcc/testsuite/
	* gcc.target/bpf/diag-alloca-1.c: Adjust 'dg-message'.
	* gcc.target/bpf/diag-alloca-2.c: Likewise.
	* gcc.target/nvptx/alloca-1-sm_30.c: Likewise.
	* gcc.target/nvptx/vla-1-sm_30.c: Likewise.
	* lib/target-supports.exp (proc check_effective_target_alloca):
	Adjust comment.
---
 gcc/config/bpf/bpf.md   | 5 ++---
 gcc/config/nvptx/nvptx.md   | 2 +-
 gcc/testsuite/gcc.target/bpf/diag-alloca-1.c| 3 ++-
 gcc/testsuite/gcc.target/bpf/diag-alloca-2.c| 4 +++-
 gcc/testsuite/gcc.target/nvptx/alloca-1-sm_30.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/vla-1-sm_30.c| 2 +-
 gcc/testsuite/lib/target-supports.exp   | 2 +-
 7 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 12cf9fae855..91d94838e39 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -121,12 +121,11 @@
   [(match_operand:DI 0 "general_operand" "")
(match_operand:DI 1 "general_operand" "")]
   ""
-  "
 {
-  error (\"BPF does not support dynamic stack allocation\");
+  sorry ("dynamic stack allocation not supported");
   emit_insn (gen_nop ());
   DONE;
-}")
+})
 
  Arithmetic/Logical
 
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 532812bcf70..dc19695d040 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1723,7 +1723,7 @@
 emit_insn (gen_nvptx_alloca (Pmode, operands[0], operands[1]));
   else if (!TARGET_SOFT_STACK)
 {
-  sorry ("target cannot support alloca");
+  sorry ("dynamic stack allocation not supported");
   emit_insn (gen_nop ());
 }
   else if (TARGET_SOFT_STACK)
diff --git a/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c b/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
index 0406f2c3595..e549cab84ca 100644
--- a/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
+++ b/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
@@ -3,7 +3,8 @@
 int
 foo (int x)
 {
-  int *p = __builtin_alloca (x); /* { dg-error "support" } */
+  int *p = __builtin_alloca (x);
+  /* { dg-message {sorry, unimplemented: dynamic stack allocation not supported} {} { target *-*-*

Re: [PATCH v2] aarch64: Ignore target pragmas while defining intrinsics

2025-02-21 Thread Andrew Carlotti
On Fri, Feb 21, 2025 at 11:28:14AM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > On Wed, Feb 19, 2025 at 12:17:55PM +, Richard Sandiford wrote:
> >> Andrew Carlotti  writes:
> >> >  /* Print a list of CANDIDATES for an argument, and try to suggest a 
> >> > specific
> >> > close match.  */
> >> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> >> > b/gcc/config/aarch64/aarch64-builtins.cc
> >> > index 
> >> > 128cc365d3d585e01cb69668f285318ee56a36fc..5174fb1daefee2d73a5098e0de1cca73dc103416
> >> >  100644
> >> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> >> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> >> > @@ -1877,23 +1877,31 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type 
> >> > t)
> >> >return (t == Poly8_t || t == Poly16_t || t == Poly64_t || t == 
> >> > Poly128_t);
> >> >  }
> >> >  
> >> > -/* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced 
> >> > SIMD
> >> > -   set.  */
> >> > -aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags 
> >> > extra_flags)
> >> > +/* Temporarily set FLAGS as the enabled target features.  */
> >> > +aarch64_target_switcher::aarch64_target_switcher (aarch64_feature_flags 
> >> > flags)
> >> >: m_old_asm_isa_flags (aarch64_asm_isa_flags),
> >> > -m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY)
> >> > +m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY),
> >> > +m_old_target_pragma (current_target_pragma)
> >> >  {
> >> > +  /* Include all dependencies.  */
> >> > +  flags = aarch64_get_required_features (flags);
> >> > +
> >> >/* Changing the ISA flags should be enough here.  We shouldn't need to
> >> >   pay the compile-time cost of a full target switch.  */
> >> > -  global_options.x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
> >> > -  aarch64_set_asm_isa_flags (AARCH64_FL_FP | AARCH64_FL_SIMD | 
> >> > extra_flags);
> >> > +  if (flags & AARCH64_FL_FP)
> >> > +global_options.x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
> >> > +  aarch64_set_asm_isa_flags (flags);
> >> 
> >> In the earlier review I'd suggested keeping aarch64_simd_(target_)switcher
> >> as a separate class, with aarch64_target_switcher being a new base class
> >> that handles the pragma.
> >> 
> >> This patch seems to be more in the direction of treating
> >> aarch64_target_switcher as a one-stop shop that works out what to do
> >> based on the flags.  I can see the attraction of that, but I think
> >> we should then fold sve_(target_)switcher into it too, for consistency.
> >
> >
> > I think I want to take a step back and look at what all the switcher effects
> > are and (to some extent) why they're needed.
> >
> > 1. Setting the value of aarch64_isa_flags - this requires unsetting
> > MASK_GENERAL_REGS_ONLY (for fp/simd targets), calling 
> > aarc64_set_asm_isa_flags,
> > and (for functions) unsetting current_target_pragma.  I'm not sure what 
> > breaks
> > if this isn't done.
> >
> > 2. Setting maximum_field_alignment = 0 (to disable the effect of 
> > -fpack-struct)
> > - I think this is only relevant for ensuring the correct layout of the SVE
> > tuple types.
> 
> Yeah.  IMO it's a misfeature that, when compiled with -fpack-struct=2:
> 
> #include 
> 
> int x1 = _Alignof (int32x4_t);
> int x2 = _Alignof (int32x4x2_t);
> 
> sets x1 to 16 and x2 to 2.  But that's been GCC's long-standing behaviour
> and is also what Clang does, so I suppose we have to live with it.
> 
> > 3. Enabling have_regs_for_mode[] for the SVE modes - I'm not sure 
> > specifically
> > why this is needed either, but I'm guessing this normally gets enabled as 
> > part
> > of a full target switch (though I couldn't immediately see how).
> 
> Yeah, that's right.  It happens via target_reinit -> init_regs when
> the target is used for the first time.  The result is then reused
> for subsequent switches to the same target.
> 
> > I think it makes sense to me for 1. and 3. to be based on flags (although I
> > think we need to base 3. upon either of SVE or SME being specified, to 
> > support
> > decoupling that dependency in the future).  I'm not sure the same applies
> > for 2., however - perhaps it makes more sense for that to be a separate
> > standalone switcher class that is only used when defining the SVE tuple 
> > types?
> >
> > How does that sound?
> 
> Yeah, a separate RAII class for the field alignment sounds good to me,
> with the have_regs_for_mode stuff moving to aarch64_target_switcher.

I'm glad that we've iterated to something we're both happy with - I'll post a
v3 for review later. 

> 
> Thanks,
> Richard


[Fortran, Patch, PR107635, 4_v1] Fix class type and descriptor handling for new coarray interface [PR107635]

2025-02-21 Thread Andre Vehreschild
Hi all,

during testing and compiling some larger coarray codes, I found a few
deficiencies.  One was in handling class types when splitting the coarray
expression, and the other was that the backend_decl of a formal argument in a
function's symbol was not the same as the one the function was compiled with.
Looking at the n-th formal argument in the function-decl's tree is therefore
the way to go there.

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From efc6ed615b36bb6b42a43f6f35bf81a1adce2941 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Fri, 21 Feb 2025 14:06:28 +0100
Subject: [PATCH] Fortran: Fix detection of descriptor arrays in coarray
 [PR107635]

Look at the formal argument's generated type in the function declaration
to figure out whether an argument is a descriptor array.  Fix handling of
class types while splitting coarray expressions.

	PR fortran/107635

gcc/fortran/ChangeLog:

	* coarray.cc (fixup_comp_refs): For class types set correct
	component (class) type.
	(split_expr_at_caf_ref): Provide location.
	* trans-intrinsic.cc (conv_caf_send_to_remote): Look at
	generated formal argument and not declared one to detect
	descriptor arrays.
	(conv_caf_sendget): Same.
---
 gcc/fortran/coarray.cc | 15 ++-
 gcc/fortran/trans-intrinsic.cc | 15 +--
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/gcc/fortran/coarray.cc b/gcc/fortran/coarray.cc
index e5648e0d027..f53de0b20e3 100644
--- a/gcc/fortran/coarray.cc
+++ b/gcc/fortran/coarray.cc
@@ -295,11 +295,12 @@ move_coarray_ref (gfc_ref **from, gfc_expr *expr)
 static void
 fixup_comp_refs (gfc_expr *expr)
 {
-  gfc_symbol *type = expr->symtree->n.sym->ts.type == BT_DERIVED
-		   ? expr->symtree->n.sym->ts.u.derived
-		   : (expr->symtree->n.sym->ts.type == BT_CLASS
-			? CLASS_DATA (expr->symtree->n.sym)->ts.u.derived
-			: nullptr);
+  bool class_ref = expr->symtree->n.sym->ts.type == BT_CLASS;
+  gfc_symbol *type
+= expr->symtree->n.sym->ts.type == BT_DERIVED
+	? expr->symtree->n.sym->ts.u.derived
+	: (class_ref ? CLASS_DATA (expr->symtree->n.sym)->ts.u.derived
+		 : nullptr);
   if (!type)
 return;
   gfc_ref **pref = &(expr->ref);
@@ -317,6 +318,9 @@ fixup_comp_refs (gfc_expr *expr)
 	  ref = nullptr;
 	  break;
 	}
+	  if (class_ref)
+	/* Link to the class type to allow for derived type resolution.  */
+	(*pref)->u.c.sym = ref->u.c.sym;
 	  (*pref)->next = ref->next;
 	  ref->next = NULL;
 	  gfc_free_ref_list (ref);
@@ -372,6 +376,7 @@ split_expr_at_caf_ref (gfc_expr *expr, gfc_namespace *ns,
   st->n.sym->attr.dummy = 1;
   st->n.sym->attr.intent = INTENT_IN;
   st->n.sym->ts = *caf_ts;
+  st->n.sym->declared_at = expr->where;

   *post_caf_ref_expr = gfc_get_variable_expr (st);
   (*post_caf_ref_expr)->where = expr->where;
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 80e98dc3c20..f48f39d7a82 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -1445,8 +1445,9 @@ conv_caf_send_to_remote (gfc_code *code)
 	  NULL_TREE, gfc_trans_force_lval (&block, lhs_se.string_length));
   else
 	opt_lhs_charlen = build_zero_cst (build_pointer_type (size_type_node));
-  if (!TYPE_LANG_SPECIFIC (TREE_TYPE (caf_decl))->rank
-	  || GFC_ARRAY_TYPE_P (TREE_TYPE (caf_decl)))
+  tmp = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TYPE_ARG_TYPES (
+	TREE_TYPE (receiver_fn_expr->symtree->n.sym->backend_decl)))));
+  if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (tmp)))
 	opt_lhs_desc = null_pointer_node;
   else
 	opt_lhs_desc
@@ -1635,8 +1636,9 @@ conv_caf_sendget (gfc_code *code)
 	  NULL_TREE, gfc_trans_force_lval (&block, lhs_se.string_length));
   else
 	opt_lhs_charlen = build_zero_cst (build_pointer_type (size_type_node));
-  if (!TYPE_LANG_SPECIFIC (TREE_TYPE (lhs_caf_decl))->rank
-	  || GFC_ARRAY_TYPE_P (TREE_TYPE (lhs_caf_decl)))
+  tmp = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TYPE_ARG_TYPES (
+	TREE_TYPE (receiver_fn_expr->symtree->n.sym->backend_decl)))));
+  if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (tmp)))
 	opt_lhs_desc = null_pointer_node;
   else
 	opt_lhs_desc
@@ -1677,8 +1679,9 @@ conv_caf_sendget (gfc_code *code)
 	  rhs_size = gfc_typenode_for_spec (ts)->type_common.size_unit;
 	}
 }
-  else if (!TYPE_LANG_SPECIFIC (TREE_TYPE (rhs_caf_decl))->rank
-	   || GFC_ARRAY_TYPE_P (TREE_TYPE (rhs_caf_decl)))
+  else if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (TREE_VALUE (
+	 TREE_CHAIN (TREE_CHAIN (TREE_CHAIN (TREE_CHAIN (TYPE_ARG_TYPES (
+	   TREE_TYPE (sender_fn_expr->symtree->n.sym->backend_decl))))))))))
 {
   rhs_se.data_not_needed = 1;
   gfc_conv_expr_descriptor (&rhs_se, rhs_expr);
--
2.48.1



Re: [PATCH v3] aarch64: Recognize vector permute patterns suitable for FMOV [PR100165]

2025-02-21 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This patch optimizes certain vector permute expansion with the FMOV 
> instruction
> when one of the input vectors is a vector of all zeros and the result of the
> vector permute is as if the upper lane of the non-zero input vector is set to
> zero and the lower lane remains unchanged.
>
> Note that the patch also propagates zero_op0_p and zero_op1_p during re-encode
> now.  They will be used by aarch64_evpc_fmov to check if the input vectors are
> valid candidates.
>
>   PR target/100165
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-protos.h (aarch64_lane0_mask_p): New.
>   * config/aarch64/aarch64-simd.md 
> (@aarch64_simd_vec_set_zero_fmov):
>   New define_insn.
>   * config/aarch64/aarch64.cc (aarch64_lane0_mask_p): New.
>   (aarch64_evpc_reencode): Copy zero_op0_p and zero_op1_p.
>   (aarch64_evpc_fmov): New.
>   (aarch64_expand_vec_perm_const_1): Add call to aarch64_evpc_fmov.
>   * config/aarch64/iterators.md (VALL_F16_NO_QI): New mode iterator.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/vec-set-zero.c: Update test accordingly.
>   * gcc.target/aarch64/fmov-1.c: New test.
>   * gcc.target/aarch64/fmov-2.c: New test.
>   * gcc.target/aarch64/fmov-3.c: New test.
>   * gcc.target/aarch64/fmov-be-1.c: New test.
>   * gcc.target/aarch64/fmov-be-2.c: New test.
>   * gcc.target/aarch64/fmov-be-3.c: New test.

Sorry to be awkward, but looking at this again, and going back to my
previous comment:

  Part of me thinks that this should just be described as a plain old AND,
  but I suppose that doesn't work well for FP modes.  Still, handling ANDs
  might be an interesting follow-up :)

I wonder whether we should model this as an AND after all.  That is,
any permute that blends a vector with zero can be interpreted as an AND
of a mask.  We could even provide a target-independent routine for
detecting that case.

At present:

v4hf
f_v4hf (v4hf x)
{
  return __builtin_shuffle (x, (v4hf){ 0, 0, 0, 0 }, (v4hi){ 4, 1, 6, 3 });
}

generates:

f_v4hf:
uzp1    v0.2d, v0.2d, v0.2d
adrpx0, .LC0
ldr d31, [x0, #:lo12:.LC0]
tbl v0.8b, {v0.16b}, v31.8b
ret
.LC0:
.byte   -1
.byte   -1
.byte   2
.byte   3
.byte   -1
.byte   -1
.byte   6
.byte   7

whereas with SVE enabled it could just be:

f_v4hf:
and z0.d, z0.d, #0x
ret

and even without SVE it would be:

f_v4hf:
movi    v31.2s, 0xff, msl 8
and v0.8b, v0.8b, v31.8b
ret

Then, using fmov would be an optimisation of AND.

I think this would also simplify the evpc detection, since the requirement
for using AND is the same for big-endian and little-endian, namely that
index I of the result must either come from index I of the nonzero
vector or from any element of the zero vector.  (What differs between
big-endian and little-endian is which masks correspond to FMOV.)

Sorry again for the run-around.

Richard


Re: [PATCH] COBOL 3/15 92K bld: config and build machinery

2025-02-21 Thread Richard Biener
On Thu, Feb 20, 2025 at 7:23 PM Paul Koning  wrote:
>
>
>
> > On Feb 19, 2025, at 8:18 PM, James K. Lowden  
> > wrote:
> >
> > On Wed, 19 Feb 2025 12:55:03 +0100
> > Matthias Klose  wrote:
> >
> >> libgcobol/ChangeLog
> >> * Makefile.in: New file.
> >> * acinclude.m4: New file.
> >> * aclocal.m4: New file.
> >> * configure.ac: New file.
> >> * configure.tgt: New file.
> >>
> >> I had updated the configure.tgt, please find it attached here again.
> >
> > It seems we missed your updated patch and invented our own solution, such 
> > as it is.  Sorry for the misunderstanding.  I certainly never meant to 
> > ignore you.
> >
> > We know we're currently being too restrictive by limiting host and target 
> > to x86_64 and aarch64.  But your patch as supplied is also insufficient, so 
> > we attempted to generalize it.
> >
> > I would like to do this in a way you approve of.  Let me lay out the 
> > requirements as of today.   For both host and target:
> >
> > 1.  support for _Float128
> > 2.  support for __int128
>
> I understand a target requirement if you need to generate code that uses that 
> datatype.  But why a host requirement?  For floating point arithmetic at 
> compile time, there is the GCC internal "real" component, which offers 
> host-independent floating point arithmetic.  One of the reasons that exists 
> is to allow compile-time arithmetic for targets that have a wider float range 
> than the host (say, VAX vs. IEEE), but it should also enable high precision 
> floating point operations inside GCC without caring what width floating point 
> is native to the host.

On the target there's soft-float (though I'm not sure about the _Float128
status there).  For __int128 we don't have general libgcc emulation
available, but it would be a nice-to-have thing.

I'm sure we can work on both aspects even after merging, in the process of
enabling a wider set of host/target combinations (with, as you say,
priority on host support).

Richard.

>
> paul
>


Re: The COBOL front end, version 3, now in 14 easy pieces (+NIST)

2025-02-21 Thread Richard Biener
On Thu, Feb 20, 2025 at 8:38 PM James K. Lowden
 wrote:
>
> On Thu, 20 Feb 2025 11:38:58 +0100
> Richard Biener  wrote:
>
> > Can you clarify on the future development model for Cobol after it has
> > been merged?  Is the cobolworx gitlab still going to be the primary
> > development location and changes should be made there and then merged
> > to the GCC side?
>
> I would like the future development model for Cobol to be convenient
> for all involved.  If we don't change anything then, yes, we'll keep
> using our cobolworx gitlab server.  But we don't insist on that.  From
> our point of view, one git server is as good as another.  That's what
> peer-to-peer is all about, right?  We can use gcc's git as the primary,
> and mirror that ourselves, if that's what you're suggesting.
>
> Branches in git don't have independent permissions.  If we use
> gcc.gnu.org git, are we granted commit rights with the priviso that we
> color inside the lines, and commit only to our own branches?

My expectation is that by contributing the COBOL frontend you are
volunteering to be maintainers for it which grants you permission
(as in reviewing your own and other peoples patches) in that area.
As part of the contribution you should also get commit access to GCCs
git repository (see https://gcc.gnu.org/gitwrite.html)

> > The most important part for GCC 15 will be documentation to get
> > user expectancy right.
>
> Absolutely, that's in everyone's interest.
>
> > Having a minimal harness in GCCs testsuite is critical - I'd expect a
> > gcc/testsuite/gcobol.dg/dg.exp supporting execution tests.  I assume
> > Cobol has a way to exit OK or fatally and this should be
> > distinguished as testsuite PASS or FAIL.
>
> Yes, a COBOL program exits with a return status.  And we rigged up NIST
> to do that.  What that means requires a long explanation, sorry.
>
> NIST Is highly configurable within its envelope.  It generates most of
> its inputs, and a configuration process controls the names of the data
> files.  (There is also a substitution capability to adapt the COBOL
> source to the compiler.)  The "configuration process" is itself a COBOL
> program called EXEC85.  It reads the source archive and produces
> modified COBOL programs for compilation and execution.
>
> NIST as we use it comprises 12 modules covering different aspects
> of COBOL. Each module is designated a 2-letter prefix, NC for "NIST
> core", IX for "Indexed I/O", etc.  We create one directory per module.
> Each module has ~100 programs, each with many tests.  Some programs
> must be run in a fixed order because they produce and process files in
> series.
>
> Each program simply runs and reports its tests' results to a file (which
> could be stdout, but isn't in our setup).  The program always exits
> normally unless it crashes, of course.  Test failures are indicated by
> "FAIL*" in the report.
>
> As it's set up now in our CI/CD, NIST lives in its own subdirectory,
> gcc/cobol/nist.   There we have two critical files: Makefile (900
> lines) and report.awk (56 lines).  In each module's directory we also
> maintain any configuration inputs to EXEC85.  In nist/NC, for example,
> we have NC109M.conf  and NC204M.conf.
>
> The Makefile fetches the NIST archive from our website.  (We originally
> got it from NIST, but their site was reorganized last year.  The file
> went missing, as apparently did my email to the webmaster.
> Technology!)  The file might have 100 targets to run various bits.  For
> gcc's purpose, only one matters: "make report".
>
> That target:
>
> 1. fetches the archive
> 2. extracts & patches EXEC85.cbl
> 3. compiles to produce EXEC85
> 4. runs EXEC85 against the archive to extract modified COBOL
> test programs.
> 5. compiles the programs
> 6. runs each program, producing a .rpt file for each one
> 7. trundles over each .rpt file with report.awk searching for failures
>
> Because the process is run under make(1), steps 2-6 run in parallel, N
> jobs for -j N.  If there are no failures, report.awk returns 0 to make,
> else 1. Start to end, it takes just a few minutes on a fast machine.
>
> Now you know what I know, and I need to know what you know.
>
> > I'm not sure if /* { dg-... } */ directive support is easy or
> > desirable (well, desirable for sure).  I'd be happy with a setup like
> > the old gcc.c-torture/{execute,compile}.  Possibly tcl/shell wrapping
> > around the NIST tests (even if not included in the GCC tree, running
> > that by unpacking it in gcc/testsuite/gcobol/nist would be nice).
>
> I don't understand most of that.  I think you would like to use
> DejaGnu, and I think we can get there.  One hurdle is that I've never
> used that software.
>
> I suggest a 2-phase process, one expedient and the other long term.
>
> 1.  For now, keep the above intact, and put it where it belongs, wired
> up with the least possible glue (to mix metaphors).  That will work,
> and more than meet the "minimal" threshold.

I'll point you to the ACATS testsuite.

Re: [PATCH] Append a newline in debug_edge

2025-02-21 Thread Richard Biener
On Fri, Feb 21, 2025 at 3:36 AM H.J. Lu  wrote:
>
> Append a newline in debug_edge so that we get
>
> (gdb) call debug_edge (e)
> edge (bb_9, bb_1)
> (gdb)
>
> instead of
>
> (gdb) call debug_edge (e)
> edge (bb_9, bb_1)(gdb)

OK.

> * sese.cc (debug_edge): Append a newline.
>
> --
> H.J.


[PATCH] tree-optimization/118954 - avoid UB on ref created by predcom

2025-02-21 Thread Richard Biener
When predictive commoning moves an invariant ref it makes sure to
not build a MEM_REF with a base that is negatively offsetted from
an object.  But in trying to preserve some transforms it does not
consider association of a constant offset with the address computation
in DR_BASE_ADDRESS leading to exactly this problem again.  This is
arguably a problem in data-ref analysis producing such an out-of-bound
DR_BASE_ADDRESS, but this looks quite involved to fix, so the
following avoids the association in one more case.  This fixes the
testcase while preserving the desired transform in
gcc.dg/tree-ssa/predcom-1.c.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

PR tree-optimization/118954
* tree-predcom.cc (ref_at_iteration): Make sure to not
associate the constant offset with DR_BASE_ADDRESS when
that is an offsetted pointer.

* gcc.dg/torture/pr118954.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr118954.c | 22 ++
 gcc/tree-predcom.cc |  3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118954.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr118954.c 
b/gcc/testsuite/gcc.dg/torture/pr118954.c
new file mode 100644
index 000..6b95e0b8c46
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118954.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fpredictive-commoning" } */
+
+int a, b, c, d;
+void e(int f)
+{
+  int g[] = {5, 8};
+  int *h = g;
+  while (c < f) {
+d = 0;
+for (; d < f; d++)
+  c = h[d];
+  }
+}
+int main()
+{
+  int i = (0 == a + 1) - 1;
+  e(i + 3);
+  if (b != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
index ade8fbf0b5c..d45aa3857b9 100644
--- a/gcc/tree-predcom.cc
+++ b/gcc/tree-predcom.cc
@@ -1807,7 +1807,8 @@ ref_at_iteration (data_reference_p dr, int iter,
  then.  But for some cases we can retain that to allow tree_could_trap_p
  to return false - see gcc.dg/tree-ssa/predcom-1.c  */
   tree addr, alias_ptr;
-  if (integer_zerop  (off))
+  if (integer_zerop  (off)
+  && TREE_CODE (DR_BASE_ADDRESS (dr)) != POINTER_PLUS_EXPR)
 {
   alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
   addr = DR_BASE_ADDRESS (dr);
-- 
2.43.0


Re: [PATCH] avoid-store-forwarding: Handle REG_EH_REGION notes

2025-02-21 Thread Konstantinos Eleftheriou
Hi Richard, thanks for the feedback.

On Tue, Feb 18, 2025 at 9:17 PM Richard Biener
 wrote:
>
>
>
> > Am 18.02.2025 um 17:04 schrieb Konstantinos Eleftheriou 
> > :
> >
> > From: kelefth 
> >
> > The pass rejects the transformation when there are instructions in the
> > sequence that might throw an exception. This was added due to having
> > cases that the load instruction contains a REG_EH_REGION note and
> > moving it before the store instructions caused an error, as it was
> > no longer the last instruction in the basic block.
> >
> > This patch handles those cases by moving a possible REG_EH_REGION
> > note from the load instruction of the store-load sequence to the
> > last instruction of the basic block.
>
> But that’s not a correct transform and will lead to bogus exception handling? 
>  You’d need to move the note and split the block, possibly updating the EH 
> info on the side.

I had originally thought about splitting the block, but tried to get
away with this solution.  If we split the block after the load, we
wouldn't need to move the note as well, right?

Thanks,
Konstantinos

> Richard
>
> > gcc/ChangeLog:
> >
> >* avoid-store-forwarding.cc (process_store_forwarding):
> >(store_forwarding_analyzer::avoid_store_forwarding):
> >Move a possible REG_EH_REGION note from the load instruction
> >to the last instruction of the basic block.
> > ---
> > gcc/avoid-store-forwarding.cc | 13 -
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> > index 34a7bba4043..05c91bb1a82 100644
> > --- a/gcc/avoid-store-forwarding.cc
> > +++ b/gcc/avoid-store-forwarding.cc
> > @@ -400,6 +400,17 @@ process_store_forwarding (vec &stores, 
> > rtx_insn *load_insn,
> >   if (load_elim)
> > delete_insn (load_insn);
> >
> > +  /* Find possible REG_EH_REGION note in the load instruction and move it
> > + into the last instruction of the basic block.  */
> > +  rtx reg_eh_region_note = find_reg_note (load_insn, REG_EH_REGION, 
> > NULL_RTX);
> > +  if (reg_eh_region_note != NULL_RTX)
> > +{
> > +  remove_note (load_insn, reg_eh_region_note);
> > +  basic_block load_bb = BLOCK_FOR_INSN (load_insn);
> > +  add_reg_note (BB_END (load_bb), REG_EH_REGION,
> > +XEXP (reg_eh_region_note, 0));
> > +}
> > +
> >   return true;
> > }
> >
> > @@ -425,7 +436,7 @@ store_forwarding_analyzer::avoid_store_forwarding 
> > (basic_block bb)
> >
> >   rtx set = single_set (insn);
> >
> > -  if (!set || insn_could_throw_p (insn))
> > +  if (!set)
> >{
> >  store_exprs.truncate (0);
> >  continue;
> > --
> > 2.47.0
> >


Re: BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic stack allocation not supported'

2025-02-21 Thread Jose E. Marchesi


Hi Thomas.

> José, any objection to me pushing the attached
> "BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic stack allocation 
> not supported'"?

Sure, please do.  It is good to have a consistent message, and yours is
better anyway.

> (Why this is useful, you'll understand later today.)

Now I am intrigued :)

>
> I've tested BPF as follows:
>
> $ [...]/configure --target=bpf-none
> $ make -j12 all-gcc
> $ make check-gcc-c RUNTESTFLAGS=bpf.exp
>
> The before vs. after 'diff' of 'gcc/testsuite/gcc/gcc.log' looks as
> expected:
>
> [...]
>  Executing on host: [...]/xgcc -B[...]/ 
> [...]/gcc.target/bpf/diag-alloca-1.c-fdiagnostics-plain-output-ansi 
> -pedantic-errors -S -o diag-alloca-1.s(timeout = 300)
>  spawn -ignore SIGHUP [...]/xgcc -B[...]/ 
> [...]/gcc.target/bpf/diag-alloca-1.c -fdiagnostics-plain-output -ansi 
> -pedantic-errors -S -o diag-alloca-1.s
>  [...]/gcc.target/bpf/diag-alloca-1.c: In function 'foo':
> -[...]/gcc.target/bpf/diag-alloca-1.c:6:12: error: BPF does not support 
> dynamic stack allocation
> +[...]/gcc.target/bpf/diag-alloca-1.c:6:12: sorry, unimplemented: dynamic 
> stack allocation not supported
>  compiler exited with status 1
> -PASS: gcc.target/bpf/diag-alloca-1.c  (test for errors, line 6)
> +PASS: gcc.target/bpf/diag-alloca-1.c  at line 7 (test for warnings, line 
> 6)
>  PASS: gcc.target/bpf/diag-alloca-1.c (test for excess errors)
>  Executing on host: [...]/xgcc -B[...]/ 
> [...]/gcc.target/bpf/diag-alloca-2.c-fdiagnostics-plain-output   
> -std=gnu89 -S -o diag-alloca-2.s(timeout = 300)
>  spawn -ignore SIGHUP [...]/xgcc -B[...]/ 
> [...]/gcc.target/bpf/diag-alloca-2.c -fdiagnostics-plain-output -std=gnu89 -S 
> -o diag-alloca-2.s
>  [...]/gcc.target/bpf/diag-alloca-2.c: In function 'foo':
> -[...]/gcc.target/bpf/diag-alloca-2.c:7:7: error: BPF does not support 
> dynamic stack allocation
> +[...]/gcc.target/bpf/diag-alloca-2.c:7:7: sorry, unimplemented: dynamic 
> stack allocation not supported
>  compiler exited with status 1
> -PASS: gcc.target/bpf/diag-alloca-2.c  (test for errors, line 7)
> +PASS: gcc.target/bpf/diag-alloca-2.c  at line 8 (test for warnings, line 
> 7)
>  PASS: gcc.target/bpf/diag-alloca-2.c (test for excess errors)
> [...]
>
> (No further testing done for BPF.)
>
>
> Grüße
>  Thomas
>
>
> From 8ebac3064696c6d42041e398ebb5a622498523cc Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Fri, 21 Feb 2025 11:21:08 +0100
> Subject: [PATCH] BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic
>  stack allocation not supported'
>
> ... instead of BPF: 'error: BPF does not support dynamic stack allocation', 
> and
> nvptx: 'sorry, unimplemented: target cannot support alloca'.
>
>   gcc/
>   * config/bpf/bpf.md (define_expand "allocate_stack"): Emit
>   'sorry, unimplemented: dynamic stack allocation not supported'.
>   * config/nvptx/nvptx.md (define_expand "allocate_stack")
>   [!TARGET_SOFT_STACK && !(TARGET_PTX_7_3 && TARGET_SM52)]: Likewise.
>   gcc/testsuite/
>   * gcc.target/bpf/diag-alloca-1.c: Adjust 'dg-message'.
>   * gcc.target/bpf/diag-alloca-2.c: Likewise.
>   * gcc.target/nvptx/alloca-1-sm_30.c: Likewise.
>   * gcc.target/nvptx/vla-1-sm_30.c: Likewise.
>   * lib/target-supports.exp (proc check_effective_target_alloca):
>   Adjust comment.
> ---
>  gcc/config/bpf/bpf.md   | 5 ++---
>  gcc/config/nvptx/nvptx.md   | 2 +-
>  gcc/testsuite/gcc.target/bpf/diag-alloca-1.c| 3 ++-
>  gcc/testsuite/gcc.target/bpf/diag-alloca-2.c| 4 +++-
>  gcc/testsuite/gcc.target/nvptx/alloca-1-sm_30.c | 2 +-
>  gcc/testsuite/gcc.target/nvptx/vla-1-sm_30.c| 2 +-
>  gcc/testsuite/lib/target-supports.exp   | 2 +-
>  7 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 12cf9fae855..91d94838e39 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -121,12 +121,11 @@
>[(match_operand:DI 0 "general_operand" "")
> (match_operand:DI 1 "general_operand" "")]
>""
> -  "
>  {
> -  error (\"BPF does not support dynamic stack allocation\");
> +  sorry ("dynamic stack allocation not supported");
>emit_insn (gen_nop ());
>DONE;
> -}")
> +})
>  
>   Arithmetic/Logical
>  
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 532812bcf70..dc19695d040 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -1723,7 +1723,7 @@
>  emit_insn (gen_nvptx_alloca (Pmode, operands[0], operands[1]));
>else if (!TARGET_SOFT_STACK)
>  {
> -  sorry ("target cannot support alloca");
> +  sorry ("dynamic stack allocation not supported");
>emit_insn (gen_nop ());
>  }
>else if (TARGET_SOFT_STACK)
> diff --git a/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c 
> b/g

Re: [PATCH] Fortran: initialize non-saved pointers with -fcheck=pointer [PR48958]

2025-02-21 Thread Harald Anlauf

On 2/21/25 09:21, Andre Vehreschild wrote:

Hi Harald,

seconding Thomas here, thanks for the patch.


Thanks, Andre!

Pushed: r15-7663-g7d383a7343af05


- Andre

On Thu, 20 Feb 2025 21:18:01 +0100
Harald Anlauf  wrote:


Dear all,

the attached simple patch addresses a small, left-over issue in the
above PR and was already OK'ed in the PR by Thomas.  With it we
initialize the data component of a non-saved pointer variable when
-fcheck=pointer is specified, so that subsequent pointer checks
(e.g. for the SIZE intrinsic) trigger consistently and not randomly.

I plan to commit within 24h unless there are comments.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald




--
Andre Vehreschild * Email: vehre ad gmx dot de









[PATCH] RISC-V: Fix .cfi_offset directive when push/pop in zcmp

2025-02-21 Thread 彭新佑

From 7cca6e23b293bdf47280997df87a0cda70d2f91d Mon Sep 17 00:00:00 2001
From: Lino Hsing-Yu Peng 
Date: Thu, 20 Feb 2025 17:09:22 +0800
Subject: [PATCH] RISC-V: Fix .cfi_offset directive when push/pop in zcmp

The incorrect CFI directive info breaks stack unwinding in try/catch/cxa.

Before patch:
  cm.push	{ra, s0-s2}, -16
  .cfi_offset 1, -12
  .cfi_offset 8, -8
  .cfi_offset 18, -4

After patch:
  cm.push	{ra, s0-s2}, -16
  .cfi_offset 1, -16
  .cfi_offset 8, -12
  .cfi_offset 9, -8
  .cfi_offset 18, -4

gcc/ChangeLog:
	* config/riscv/riscv.cc (riscv_compute_frame_info): Set the
	multi-push register bits in the frame mask.
gcc/testsuite/ChangeLog:
	* gcc.target/riscv/zcmp_push_gpr.c: New test.
---
 gcc/config/riscv/riscv.cc  | 11 +++
 gcc/testsuite/gcc.target/riscv/zcmp_push_gpr.c | 12 
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_push_gpr.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9bf7713139f..5badfc8396f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -728,6 +728,12 @@ static const unsigned gpr_save_reg_order[] = {
   S10_REGNUM, S11_REGNUM
 };
 
+/* Order for the (ra, s0-sx) of zcmp_save.  */
+static const unsigned zcmp_save_reg_order[]
+  = {RETURN_ADDR_REGNUM, S0_REGNUM,  S1_REGNUM,	 S2_REGNUM,	S3_REGNUM,
+ S4_REGNUM,		 S5_REGNUM,  S6_REGNUM,	 S7_REGNUM,	S8_REGNUM,
+ S9_REGNUM,		 S10_REGNUM, S11_REGNUM, INVALID_REGNUM};
+
 /* A table describing all the processors GCC knows about.  */
 static const struct riscv_tune_info riscv_tune_info_table[] = {
 #define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO)	\
@@ -7772,6 +7778,11 @@ riscv_compute_frame_info (void)
 	  unsigned num_multi_push = riscv_multi_push_regs_count (frame->mask);
 	  x_save_size = riscv_stack_align (num_multi_push * UNITS_PER_WORD);
 	  frame->multi_push_adj_base = riscv_16bytes_align (x_save_size);
+	  for (int i = 0; i < num_multi_push; i++)
+	{
+	  gcc_assert (zcmp_save_reg_order[i] != INVALID_REGNUM);
+	  frame->mask |= 1 << (zcmp_save_reg_order[i] - GP_REG_FIRST);
+	}
 	}
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/zcmp_push_gpr.c b/gcc/testsuite/gcc.target/riscv/zcmp_push_gpr.c
new file mode 100644
index 000..acebafa41ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zcmp_push_gpr.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32imafd_zicsr_zifencei_zca_zcmp -mabi=ilp32d -Os -g" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-O2" "-Og" "-O3" "-Oz" "-flto"} } */
+
+int
+zcmp_push ()
+{
+  __asm__ __volatile__("" ::: "ra", "s0", "s2");
+  return 0;
+}
+
+/* { dg-final { scan-assembler ".cfi_offset 1, -16\n\t.cfi_offset 8, -12\n\t.cfi_offset 9, -8\n\t.cfi_offset 18, -4\n\t.cfi_def_cfa_offset 16" } } */
-- 
2.34.1



Re: [PATCH v2] aarch64: Ignore target pragmas while defining intrinsics

2025-02-21 Thread Richard Sandiford
Andrew Carlotti  writes:
> On Wed, Feb 19, 2025 at 12:17:55PM +, Richard Sandiford wrote:
>> Andrew Carlotti  writes:
>> >  /* Print a list of CANDIDATES for an argument, and try to suggest a 
>> > specific
>> > close match.  */
>> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
>> > b/gcc/config/aarch64/aarch64-builtins.cc
>> > index 
>> > 128cc365d3d585e01cb69668f285318ee56a36fc..5174fb1daefee2d73a5098e0de1cca73dc103416
>> >  100644
>> > --- a/gcc/config/aarch64/aarch64-builtins.cc
>> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
>> > @@ -1877,23 +1877,31 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t)
>> >return (t == Poly8_t || t == Poly16_t || t == Poly64_t || t == 
>> > Poly128_t);
>> >  }
>> >  
>> > -/* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD
>> > -   set.  */
>> > -aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags 
>> > extra_flags)
>> > +/* Temporarily set FLAGS as the enabled target features.  */
>> > +aarch64_target_switcher::aarch64_target_switcher (aarch64_feature_flags 
>> > flags)
>> >: m_old_asm_isa_flags (aarch64_asm_isa_flags),
>> > -m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY)
>> > +m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY),
>> > +m_old_target_pragma (current_target_pragma)
>> >  {
>> > +  /* Include all dependencies.  */
>> > +  flags = aarch64_get_required_features (flags);
>> > +
>> >/* Changing the ISA flags should be enough here.  We shouldn't need to
>> >   pay the compile-time cost of a full target switch.  */
>> > -  global_options.x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
>> > -  aarch64_set_asm_isa_flags (AARCH64_FL_FP | AARCH64_FL_SIMD | 
>> > extra_flags);
>> > +  if (flags & AARCH64_FL_FP)
>> > +global_options.x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
>> > +  aarch64_set_asm_isa_flags (flags);
>> 
>> In the earlier review I'd suggested keeping aarch64_simd_(target_)switcher
>> as a separate class, with aarch64_target_switcher being a new base class
>> that handles the pragma.
>> 
>> This patch seems to be more in the direction of treating
>> aarch64_target_switcher as a one-stop shop that works out what to do
>> based on the flags.  I can see the attraction of that, but I think
>> we should then fold sve_(target_)switcher into it too, for consistency.
>
>
> I think I want to take a step back and look at what all the switcher effects
> are and (to some extent) why they're needed.
>
> 1. Setting the value of aarch64_isa_flags - this requires unsetting
> MASK_GENERAL_REGS_ONLY (for fp/simd targets), calling 

> aarch64_set_asm_isa_flags,
> and (for functions) unsetting current_target_pragma.  I'm not sure what breaks
> if this isn't done.
>
> 2. Setting maximum_field_alignment = 0 (to disable the effect of 
> -fpack-struct)
> - I think this is only relevant for ensuring the correct layout of the SVE
> tuple types.

Yeah.  IMO it's a misfeature that, when compiled with -fpack-struct=2:

#include <arm_neon.h>

int x1 = _Alignof (int32x4_t);
int x2 = _Alignof (int32x4x2_t);

sets x1 to 16 and x2 to 2.  But that's been GCC's long-standing behaviour
and is also what Clang does, so I suppose we have to live with it.

> 3. Enabling have_regs_for_mode[] for the SVE modes - I'm not sure specifically
> why this is needed either, but I'm guessing this normally gets enabled as part
> of a full target switch (though I couldn't immediately see how).

Yeah, that's right.  It happens via target_reinit -> init_regs when
the target is used for the first time.  The result is then reused
for subsequent switches to the same target.

> I think it makes sense to me for 1. and 3. to be based on flags (although I
> think we need to base 3. upon either of SVE or SME being specified, to support
> decoupling that dependency in the future).  I'm not sure the same applies
> for 2., however - perhaps it makes more sense for that to be a separate
> standalone switcher class that is only used when defining the SVE tuple types?
>
> How does that sound?

Yeah, a separate RAII class for the field alignment sounds good to me,
with the have_regs_for_mode stuff moving to aarch64_target_switcher.

Thanks,
Richard


[PATCH v2 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-21 Thread Spencer Abson
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin
if the arguments are better suited to it. This helps us avoid copying data
between lanes before operation.

E.g. we prefer to use UMULL2 rather than DUP+UMULL for the following:

uint16x8_t
foo(const uint8x16_t s) {
const uint8x16_t f0 = vdupq_n_u8(4);
return vmull_u8(vget_high_u8(s), vget_high_u8(f0));
}

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (LO_HI_PAIRINGS): New macro.
Cover every lo/hi pairing in builtin-pairs.def.
(aarch64_get_highpart_builtin): New function.  Get the fndecl for
the hi builtin paired with FCODE.
(LO_HI_PAIR): New macro.
(aarch64_object_of_bfr): New function.  Parse BIT_FIELD_REF expressions.
(aarch64_duplicate_vector_cst): New function.
(aarch64_nbit_vector_type_p): New function.  Check if a type describes
an n-bit vector.
(aarch64_vq_high_half): New function. Helper to identify vector
highparts.
(aarch64_fold_lo_call_to_hi): New function.  Perform the fold described
here.
(aarch64_general_gimple_fold_builtin): Add cases for lo builtins.
* config/aarch64/aarch64-builtin-pairs.def: New file.  Declare pairings
of lo/hi builtins.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/vabal_combine.c: Removed.
* gcc.target/aarch64/simd/fold_to_highpart_1.c: New test.
* gcc.target/aarch64/simd/fold_to_highpart_2.c: New test.
* gcc.target/aarch64/simd/fold_to_highpart_3.c: New test.
* gcc.target/aarch64/simd/fold_to_highpart_4.c: New test.
* gcc.target/aarch64/simd/fold_to_highpart_5.c: New test.
* gcc.target/aarch64/simd/fold_to_highpart_6.c: New test.
* gcc.target/aarch64/simd/fold_to_highpart_7.c: New test.
---
 gcc/config/aarch64/aarch64-builtin-pairs.def  |  81 ++
 gcc/config/aarch64/aarch64-builtins.cc| 206 +
 .../aarch64/simd/fold_to_highpart_1.c | 733 ++
 .../aarch64/simd/fold_to_highpart_2.c |  86 ++
 .../aarch64/simd/fold_to_highpart_3.c |  81 ++
 .../aarch64/simd/fold_to_highpart_4.c |  77 ++
 .../aarch64/simd/fold_to_highpart_5.c |  38 +
 .../aarch64/simd/fold_to_highpart_6.c |  94 +++
 .../aarch64/simd/fold_to_highpart_7.c |  36 +
 .../gcc.target/aarch64/simd/vabal_combine.c   |  72 --
 10 files changed, 1432 insertions(+), 72 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-builtin-pairs.def
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_7.c
 delete mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vabal_combine.c

diff --git a/gcc/config/aarch64/aarch64-builtin-pairs.def 
b/gcc/config/aarch64/aarch64-builtin-pairs.def
new file mode 100644
index 000..e1dc0b71a1c
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-builtin-pairs.def
@@ -0,0 +1,81 @@
+/* Pairings of AArch64 builtins that can be folded into each other.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+/* LO/HI widenable integer modes.  */
+#define LO_HI_PAIR_V_WI(T, LO, HI) \
+  LO_HI_PAIR (T##_##LO##v2si, T##_##HI##v4si) \
+  LO_HI_PAIR (T##_##LO##v4hi, T##_##HI##v8hi) \
+  LO_HI_PAIR (T##_##LO##v8qi, T##_##HI##v16qi)
+
+/* LO/HI Single/Half integer modes.  */
+#define LO_HI_PAIR_V_HSI(T, LO, HI) \
+  LO_HI_PAIR (T##_##LO##v2si, T##_##HI##v4si) \
+  LO_HI_PAIR (T##_##LO##v4hi, T##_##HI##v8hi)
+
+#define UNOP_LONG_LH_PAIRS \
+  LO_HI_PAIR (UNOP_sxtlv8hi,  UNOP_vec_unpacks_hi_v16qi) \
+  LO_HI_PAIR (UNOP_sxtlv4si,  UNOP_vec_unpacks_hi_v8hi) \
+  LO_HI_PAIR (UNOP_sxtlv2di,  UNOP_vec_unpacks_hi_v4si) \
+  LO_HI_PAIR (UNOPU_uxtlv8hi, UNOPU_vec_unpacku_hi_v16qi) \
+  LO_HI_PAIR (UNOPU_uxtlv4si, UNOPU_vec_unpacku_hi_v8hi) \
+  LO_HI_PAIR (UNOPU_uxtlv2di, UNOPU_vec_unpacku_hi_v4si)

[PATCH v2 0/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-21 Thread Spencer Abson
Hi all,

This patch implements the missed optimisation noted in PR117850.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117850

Changes since the last revision:
- Now processing different function signatures programmatically
- Fixed ICE RE the type referred to by a BIT_FIELD_REF
- Modified testcase regex to account for commutativity
- Removed vabal_combine.c in favour of coverage in fold_to_highpart_1.c
- Supported widening floating-point conversions
- General response to feedback from Richard S., Christophe Lyon.

Notes:
- I'm on the fence about adding more conservative assertions to 
aarch64_duplicate_vector_cst.
  E.g.
    gcc_assert (types_compatible_p (TREE_TYPE (TREE_TYPE (vec_in)),
                                    TREE_TYPE (out_ty)));

- I give this test in fold_to_highpart_7.c; 
https://godbolt.org/z/sG6GdEdGb.  It's interesting
  that the current behavior of GCC is worse than GCC14, trunk targeting 
aarch64_be, and Clang.

  Maybe it's worth a thought?

Bootstrapped and regtested on aarch64-none-linux-gnu. This work was also
tested on a cross-compiler targeting aarch64_be-none-linux-gnu.

OK for stage-1?

Thanks,
Spencer

Spencer Abson (1):
  AArch64: Fold builtins with highpart args to highpart equivalent
[PR117850]

 gcc/config/aarch64/aarch64-builtin-pairs.def  |  81 ++
 gcc/config/aarch64/aarch64-builtins.cc| 206 +
 .../aarch64/simd/fold_to_highpart_1.c | 733 ++
 .../aarch64/simd/fold_to_highpart_2.c |  86 ++
 .../aarch64/simd/fold_to_highpart_3.c |  81 ++
 .../aarch64/simd/fold_to_highpart_4.c |  77 ++
 .../aarch64/simd/fold_to_highpart_5.c |  38 +
 .../aarch64/simd/fold_to_highpart_6.c |  94 +++
 .../aarch64/simd/fold_to_highpart_7.c |  36 +
 .../gcc.target/aarch64/simd/vabal_combine.c   |  72 --
 10 files changed, 1432 insertions(+), 72 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-builtin-pairs.def
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_7.c
 delete mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vabal_combine.c

-- 
2.34.1



Re: [PATCH 2/4] c++/modules: Track module purview for deferred instantiations [PR114630]

2025-02-21 Thread Nathaniel Shead
After seeing PR c++/118964 I'm coming back around to this [1] patch
series, since it appears that this can cause errors on otherwise valid
code by instantiations coming into module purview that reference
TU-local entities.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650324.html

We'd previously discussed that the best way to solve this in general
would be to perform all deferred instantiations at the end of the GMF to
ensure that they would not leak into the module purview.  I still
tentatively agree that this would be the "nicer" way to go (though I've
since come across https://eel.is/c++draft/temp.point#7 and wonder if
this may not be strictly conforming according to the current wording?).

I've not yet managed to implement this however, and even if I had, at
this stage it would definitely not be appropriate for GCC15; would the
approach described below be appropriate for GCC15 as a stop-gap to
reduce these issues?

Another approach would be to maybe lower the error to a permerror, so
that users that 'know better' could compile such modules anyway (at risk
of various link errors and ODR issues), though I would also need to
adjust the streaming logic to handle this better.  Thoughts?

Nathaniel

On Wed, May 01, 2024 at 08:00:22PM +1000, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> When calling instantiate_pending_templates at end of parsing, any new
> functions that are instantiated from this point have their module
> purview set based on the current value of module_kind.
> 
> This is unideal, however, as the modules code will then treat these
> instantiations as reachable and cause large swathes of the GMF to be
> emitted into the module CMI, despite no code in the actual module
> purview referencing it.
> 
> This patch fixes this by also remembering the value of module_kind when
> the instantiation was deferred, and restoring it when doing this
> deferred instantiation.  That way newly instantiated declarations
> appropriately get a DECL_MODULE_PURVIEW_P appropriate for where the
> instantiation was required, meaning that GMF entities won't be counted
> as reachable unless referenced by an actually reachable entity.
> 
> Note that purviewness and attachment etc. is generally only determined
> by the base template: this is purely for determining whether a
> specialisation was declared in the module purview and hence whether it
> should be streamed out.  See the comment on 'set_instantiating_module'.
> 
>   PR c++/114630
>   PR c++/114795
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (struct tinst_level): Add field for tracking
>   module_kind.
>   * pt.cc (push_tinst_level_loc): Cache module_kind in new_level.
>   (reopen_tinst_level): Restore module_kind from level.
>   (instantiate_pending_templates): Save and restore module_kind so
>   it isn't affected by reopen_tinst_level.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/gmf-3.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/cp-tree.h |  3 +++
>  gcc/cp/pt.cc |  4 
>  gcc/testsuite/g++.dg/modules/gmf-3.C | 13 +
>  3 files changed, 20 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/modules/gmf-3.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 1938ada0268..0e619120ccc 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -6626,6 +6626,9 @@ struct GTY((chain_next ("%h.next"))) tinst_level {
>/* The location where the template is instantiated.  */
>location_t locus;
>  
> +  /* The module kind where the template is instantiated. */
> +  unsigned module_kind;
> +
>/* errorcount + sorrycount when we pushed this level.  */
>unsigned short errors;
>  
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 1c3eef60c06..401aa92bc3e 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -11277,6 +11277,7 @@ push_tinst_level_loc (tree tldcl, tree targs, 
> location_t loc)
>new_level->tldcl = tldcl;
>new_level->targs = targs;
>new_level->locus = loc;
> +  new_level->module_kind = module_kind;
>new_level->errors = errorcount + sorrycount;
>new_level->next = NULL;
>new_level->refcount = 0;
> @@ -11345,6 +11346,7 @@ reopen_tinst_level (struct tinst_level *level)
>for (t = level; t; t = t->next)
>  ++tinst_depth;
>  
> +  module_kind = level->module_kind;
>set_refcount_ptr (current_tinst_level, level);
>pop_tinst_level ();
>if (current_tinst_level)
> @@ -27442,6 +27444,7 @@ instantiate_pending_templates (int retries)
>  {
>int reconsider;
>location_t saved_loc = input_location;
> +  unsigned saved_module_kind = module_kind;
>  
>/* Instantiating templates may trigger vtable generation.  This in turn
>   may require further template instantiations.  We place a limit here
> @@ -27532,6 +27535,7 @@ instantiate_pending_templates (int retries)
>while (reconsider);
>  
>inp

Re: [PATCH] gimple-fold: Improve optimize_memcpy_to_memset to handle STRING_CST [PR78408]

2025-02-21 Thread Richard Biener
On Fri, Feb 21, 2025 at 5:11 AM Andrew Pinski  wrote:
>
> While looking into PR 118947, I noticed that optimize_memcpy_to_memset didn't
> handle STRING_CST which are also used for a memset of 0 but for char arrays.
> This fixes that and improves optimize_memcpy_to_memset to handle that case.
>
> This fixes part of PR 118947 but not the whole thing; we still need to skip 
> over
> vdefs in some cases.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

Thanks,
Richard.

> PR tree-optimization/78408
> PR tree-optimization/118947
>
> gcc/ChangeLog:
>
> * gimple-fold.cc (optimize_memcpy_to_memset): Handle STRING_CST case 
> too.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr78408-3.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-fold.cc   | 17 +
>  gcc/testsuite/gcc.dg/pr78408-3.c | 14 ++
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr78408-3.c
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 0380c7af4c2..1a48778da5a 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -919,6 +919,23 @@ optimize_memcpy_to_memset (gimple_stmt_iterator *gsip, 
> tree dest, tree src, tree
>poly_int64 offset, offset2;
>tree val = integer_zero_node;
>if (gimple_store_p (defstmt)
> +  && gimple_assign_single_p (defstmt)
> +  && TREE_CODE (gimple_assign_rhs1 (defstmt)) == STRING_CST
> +  && !gimple_clobber_p (defstmt))
> +{
> +  tree str = gimple_assign_rhs1 (defstmt);
> +  src2 = gimple_assign_lhs (defstmt);
> +  /* The string must contain all null char's for now.  */
> +  for (int i = 0; i < TREE_STRING_LENGTH (str); i++)
> +   {
> + if (TREE_STRING_POINTER (str)[i] != 0)
> +   {
> + src2 = NULL_TREE;
> + break;
> +   }
> +   }
> +}
> +  else if (gimple_store_p (defstmt)
>&& gimple_assign_single_p (defstmt)
>&& TREE_CODE (gimple_assign_rhs1 (defstmt)) == CONSTRUCTOR
>&& !gimple_clobber_p (defstmt))
> diff --git a/gcc/testsuite/gcc.dg/pr78408-3.c 
> b/gcc/testsuite/gcc.dg/pr78408-3.c
> new file mode 100644
> index 000..3de90d02392
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr78408-3.c
> @@ -0,0 +1,14 @@
> +/* PR tree-optimization/78408 */
> +/* { dg-do compile { target size32plus } } */
> +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
> +/* { dg-final { scan-tree-dump-times "after previous" 1 "forwprop1" } } */
> +
> +void* aaa();
> +void* bbb()
> +{
> +void* ret = aaa();
> +char buf[32] = {};
> +__builtin_memcpy(ret, buf, 32);
> +return ret;
> +}
> +
> --
> 2.43.0
>


[PATCH] COBOL 12K inf: info updates for gcobol

2025-02-21 Thread James K. Lowden
The following set of patches updates gcc documentation to include the
COBOL front end. I have tried to follow the style.  

The order of appearance of the supported languages in various places
looks inconsistent to me.  I don't *think* they occur in the order they
were implemented, which in any case is no service to the reader.  And
they're definitely not in alphabetical order, but kinda.  (If that's a
TLC issue that no one has gotten around to, I'm happy give it a try
later.)  I took the liberty of adding COBOL after the C family.

It feels funny to add my own name, but I guess somebody has to.  

--jkl


>From 71aca801f6dc5f6c2ea19044755f01a09742e7db Fri 21 Feb 2025 12:15:18 PM EST
From: "James K. Lowden" 
Date: Fri 21 Feb 2025 12:15:18 PM EST
Subject: [PATCH] COBOL inf: info updates for gcobol

gcc/doc/ChangeLog
* doc/contrib.texi: Add Dubner & Lowden.
* doc/frontends.texi: Mention COBOL, sort names.
* doc/install.texi: Document gcobol prerequisites.
* doc/invoke.texi: COBOL source code file extensions.
* doc/sourcebuild.texi: Add libgcobol runtime library.
* doc/standards.texi: Cite ISO as COBOL standard.

---
gcc/doc/contrib.texi | -
gcc/doc/frontends.texi | -
gcc/doc/install.texi | 

gcc/doc/invoke.texi | +++-
gcc/doc/sourcebuild.texi | +++-
gcc/doc/standards.texi | +++
6 files changed, 85 insertions(+), 29 deletions(-)
diff --git a/gcc/doc/contrib.texi b/gcc/doc/contrib.texi
index 10dc64350d2..17e088845ea 100644
--- a/gcc/doc/contrib.texi
+++ b/gcc/doc/contrib.texi
@@ -268,6 +268,10 @@ libraries including for all kinds of C interface issues, 
contributing and
 maintaining @code{complex<>}, sanity checking and disbursement, configuration
 architecture, libio maintenance, and early math work.
 
+@item
+Robert J. Dubner for his work on the COBOL front end, mating the
+parser output to the GENERIC tree.
+
 @item
 Fran@,{c}ois Dumont for his work on libstdc++-v3, especially maintaining and
 improving @code{debug-mode} and associative and unordered containers.
@@ -604,6 +608,10 @@ many other diagnostics fixes and improvements.
 Dave Love for his ongoing work with the Fortran front end and
 runtime libraries.
 
+@item
+James K. Lowden for his work on the COBOL front end, mainly the parser
+and CDF.
+
 @item
 Martin von L@"owis for internal consistency checking infrastructure,
 various C++ improvements including namespace support, and tons of
diff --git a/gcc/doc/frontends.texi b/gcc/doc/frontends.texi
index 73c222c9b0b..acd05941856 100644
--- a/gcc/doc/frontends.texi
+++ b/gcc/doc/frontends.texi
@@ -31,23 +31,23 @@ The language-independent component of GCC includes the 
majority of the
 optimizers, as well as the ``back ends'' that generate machine code for
 various processors.
 
-@cindex COBOL
 @cindex Mercury
 The part of a compiler that is specific to a particular language is
 called the ``front end''.  In addition to the front ends that are
 integrated components of GCC, there are several other front ends that
-are maintained separately.  These support languages such as
-Mercury, and COBOL@.  To use these, they must be built together with
-GCC proper.
+are maintained separately.  These support languages such as Mercury.
+To use these, they must be built together with GCC proper.
 
+@cindex Ada
 @cindex C++
+@cindex COBOL
 @cindex G++
-@cindex Ada
 @cindex GNAT
 Most of the compilers for languages other than C have their own names.
-The C++ compiler is G++, the Ada compiler is GNAT, and so on.  When we
-talk about compiling one of those languages, we might refer to that
-compiler by its own name, or as GCC@.  Either is correct.
+The C++ compiler is G++, the COBOL compiler is gcobol, the Ada
+compiler is GNAT, and so on.  When we talk about compiling one of
+those languages, we might refer to that compiler by its own name, or
+as GCC@.  Either is correct.
 
 @cindex compiler compared to C++ preprocessor
 @cindex intermediate C version, nonexistent
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 3b9f56b0529..1fdc109183c 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -291,11 +291,39 @@ Ada runtime libraries. You can check that your build 
environment is clean
 by verifying that @samp{gnatls -v} lists only one explicit path in each
 section.
 
+@cindex cobol
+@item @anchor{GCOBOL-prerequisite}GCOBOL
+
+The COBOL compiler, gcobol, first appeared in GCC 15.  To build the
+COBOL parser, you need GNU Bison 3.5.1 or later (but not 3.8.0). To build
+the lexer requires GNU Flex 2.6.4, the current version as of this writing,
+released on 2017-05-06.
+
+The gcobol documentation is maintained as manpages using troff
+mdoc. GNU groff is required to convert them to PDF format.  Conversion
+to HTML is done with mandoc, available at
+@uref{http://mdocml.bsd.lv/}.
+
+Because ISO COBOL defines strict requirements for numerical precision,
+gcob

[PATCH v1 3/3] LoongArch: General xvbitsel insn combinations optimization for special type.

2025-02-21 Thread Li Wei
On LoongArch, when the permutation indices come from two different vectors
and no index is repeated, V8SI/V8SF/V4DI/V4DF vectors can be shuffled with
two xvperm.w plus one xvbitsel.v instructions, or with two xvpermi.d plus
one xvbitsel.v instructions.

gcc/ChangeLog:

* config/loongarch/loongarch.cc 
(loongarch_expand_vec_perm_generic_bitsel):
Add new vector shuffle optimize function.
(loongarch_expand_vec_perm_const): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vec_perm-xvbitsel-2.c: New test.
* gcc.target/loongarch/vec_perm-xvbitsel-3.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 136 ++
 .../loongarch/vec_perm-xvbitsel-2.c   |  18 +++
 .../loongarch/vec_perm-xvbitsel-3.c   |  22 +++
 3 files changed, 176 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-xvbitsel-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-xvbitsel-3.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 2de3110383a..07c29eae2c3 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -9201,6 +9201,139 @@ loongarch_expand_vec_perm_bitsel (struct 
expand_vec_perm_d *d)
   return true;
 }
 
+/* A general shuffle method for 256-bit V8SI/V8SF/V4DI/V4DF types when
+   the permutate idx comes from different vectors and idx is not repeated.  */
+static bool
+loongarch_expand_vec_perm_generic_bitsel (struct expand_vec_perm_d *d)
+{
+  if (!ISA_HAS_LASX)
+return false;
+
+  auto_bitmap used;
+  machine_mode mode = d->vmode;
+  int nelt = d->nelt, val, i;
+
+  /* Due to instruction set restrictions, the following types do not support
+ this optimization method.  */
+  if (mode != E_V8SImode && mode != E_V8SFmode
+  && mode != E_V4DImode && mode != E_V4DFmode)
+return false;
+
+  /* We should ensure that d->perm[i] % nelt has no repeat.  */
+  for (i = 0; i < nelt; i += 1)
+{
+  if (bitmap_bit_p (used, d->perm[i] % nelt))
+   return false;
+  else
+   bitmap_set_bit (used, d->perm[i] % nelt);
+}
+
+  if (d->testing_p)
+return true;
+
+  rtx reg_bitsel, tmp_bitsel, sel_bitsel, op0, op1;
+  rtx rmap_bitsel[MAX_VECT_LEN];
+  op0 = gen_reg_rtx (mode);
+  op1 = gen_reg_rtx (mode);
+  reg_bitsel = gen_reg_rtx (mode);
+
+  if (mode == E_V8SImode || mode == E_V8SFmode)
+{
+  rtx rmap_xvperm[MAX_VECT_LEN];
+  rtx sel_xvperm, reg_xvperm;
+
+  for (i = 0; i < nelt; i += 1)
+   {
+ /* For xvperm insn we just copy original permutate index.  */
+ rmap_xvperm[i] = GEN_INT (d->perm[i]);
+ val = d->perm[i] >= nelt ? -1 : 0;
+ /* For xvbitsel insn we should do some conversion, where -1 means
+the destination element comes from operand1, and 0 means the
+destination element comes from operand0.  */
+ rmap_bitsel[i] = GEN_INT (val);
+   }
+
+  reg_xvperm = gen_reg_rtx (E_V8SImode);
+
+  /* Prepare reg of selective index for xvperm.  */
+  sel_xvperm = gen_rtx_CONST_VECTOR (E_V8SImode,
+gen_rtvec_v (nelt, rmap_xvperm));
+  emit_move_insn (reg_xvperm, sel_xvperm);
+
+  /* Prepare reg of selective index for xvbitsel.  */
+  sel_bitsel = gen_rtx_CONST_VECTOR (E_V8SImode,
+gen_rtvec_v (nelt, rmap_bitsel));
+  if (mode == E_V8SFmode)
+   {
+ tmp_bitsel = simplify_gen_subreg (E_V8SImode, reg_bitsel, mode, 0);
+ emit_move_insn (tmp_bitsel, sel_bitsel);
+   }
+  else
+   emit_move_insn (reg_bitsel, sel_bitsel);
+
+  switch (mode)
+   {
+   case E_V8SFmode:
+ emit_insn (gen_lasx_xvperm_w_f (op0, d->op0, reg_xvperm));
+ emit_insn (gen_lasx_xvperm_w_f (op1, d->op1, reg_xvperm));
+ break;
+   case E_V8SImode:
+ emit_insn (gen_lasx_xvperm_w (op0, d->op0, reg_xvperm));
+ emit_insn (gen_lasx_xvperm_w (op1, d->op1, reg_xvperm));
+ break;
+   default:
+ gcc_unreachable ();
+ break;
+   }
+
+  emit_insn (gen_simd_vbitsel (mode, d->target, op0, op1, reg_bitsel));
+}
+  else
+{
+  unsigned int imm = 0;
+  unsigned int val2;
+
+  for (i = nelt - 1; i >= 0; i -= 1)
+   {
+ val = d->perm[i] >= nelt ? -1 : 0;
+ rmap_bitsel[i] = GEN_INT (val);
+ val2 = d->perm[i] % nelt;
+ imm |= val2;
+ imm = (i != 0) ? imm << 2 : imm;
+   }
+
+  /* Prepare reg of selective index for xvbitsel.  */
+  sel_bitsel = gen_rtx_CONST_VECTOR (E_V4DImode,
+gen_rtvec_v (nelt, rmap_bitsel));
+  if (mode == E_V4DFmode)
+   {
+ tmp_bitsel = simplify_gen_subreg (E_V4DImode, reg_bitsel, mode, 0);
+ emit_move_insn (tmp_bitsel, sel_bitsel);
+   }
+  else
+   emit_move_insn (reg_bitsel, sel_

[PATCH v1 2/3] LoongArch: Optimize [x]vshuf insn to [x]vbitsel insn in some shuffle cases.

2025-02-21 Thread Li Wei
Currently, the shuffle in which LoongArch selects elements from two vectors
at corresponding positions is implemented through the [x]vshuf instruction,
but this introduces additional index copies.  In this case, the
[x]vbitsel.v instruction can be used for optimization.

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvbitsel_): Remove.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vbitsel_v):
Adjust.
(CODE_FOR_lasx_xvbitsel_v): Ditto.
* config/loongarch/loongarch.cc (loongarch_try_expand_lsx_vshuf_const):
Ditto.
(loongarch_is_bitsel_pattern): Add new check function.
(loongarch_expand_vec_perm_bitsel): Add new implement function.
(loongarch_expand_lsx_shuffle): Adjust.
(loongarch_expand_vec_perm_const): Add new optimize case.
* config/loongarch/lsx.md (lsx_vbitsel_): Adjust insn
pattern mode.
* config/loongarch/simd.md (@simd_vbitsel): New
define_insn template.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vec_perm-xvshuf.c: Move to...
* gcc.target/loongarch/vec_perm-xvbitsel.c: ...here.
* gcc.target/loongarch/vec_perm-vbitsel.c: New test.
---
 gcc/config/loongarch/lasx.md  | 12 ---
 gcc/config/loongarch/loongarch-builtins.cc|  4 +-
 gcc/config/loongarch/loongarch.cc | 89 ++-
 gcc/config/loongarch/lsx.md   | 12 ---
 gcc/config/loongarch/simd.md  | 13 +++
 .../gcc.target/loongarch/vec_perm-vbitsel.c   | 17 
 ...{vec_perm-xvshuf.c => vec_perm-xvbitsel.c} |  4 +-
 7 files changed, 121 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-vbitsel.c
 rename gcc/testsuite/gcc.target/loongarch/{vec_perm-xvshuf.c => 
vec_perm-xvbitsel.c} (77%)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index a37c85a25a4..c1049e319db 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1217,18 +1217,6 @@ (define_insn "lasx_xvbitrevi_"
   [(set_attr "type" "simd_bit")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvbitsel_"
-  [(set (match_operand:LASX 0 "register_operand" "=f")
-   (ior:LASX (and:LASX (not:LASX
- (match_operand:LASX 3 "register_operand" "f"))
- (match_operand:LASX 1 "register_operand" "f"))
- (and:LASX (match_dup 3)
-   (match_operand:LASX 2 "register_operand" "f"]
-  "ISA_HAS_LASX"
-  "xvbitsel.v\t%u0,%u1,%u2,%u3"
-  [(set_attr "type" "simd_bitmov")
-   (set_attr "mode" "")])
-
 (define_insn "lasx_xvbitseli_b"
   [(set (match_operand:V32QI 0 "register_operand" "=f")
(ior:V32QI (and:V32QI (not:V32QI
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index 92d995a916a..0682bc6baf9 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -247,7 +247,7 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && ISA_HAS_FRECIPE)
 #define CODE_FOR_lsx_vandi_b CODE_FOR_andv16qi3
 #define CODE_FOR_lsx_bnz_v CODE_FOR_lsx_bnz_v_b
 #define CODE_FOR_lsx_bz_v CODE_FOR_lsx_bz_v_b
-#define CODE_FOR_lsx_vbitsel_v CODE_FOR_lsx_vbitsel_b
+#define CODE_FOR_lsx_vbitsel_v CODE_FOR_simd_vbitselv16qi
 #define CODE_FOR_lsx_vseqi_b CODE_FOR_lsx_vseq_b
 #define CODE_FOR_lsx_vseqi_h CODE_FOR_lsx_vseq_h
 #define CODE_FOR_lsx_vseqi_w CODE_FOR_lsx_vseq_w
@@ -538,7 +538,7 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && ISA_HAS_FRECIPE)
 #define CODE_FOR_lasx_xvaddi_du CODE_FOR_addv4di3
 #define CODE_FOR_lasx_xvand_v CODE_FOR_andv32qi3
 #define CODE_FOR_lasx_xvandi_b CODE_FOR_andv32qi3
-#define CODE_FOR_lasx_xvbitsel_v CODE_FOR_lasx_xvbitsel_b
+#define CODE_FOR_lasx_xvbitsel_v CODE_FOR_simd_vbitselv32qi
 #define CODE_FOR_lasx_xvseqi_b CODE_FOR_lasx_xvseq_b
 #define CODE_FOR_lasx_xvseqi_h CODE_FOR_lasx_xvseq_h
 #define CODE_FOR_lasx_xvseqi_w CODE_FOR_lasx_xvseq_w
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 3ac6a74f15b..2de3110383a 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8372,7 +8372,10 @@ loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
   else
{
  sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm));
- emit_move_insn (d->target, sel);
+ /* Weakening dependencies by copying indices (for vshuf).  */
+ tmp = gen_reg_rtx (d->vmode);
+ emit_move_insn (tmp, sel);
+ emit_move_insn (d->target, tmp);
}
 
   switch (d->vmode)
@@ -8444,9 +8447,31 @@ loongarch_is_imm_set_shuffle (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Check if the d->perm meets the requirements of the [x]vbitsel.v insn.  */
+static bool
+loongarch_is_bitsel_pattern (struct expand_vec_perm_d *d)
+{
+  bool result = true;
+
+  for (int i = 0; i < d->nelt; i++)
+{

[PATCH v2 2/3] LoongArch: Optimize [x]vshuf insn to [x]vbitsel insn in some shuffle cases.

2025-02-21 Thread Li Wei
Currently, a shuffle that selects between two vectors at corresponding
element positions is implemented through the [x]vshuf instruction, but
this introduces an extra copy of the indices.  In this case, the
[x]vbitsel.v instruction can be used for optimization.
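As a hedged illustration (GNU C, not taken from the patch itself), this is
the shape of shuffle the new path targets: every output lane i is lane i of
one of the two inputs, which is exactly what [x]vbitsel.v computes.

```c
#include <assert.h>

/* 256-bit vector of 8 ints (GNU C vector extension).  */
typedef int v8si __attribute__ ((vector_size (32)));

/* Even result lanes from a, odd lanes from b; each output lane i is lane i
   of one input, so no real index shuffling is needed.  */
v8si
select_lanes (v8si a, v8si b)
{
  const v8si mask = { 0, 9, 2, 11, 4, 13, 6, 15 };
  return __builtin_shuffle (a, b, mask);
}
```

On LoongArch with -mlasx such a permutation previously went through xvshuf
with a copied index vector; after this patch it should map to a single
xvbitsel.v (an expectation based on the commit message, not verified here).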

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvbitsel_): Remove.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vbitsel_v):
Adjust.
(CODE_FOR_lasx_xvbitsel_v): Ditto.
* config/loongarch/loongarch.cc (loongarch_try_expand_lsx_vshuf_const):
Ditto.
(loongarch_is_bitsel_pattern): Add new check function.
(loongarch_expand_vec_perm_bitsel): Add new implement function.
(loongarch_expand_lsx_shuffle): Adjust.
(loongarch_expand_vec_perm_const): Add new optimize case.
* config/loongarch/lsx.md (lsx_vbitsel_): Adjust insn
pattern mode.
* config/loongarch/simd.md (@simd_vbitsel): New
define_insn template.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vec_perm-xvshuf.c: Move to...
* gcc.target/loongarch/vec_perm-xvbitsel.c: ...here.
* gcc.target/loongarch/vec_perm-vbitsel.c: New test.
---
 gcc/config/loongarch/lasx.md  | 12 ---
 gcc/config/loongarch/loongarch-builtins.cc|  4 +-
 gcc/config/loongarch/loongarch.cc | 89 ++-
 gcc/config/loongarch/lsx.md   | 12 ---
 gcc/config/loongarch/simd.md  | 13 +++
 .../gcc.target/loongarch/vec_perm-vbitsel.c   | 17 
 ...{vec_perm-xvshuf.c => vec_perm-xvbitsel.c} |  4 +-
 7 files changed, 121 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-vbitsel.c
 rename gcc/testsuite/gcc.target/loongarch/{vec_perm-xvshuf.c => vec_perm-xvbitsel.c} (77%)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e4505c1660d..b72b6f97fac 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1149,18 +1149,6 @@ (define_insn "lasx_xvbitrevi_"
   [(set_attr "type" "simd_bit")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvbitsel_"
-  [(set (match_operand:LASX 0 "register_operand" "=f")
-   (ior:LASX (and:LASX (not:LASX
- (match_operand:LASX 3 "register_operand" "f"))
- (match_operand:LASX 1 "register_operand" "f"))
- (and:LASX (match_dup 3)
-   (match_operand:LASX 2 "register_operand" "f"]
-  "ISA_HAS_LASX"
-  "xvbitsel.v\t%u0,%u1,%u2,%u3"
-  [(set_attr "type" "simd_bitmov")
-   (set_attr "mode" "")])
-
 (define_insn "lasx_xvbitseli_b"
   [(set (match_operand:V32QI 0 "register_operand" "=f")
(ior:V32QI (and:V32QI (not:V32QI
diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index 9493dedcab7..982f90e7048 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -247,7 +247,7 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && ISA_HAS_FRECIPE)
 #define CODE_FOR_lsx_vandi_b CODE_FOR_andv16qi3
 #define CODE_FOR_lsx_bnz_v CODE_FOR_lsx_bnz_v_b
 #define CODE_FOR_lsx_bz_v CODE_FOR_lsx_bz_v_b
-#define CODE_FOR_lsx_vbitsel_v CODE_FOR_lsx_vbitsel_b
+#define CODE_FOR_lsx_vbitsel_v CODE_FOR_simd_vbitselv16qi
 #define CODE_FOR_lsx_vseqi_b CODE_FOR_lsx_vseq_b
 #define CODE_FOR_lsx_vseqi_h CODE_FOR_lsx_vseq_h
 #define CODE_FOR_lsx_vseqi_w CODE_FOR_lsx_vseq_w
@@ -568,7 +568,7 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && ISA_HAS_FRECIPE)
 #define CODE_FOR_lasx_xvaddi_du CODE_FOR_addv4di3
 #define CODE_FOR_lasx_xvand_v CODE_FOR_andv32qi3
 #define CODE_FOR_lasx_xvandi_b CODE_FOR_andv32qi3
-#define CODE_FOR_lasx_xvbitsel_v CODE_FOR_lasx_xvbitsel_b
+#define CODE_FOR_lasx_xvbitsel_v CODE_FOR_simd_vbitselv32qi
 #define CODE_FOR_lasx_xvseqi_b CODE_FOR_lasx_xvseq_b
 #define CODE_FOR_lasx_xvseqi_h CODE_FOR_lasx_xvseq_h
 #define CODE_FOR_lasx_xvseqi_w CODE_FOR_lasx_xvseq_w
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index e9e51465344..a4a72923b7f 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8400,7 +8400,10 @@ loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
   else
{
  sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm));
- emit_move_insn (d->target, sel);
+ /* Weakening dependencies by copying indices (for vshuf).  */
+ tmp = gen_reg_rtx (d->vmode);
+ emit_move_insn (tmp, sel);
+ emit_move_insn (d->target, tmp);
}
 
   switch (d->vmode)
@@ -8472,9 +8475,31 @@ loongarch_is_imm_set_shuffle (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Check if the d->perm meets the requirements of the [x]vbitsel.v insn.  */
+static bool
+loongarch_is_bitsel_pattern (struct expand_vec_perm_d *d)
+{
+  bool result = true;
+
+  for (int i = 0; i < d->nelt; i++)
+{
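The check that the truncated hunk above begins can be sketched as a
standalone predicate (an assumption about its intent, based on the
function comment): [x]vbitsel.v can only implement a two-vector shuffle in
which every destination lane i takes lane i of either input.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: true iff perm describes an element-wise select, i.e.
   perm[i] == i or perm[i] == i + nelt for every lane.  */
static bool
is_bitsel_pattern (const unsigned char *perm, int nelt)
{
  for (int i = 0; i < nelt; i++)
    if (perm[i] % nelt != i)
      return false;
  return true;
}
```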

[PATCH v2 1/3] LoongArch: Optimize two 256-bit vectors correspond highpart and lowpart splicing shuffle.

2025-02-21 Thread Li Wei
In LoongArch, we have the xvshuf.{b/h/w/d} instructions, which can
handle the case where all low 128-bit elements of the target vector
are shuffled from the low 128-bit elements of the two input vectors,
and all high 128-bit elements of the target vector are shuffled
similarly from the high 128-bit elements.  We therefore add
recognition of this situation and use the xvshuf instructions to
optimize it.
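The qualifying condition can be restated in plain C (a hedged sketch of
the loongarch_if_match_xvshuffle check added by this patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: lanes in the low half of the result may pick only from the low
   128-bit halves of op0 (indices 0..nelt/2-1) or op1 (indices
   nelt..nelt+nelt/2-1); high-half lanes may pick only from the high
   halves.  */
static bool
match_xvshuffle (const unsigned char *perm, int nelt)
{
  int half = nelt / 2;
  for (int i = 0; i < nelt; i++)
    {
      unsigned char p = perm[i];
      bool from_low = p < half || (p >= nelt && p < nelt + half);
      if ((i < half) != from_low)
        return false;
    }
  return true;
}
```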

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_if_match_xvshuffle):
Add new condition.
(loongarch_expand_vec_perm_const): Add new function.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vec_perm-verify-xvshuf.c: New test.
* gcc.target/loongarch/vec_perm-xvshuf.c: New test.
---
 gcc/config/loongarch/loongarch.cc |  69 
 .../loongarch/vec_perm-verify-xvshuf.c| 106 ++
 .../gcc.target/loongarch/vec_perm-xvshuf.c|  17 +++
 3 files changed, 192 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-verify-xvshuf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-xvshuf.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index f2177f892fb..e9e51465344 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -9346,6 +9346,34 @@ loongarch_is_elem_duplicate (struct expand_vec_perm_d *d)
   return result;
 }
 
+/* If the target vector low 128-bit element comes from the low 128-bit element
+   of op0 or op1, and the target vector high 128-bit element comes from the
+   high 128-bit element of op0 or op1, the corresponding xvshuf.{h/w/d}
+   instruction can be matched.  */
+static bool
+loongarch_if_match_xvshuffle (struct expand_vec_perm_d *d)
+{
+  for (int i = 0; i < d->nelt; i++)
+{
+  unsigned char buf = d->perm[i];
+
+  if (i < d->nelt / 2)
+   {
+ if ((buf >= d->nelt / 2 && buf < d->nelt)
+ || buf >= (d->nelt + d->nelt / 2))
+   return false;
+   }
+  else
+   {
+ if ((buf >= d->nelt && buf < (d->nelt + d->nelt / 2))
+ || buf < d->nelt / 2)
+   return false;
+   }
+}
+
+  return true;
+}
+
 /* In LASX, some permutation insn does not have the behavior that gcc expects
when compiler wants to emit a vector permutation.
 
@@ -9598,6 +9626,47 @@ loongarch_expand_vec_perm_const (struct expand_vec_perm_d *d)
  return true;
}
 
+  if (loongarch_if_match_xvshuffle (d))
+   {
+ if (d->testing_p)
+   return true;
+
+ /* Selector example: E_V8SImode, { 0, 9, 2, 11, 4, 13, 6, 15 }.  */
+ /* If target low 128-bit has op1 low 128-bit element {9, 11}, we
+need subtract half of d->nelt (so index in range (4, 7)) to form
+the 256-bit intermediate vector vec0.
+Similarly, if target high 128-bit has op0 high 128-bit element
+{4, 6}, we need subtract half of d->nelt (so index in range
+(0, 3)) to form the 256-bit intermediate vector vec1.
+Especially if target high 128-bit has op1 high 128-bit element
+{13, 15}, we need modulo d->nelt (so index in range (4, 7)) to
+form the 256-bit intermediate vector vec1.  */
+ for (i = 0; i < d->nelt; i += 1)
+   {
+ if (i < d->nelt / 2)
+   {
+ if (d->perm[i] >= d->nelt)
+   remapped[i] = d->perm[i] - d->nelt / 2;
+ else
+   remapped[i] = d->perm[i];
+   }
+ else
+   {
+ if (d->perm[i] < d->nelt)
+   remapped[i] = d->perm[i] - d->nelt / 2;
+ else
+   remapped[i] = d->perm[i] % d->nelt;
+   }
+   }
+
+ /* Selector after: { 0, 5, 2, 7, 0, 5, 2, 7 }.  */
+ for (i = 0; i < d->nelt; i += 1)
+   rperm[i] = GEN_INT (remapped[i]);
+
+ flag = true;
+ goto expand_perm_const_end;
+   }
+
 expand_perm_const_end:
   if (flag)
{
diff --git a/gcc/testsuite/gcc.target/loongarch/vec_perm-verify-xvshuf.c b/gcc/testsuite/gcc.target/loongarch/vec_perm-verify-xvshuf.c
new file mode 100644
index 000..658dacfe340
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vec_perm-verify-xvshuf.c
@@ -0,0 +1,106 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mlasx -w -fno-strict-aliasing" } */
+
+#include "./vector/simd_correctness_check.h"
+#include 
+#define N 16
+
+typedef int TYPE;
+
+void
+foo (TYPE a[], TYPE b[], TYPE c[])
+{
+  for (int i = 0; i < N; i += 2)
+{
+  c[i + 0] = a[i + 0] + b[i + 0];
+  c[i + 1] = a[i + 1] - b[i + 1];
+}
+}
+
+__m256i
+change_to_256vec (TYPE c[], int offset)
+{
+  __m256i __m256i_op;
+  int type_bit_len = sizeof (TYPE) * 8;
+  long int tmp;
+
+  for (int i = offset; i < 256 / type_bit_len + offset; i += 2)
+{
+  __m256i_op[(i - offset) / 2] = 
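The selector remapping described in the loongarch.cc hunk above can be
sketched and checked in plain C, using the example selector from the
patch comments (a hedged restatement, not the patch code itself):

```c
#include <assert.h>

/* Sketch: fold every index into the 128-bit lane xvshuf actually reads.
   Low-half lanes keep op0 low-half indices and shift op1 indices down by
   nelt/2; high-half lanes shift op0 indices down by nelt/2 and reduce op1
   indices modulo nelt.  */
static void
remap_for_xvshuf (const unsigned char *perm, int nelt, unsigned char *out)
{
  for (int i = 0; i < nelt; i++)
    {
      if (i < nelt / 2)
        out[i] = perm[i] >= nelt ? perm[i] - nelt / 2 : perm[i];
      else
        out[i] = perm[i] < nelt ? perm[i] - nelt / 2 : perm[i] % nelt;
    }
}
```

For the selector { 0, 9, 2, 11, 4, 13, 6, 15 } this reproduces the
{ 0, 5, 2, 7, 0, 5, 2, 7 } result quoted in the patch comment.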

[PATCH v2 3/3] LoongArch: General xvbitsel insn combinations optimization for special type.

2025-02-21 Thread Li Wei
In LoongArch, when the permutation indices come from both input
vectors and no index repeats (modulo the vector length), shuffles of
V8SI/V8SF/V4DI/V4DF vectors can be optimized into two xvperm.w plus
one xvbitsel.v instructions, or two xvpermi.d plus one xvbitsel.v
instructions.
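The "no repeated index" precondition can be sketched as a standalone
check (a hedged restatement of the bitmap test in
loongarch_expand_vec_perm_generic_bitsel below):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: true iff perm[i] % nelt is a permutation of 0..nelt-1, i.e.
   no lane index repeats modulo the vector length.  */
static bool
indices_unique_mod_nelt (const unsigned char *perm, int nelt)
{
  bool used[32] = { false };   /* nelt <= 32 for 256-bit vectors */
  for (int i = 0; i < nelt; i++)
    {
      int idx = perm[i] % nelt;
      if (used[idx])
        return false;
      used[idx] = true;
    }
  return true;
}
```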

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_vec_perm_generic_bitsel):
Add new vector shuffle optimize function.
(loongarch_expand_vec_perm_const): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vec_perm-xvbitsel-2.c: New test.
* gcc.target/loongarch/vec_perm-xvbitsel-3.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 136 ++
 .../loongarch/vec_perm-xvbitsel-2.c   |  18 +++
 .../loongarch/vec_perm-xvbitsel-3.c   |  22 +++
 3 files changed, 176 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-xvbitsel-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-xvbitsel-3.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index a4a72923b7f..16fe755742b 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -9229,6 +9229,139 @@ loongarch_expand_vec_perm_bitsel (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* A general shuffle method for 256-bit V8SI/V8SF/V4DI/V4DF types when
+   the permutate idx comes from different vectors and idx is not repeated.  */
+static bool
+loongarch_expand_vec_perm_generic_bitsel (struct expand_vec_perm_d *d)
+{
+  if (!ISA_HAS_LASX)
+return false;
+
+  auto_bitmap used;
+  machine_mode mode = d->vmode;
+  int nelt = d->nelt, val, i;
+
+  /* Due to instruction set restrictions, the following types do not support
+ this optimization method.  */
+  if (mode != E_V8SImode && mode != E_V8SFmode
+  && mode != E_V4DImode && mode != E_V4DFmode)
+return false;
+
+  /* We should ensure that d->perm[i] % nelt has no repeat.  */
+  for (i = 0; i < nelt; i += 1)
+{
+  if (bitmap_bit_p (used, d->perm[i] % nelt))
+   return false;
+  else
+   bitmap_set_bit (used, d->perm[i] % nelt);
+}
+
+  if (d->testing_p)
+return true;
+
+  rtx reg_bitsel, tmp_bitsel, sel_bitsel, op0, op1;
+  rtx rmap_bitsel[MAX_VECT_LEN];
+  op0 = gen_reg_rtx (mode);
+  op1 = gen_reg_rtx (mode);
+  reg_bitsel = gen_reg_rtx (mode);
+
+  if (mode == E_V8SImode || mode == E_V8SFmode)
+{
+  rtx rmap_xvperm[MAX_VECT_LEN];
+  rtx sel_xvperm, reg_xvperm;
+
+  for (i = 0; i < nelt; i += 1)
+   {
+ /* For xvperm insn we just copy original permutate index.  */
+ rmap_xvperm[i] = GEN_INT (d->perm[i]);
+ val = d->perm[i] >= nelt ? -1 : 0;
+ /* For xvbitsel insn we should do some conversion, where -1 means
+the destination element comes from operand1, and 0 means the
+destination element comes from operand0.  */
+ rmap_bitsel[i] = GEN_INT (val);
+   }
+
+  reg_xvperm = gen_reg_rtx (E_V8SImode);
+
+  /* Prepare reg of selective index for xvperm.  */
+  sel_xvperm = gen_rtx_CONST_VECTOR (E_V8SImode,
+gen_rtvec_v (nelt, rmap_xvperm));
+  emit_move_insn (reg_xvperm, sel_xvperm);
+
+  /* Prepare reg of selective index for xvbitsel.  */
+  sel_bitsel = gen_rtx_CONST_VECTOR (E_V8SImode,
+gen_rtvec_v (nelt, rmap_bitsel));
+  if (mode == E_V8SFmode)
+   {
+ tmp_bitsel = simplify_gen_subreg (E_V8SImode, reg_bitsel, mode, 0);
+ emit_move_insn (tmp_bitsel, sel_bitsel);
+   }
+  else
+   emit_move_insn (reg_bitsel, sel_bitsel);
+
+  switch (mode)
+   {
+   case E_V8SFmode:
+ emit_insn (gen_lasx_xvperm_w_f (op0, d->op0, reg_xvperm));
+ emit_insn (gen_lasx_xvperm_w_f (op1, d->op1, reg_xvperm));
+ break;
+   case E_V8SImode:
+ emit_insn (gen_lasx_xvperm_w (op0, d->op0, reg_xvperm));
+ emit_insn (gen_lasx_xvperm_w (op1, d->op1, reg_xvperm));
+ break;
+   default:
+ gcc_unreachable ();
+ break;
+   }
+
+  emit_insn (gen_simd_vbitsel (mode, d->target, op0, op1, reg_bitsel));
+}
+  else
+{
+  unsigned int imm = 0;
+  unsigned int val2;
+
+  for (i = nelt - 1; i >= 0; i -= 1)
+   {
+ val = d->perm[i] >= nelt ? -1 : 0;
+ rmap_bitsel[i] = GEN_INT (val);
+ val2 = d->perm[i] % nelt;
+ imm |= val2;
+ imm = (i != 0) ? imm << 2 : imm;
+   }
+
+  /* Prepare reg of selective index for xvbitsel.  */
+  sel_bitsel = gen_rtx_CONST_VECTOR (E_V4DImode,
+gen_rtvec_v (nelt, rmap_bitsel));
+  if (mode == E_V4DFmode)
+   {
+ tmp_bitsel = simplify_gen_subreg (E_V4DImode, reg_bitsel, mode, 0);
+ emit_move_insn (tmp_bitsel, sel_bitsel);
+   }
+  else
+   emit_move_insn (reg_bitsel, sel_

[PATCH v1 1/3] LoongArch: Optimize two 256-bit vectors correspond highpart and lowpart splicing shuffle.

2025-02-21 Thread Li Wei
In LoongArch, we have the xvshuf.{b/h/w/d} instructions, which can
handle the case where all low 128-bit elements of the target vector
are shuffled from the low 128-bit elements of the two input vectors,
and all high 128-bit elements of the target vector are shuffled
similarly from the high 128-bit elements.  We therefore add
recognition of this situation and use the xvshuf instructions to
optimize it.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_if_match_xvshuffle):
Add new condition.
(loongarch_expand_vec_perm_const): Add new function.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vec_perm-verify-xvshuf.c: New test.
* gcc.target/loongarch/vec_perm-xvshuf.c: New test.
---
 gcc/config/loongarch/loongarch.cc |  69 
 .../loongarch/vec_perm-verify-xvshuf.c| 106 ++
 .../gcc.target/loongarch/vec_perm-xvshuf.c|  17 +++
 3 files changed, 192 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-verify-xvshuf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vec_perm-xvshuf.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index e9978370e8c..3ac6a74f15b 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -9318,6 +9318,34 @@ loongarch_is_elem_duplicate (struct expand_vec_perm_d *d)
   return result;
 }
 
+/* If the target vector low 128-bit element comes from the low 128-bit element
+   of op0 or op1, and the target vector high 128-bit element comes from the
+   high 128-bit element of op0 or op1, the corresponding xvshuf.{h/w/d}
+   instruction can be matched.  */
+static bool
+loongarch_if_match_xvshuffle (struct expand_vec_perm_d *d)
+{
+  for (int i = 0; i < d->nelt; i++)
+{
+  unsigned char buf = d->perm[i];
+
+  if (i < d->nelt / 2)
+   {
+ if ((buf >= d->nelt / 2 && buf < d->nelt)
+ || buf >= (d->nelt + d->nelt / 2))
+   return false;
+   }
+  else
+   {
+ if ((buf >= d->nelt && buf < (d->nelt + d->nelt / 2))
+ || buf < d->nelt / 2)
+   return false;
+   }
+}
+
+  return true;
+}
+
 /* In LASX, some permutation insn does not have the behavior that gcc expects
when compiler wants to emit a vector permutation.
 
@@ -9570,6 +9598,47 @@ loongarch_expand_vec_perm_const (struct expand_vec_perm_d *d)
  return true;
}
 
+  if (loongarch_if_match_xvshuffle (d))
+   {
+ if (d->testing_p)
+   return true;
+
+ /* Selector example: E_V8SImode, { 0, 9, 2, 11, 4, 13, 6, 15 }.  */
+ /* If target low 128-bit has op1 low 128-bit element {9, 11}, we
+need subtract half of d->nelt (so index in range (4, 7)) to form
+the 256-bit intermediate vector vec0.
+Similarly, if target high 128-bit has op0 high 128-bit element
+{4, 6}, we need subtract half of d->nelt (so index in range
+(0, 3)) to form the 256-bit intermediate vector vec1.
+Especially if target high 128-bit has op1 high 128-bit element
+{13, 15}, we need modulo d->nelt (so index in range (4, 7)) to
+form the 256-bit intermediate vector vec1.  */
+ for (i = 0; i < d->nelt; i += 1)
+   {
+ if (i < d->nelt / 2)
+   {
+ if (d->perm[i] >= d->nelt)
+   remapped[i] = d->perm[i] - d->nelt / 2;
+ else
+   remapped[i] = d->perm[i];
+   }
+ else
+   {
+ if (d->perm[i] < d->nelt)
+   remapped[i] = d->perm[i] - d->nelt / 2;
+ else
+   remapped[i] = d->perm[i] % d->nelt;
+   }
+   }
+
+ /* Selector after: { 0, 5, 2, 7, 0, 5, 2, 7 }.  */
+ for (i = 0; i < d->nelt; i += 1)
+   rperm[i] = GEN_INT (remapped[i]);
+
+ flag = true;
+ goto expand_perm_const_end;
+   }
+
 expand_perm_const_end:
   if (flag)
{
diff --git a/gcc/testsuite/gcc.target/loongarch/vec_perm-verify-xvshuf.c b/gcc/testsuite/gcc.target/loongarch/vec_perm-verify-xvshuf.c
new file mode 100644
index 000..658dacfe340
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vec_perm-verify-xvshuf.c
@@ -0,0 +1,106 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mlasx -w -fno-strict-aliasing" } */
+
+#include "./vector/simd_correctness_check.h"
+#include 
+#define N 16
+
+typedef int TYPE;
+
+void
+foo (TYPE a[], TYPE b[], TYPE c[])
+{
+  for (int i = 0; i < N; i += 2)
+{
+  c[i + 0] = a[i + 0] + b[i + 0];
+  c[i + 1] = a[i + 1] - b[i + 1];
+}
+}
+
+__m256i
+change_to_256vec (TYPE c[], int offset)
+{
+  __m256i __m256i_op;
+  int type_bit_len = sizeof (TYPE) * 8;
+  long int tmp;
+
+  for (int i = offset; i < 256 / type_bit_len + offset; i += 2)
+{
+  __m256i_op[(i - offset) / 2] = 

[PATCHv2] i386: Quote user-defined symbols in assembly in Intel syntax

2025-02-21 Thread LIU Hao

Patch v2:

The special treatment of + and - seems to be specific to
`ASM_OUTPUT_SYMBOL_REF()`.  Neither operator is passed to
`ASM_OUTPUT_LABELREF()`, so it's not necessary to check for them in
`ix86_asm_output_labelref()`.



--
Best regards,
LIU Hao
From f6c09e9397d5fe9c0dd1f7a02c90536732aed3df Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Sat, 22 Feb 2025 13:11:51 +0800
Subject: [PATCH] i386: Quote user-defined symbols in assembly in Intel syntax

With `-masm=intel`, GCC generates registers without % prefixes. If a
user-declared symbol happens to match a register, it will confuse the
assembler. User-defined symbols should be quoted, so they are not to
be mistaken for registers or operators.

Support for quoted symbols was added in Binutils 2.26, originally
for ARM assembly, where registers are also unprefixed:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d02603dc201f80cd9d2a1f4b1a16110b1e04222b

This change is required for `@SECREL32` to work in Intel syntax when
targeting Windows, where `@` is allowed as part of a symbol. GNU AS
fails to parse a plain symbol with that suffix:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881#c79
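A minimal reproducer sketch (hypothetical names; the exact assembly text
is an assumption based on the commit message): a C identifier spelled like
a register is perfectly legal, and with -masm=intel an unquoted reference
to it in the output could be read as the register unless the assembler
sees it quoted.

```c
#include <assert.h>

/* A legal C identifier that collides with an x86 register name.  Before
   the patch, Intel-syntax output referenced it unquoted; after it, the
   symbol is emitted as "ecx" so GNU as keeps treating it as a label.  */
int ecx = 5;

int
read_ecx (void)
{
  return ecx;
}
```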

gcc/config/:
PR target/53929
PR target/80881
* gcc/config/i386/i386-protos.h (ix86_asm_output_labelref): Declare new
function for quoting user-defined symbols in Intel syntax.
* gcc/config/i386/i386.cc (ix86_asm_output_labelref): Implement it.
* gcc/config/i386/i386.h (ASM_OUTPUT_LABELREF): Use it.
* gcc/config/i386/cygming.h (ASM_OUTPUT_LABELREF): Use it.
---
 gcc/config/i386/cygming.h |  5 +++--
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/i386.cc   | 13 +
 gcc/config/i386/i386.h|  7 +++
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 3ddcbecb22fd..4a192900045a 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -247,9 +247,10 @@ do {   \
 #undef ASM_OUTPUT_LABELREF
 #define  ASM_OUTPUT_LABELREF(STREAM, NAME) \
 do {   \
+  const char* prefix = ""; \
   if ((NAME)[0] != FASTCALL_PREFIX)\
-fputs (user_label_prefix, (STREAM));   \
-  fputs ((NAME), (STREAM));\
+prefix = user_label_prefix;\
+  ix86_asm_output_labelref ((STREAM), prefix, (NAME)); \
 } while (0)
 
 /* This does much the same in memory rather than to a stream.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bea3fd4b2e2a..3b9e28ced91c 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -198,6 +198,7 @@ extern int ix86_attr_length_vex_default (rtx_insn *, bool, bool);
 extern rtx ix86_libcall_value (machine_mode);
 extern bool ix86_function_arg_regno_p (int);
 extern void ix86_asm_output_function_label (FILE *, const char *, tree);
+extern void ix86_asm_output_labelref (FILE *, const char *, const char *);
 extern void ix86_call_abi_override (const_tree);
 extern int ix86_reg_parm_stack_space (const_tree);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3128973ba79c..d9dea86afa9c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1709,6 +1709,19 @@ ix86_asm_output_function_label (FILE *out_file, const char *fname,
 }
 }
 
+/* Output a user-defined label.  In AT&T syntax, registers are prefixed
+   with %, so labels require no punctuation.  In Intel syntax, registers
+   are unprefixed, so labels may clash with registers or other operators,
+   and require quoting.  */
+void
+ix86_asm_output_labelref (FILE *file, const char *prefix, const char *label)
+{
+  if (ASSEMBLER_DIALECT == ASM_ATT)
+fprintf (file, "%s%s", prefix, label);
+  else
+fprintf (file, "\"%s%s\"", prefix, label);
+}
+
 /* Implementation of call abi switching target hook. Specific to FNDECL
the specific call register sets are set.  See also
ix86_conditional_register_usage for more details.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 40b1aa4e6dfe..79a1afdde02c 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2251,6 +2251,13 @@ extern unsigned int const svr4_debugger_register_map[FIRST_PSEUDO_REGISTER];
   } while (0)
 #endif
 
+/* In Intel syntax, we have to quote user-defined labels that would
+   match (unprefixed) registers or operators.  */
+
+#undef ASM_OUTPUT_LABELREF
+#define ASM_OUTPUT_LABELREF(STREAM, NAME)  \
+  ix86_asm_output_labelref ((STREAM), user_label_prefix, (NAME))
+
 /* Under some conditions we need jump tables in the text section,
because the assembler cannot handle label differences between
sections.  */
-- 
2.48.1





Re: BPF, nvptx: Standardize on 'sorry, unimplemented: dynamic stack allocation not supported'

2025-02-21 Thread Jose E. Marchesi



> diff --git a/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c b/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
> index 0406f2c3595..e549cab84ca 100644
> --- a/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
> +++ b/gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
> @@ -3,7 +3,8 @@
>  int
>  foo (int x)
>  {
> -  int *p = __builtin_alloca (x); /* { dg-error "support" } */
> +  int *p = __builtin_alloca (x);
> +  /* { dg-message {sorry, unimplemented: dynamic stack allocation not supported} {} { target *-*-* } .-1 } */
>  
>return p[2];
>  }

I am a bit surprised that works.  Doesn't dg-message trigger a location
(line) check like dg-error and dg-warning do?


[PATCH] LoongArch: Avoid unnecessary zero-initialization using LSX for scalar popcount

2025-02-21 Thread Xi Ruoyao
Now for __builtin_popcountl we are getting things like

vrepli.b$vr0,0
vinsgr2vr.d $vr0,$r4,0
vpcnt.d $vr0,$vr0
vpickve2gr.du   $r4,$vr0,0
slli.w  $r4,$r4,0
jr  $r1

The "vrepli.b" instruction is introduced by the init-regs pass (see
PR61810 and all the issues it references).  To work around it, we can
use a post-reload split instead of a define_expand: the "f" constraint
makes the compiler move the scalar between GPR and FPR automatically,
and reload runs much later than init-regs, so init-regs won't get in
our way.

Now the code looks like:

movgr2fr.d  $f0,$r4
vpcnt.d $vr0,$vr0
movfr2gr.d  $r4,$f0
jr  $r1
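The scalar popcount that exercises this pattern is just the builtin below;
the expectation that it now expands to the three-instruction sequence
above is taken from the commit message (the C itself is target-portable).

```c
#include <assert.h>

/* __builtin_popcountl counts the set bits of an unsigned long.  On
   LoongArch with LSX this is the expansion the patch converts from a
   define_expand into a post-reload split.  */
int
count_bits (unsigned long x)
{
  return __builtin_popcountl (x);
}
```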

gcc/ChangeLog:

* config/loongarch/loongarch.md (cntmap): Change to uppercase.
(popcount2): Modify to a post reload split.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 6f507c3c7f6..478f859051c 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1732,21 +1732,23 @@ (define_insn "truncdfsf2"
 
 ;; This attribute used for get connection of scalar mode and corresponding
 ;; vector mode.
-(define_mode_attr cntmap [(SI "v4si") (DI "v2di")])
+(define_mode_attr cntmap [(SI "V4SI") (DI "V2DI")])
 
-(define_expand "popcount2"
-  [(set (match_operand:GPR 0 "register_operand")
-   (popcount:GPR (match_operand:GPR 1 "register_operand")))]
+(define_insn_and_split "popcount2"
+  [(set (match_operand:GPR 0 "register_operand" "=f")
+   (popcount:GPR (match_operand:GPR 1 "register_operand" "f")))]
   "ISA_HAS_LSX"
+  "#"
+  ;; Do the split very late to work around the unneeded zero-
+  ;; initialization from the init-regs pass.  See PR61810 and all the
+  ;; referenced issues.
+  "&& reload_completed"
+  [(set (match_operand: 0 "register_operand" "=f")
+   (popcount:
+ (match_operand: 1 "register_operand" "f")))]
 {
-  rtx in = operands[1];
-  rtx out = operands[0];
-  rtx vreg = mode == SImode ? gen_reg_rtx (V4SImode) :
-   gen_reg_rtx (V2DImode);
-  emit_insn (gen_lsx_vinsgr2vr_ (vreg, in, vreg, GEN_INT (1)));
-  emit_insn (gen_popcount2 (vreg, vreg));
-  emit_insn (gen_lsx_vpickve2gr_ (out, vreg, GEN_INT (0)));
-  DONE;
+  operands[0] = gen_rtx_REG (mode, REGNO (operands[0]));
+  operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
 })
 
 ;;
-- 
2.48.1