Re: [PATCH 0/4] Fortran: Improve flow of intrinsics/library documentation [PR47928]

2025-03-02 Thread Sandra Loosemore

On 2/28/25 02:56, Andre Vehreschild wrote:

Hi Sandra,

thanks for taking on the laborious task. I have browsed over the changes and
found:

Patch 3 in intrinsic.texi:

@@ -2071,6 +2071,9 @@ end program atomic
  @cindex Atomic subroutine, ADD with fetch

  @table @asis
+@item @emph{Synopsis}:
+@code{CALL ATOMIC_FETCH_ADD (ATOM, VALUE, old [, STAT])}
+
`old` should be uppercase here, too, for consistency.

Yes, I know, that is nothing you changed. I just stumbled over it and while we
are at it, let's address it.

Same for:

@@ -3074,6 +3074,9 @@ end program test_btest
  @cindex pointer, C association status

  @table @asis
+@item @emph{Synopsis}:
+@code{RESULT = C_ASSOCIATED(c_ptr_1[, c_ptr_2])}

With uppercasing in the following paragraph needed, too. And I vote for using
CPTR1 and CPTR2 instead.

Same here:
@@ -3177,6 +3177,9 @@ end program main
  @cindex pointer, C address of pointers

  @table @asis
+@item @emph{Synopsis}:
+@code{CALL C_F_PROCPOINTER(cptr, fptr)}

and here:
@@ -3235,6 +3235,9 @@ end program main
  @cindex pointer, C address of procedures

  @table @asis
+@item @emph{Synopsis}:
+@code{RESULT = C_FUNLOC(x)}
+

I'd say: "Ok, I'll stop." here, but that is the list of changes needed to get
the description in intrinsic.texi neat.

In part 4 of your patch, can you rephrase:

@@ -1118,6 +1114,10 @@ program test_allocated
if (.not. allocated(x)) allocate(x(i))
  end program test_allocated
  @end smallexample
+
+@item @emph{Standard}:
+Fortran 90 and later.  Note, the @code{SCALAR=} keyword and allocatable
+scalar entities are available in Fortran 2003 and later.
  @end table

to

+Fortran 90 and later; for @code{SCALAR=} keyword and allocatable
+scalar entities Fortran 2003 and later.

Just for consistency.

With these changes, ok for mainline.

Thank you very much for taking on that laborious task. My deepest respect!


Thanks for the review!  I've pushed the changes now, along with the 
attached additional patch to address those existing minor issues you 
identified.  As I said, there are a lot of remaining markup and 
formatting problems in there as well.  :-(


-Sandra

From 3cfe5832d049c55cacc5f73431a4a14e97b2659f Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Sun, 2 Mar 2025 01:43:26 +
Subject: [PATCH] Fortran: Small fixes in intrinsic.texi.

gcc/fortran/ChangeLog
	* intrinsic.texi: Fix inconsistent capitalization of argument
	names and other minor copy-editing.
---
 gcc/fortran/intrinsic.texi | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 4e6d2faea31..8c160e58b00 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -1116,8 +1116,8 @@ end program test_allocated
 @end smallexample
 
 @item @emph{Standard}:
-Fortran 90 and later.  Note, the @code{SCALAR=} keyword and allocatable
-scalar entities are available in Fortran 2003 and later.
+Fortran 90 and later; for the @code{SCALAR=} keyword and allocatable
+scalar entities, Fortran 2003 and later.
 @end table
 
 
@@ -2072,7 +2072,7 @@ Fortran 2008 and later; with @var{STAT}, TS 18508 or later
 
 @table @asis
 @item @emph{Synopsis}:
-@code{CALL ATOMIC_FETCH_ADD (ATOM, VALUE, old [, STAT])}
+@code{CALL ATOMIC_FETCH_ADD (ATOM, VALUE, OLD [, STAT])}
 
 @item @emph{Description}:
 @code{ATOMIC_FETCH_ADD(ATOM, VALUE, OLD)} atomically stores the value of
@@ -3075,24 +3075,24 @@ for @code{UNSIGNED} (@pxref{Unsigned integers})
 
 @table @asis
 @item @emph{Synopsis}:
-@code{RESULT = C_ASSOCIATED(c_ptr_1[, c_ptr_2])}
+@code{RESULT = C_ASSOCIATED(CPTR1[, CPTR2])}
 
 @item @emph{Description}:
-@code{C_ASSOCIATED(c_ptr_1[, c_ptr_2])} determines the status of the C pointer
-@var{c_ptr_1} or if @var{c_ptr_1} is associated with the target @var{c_ptr_2}.
+@code{C_ASSOCIATED(CPTR1[, CPTR2])} determines the status of the C pointer
+@var{CPTR1} or if @var{CPTR1} is associated with the target @var{CPTR2}.
 
 @item @emph{Class}:
 Inquiry function
 
 @item @emph{Arguments}:
 @multitable @columnfractions .15 .70
-@item @var{c_ptr_1} @tab Scalar of the type @code{C_PTR} or @code{C_FUNPTR}.
-@item @var{c_ptr_2} @tab (Optional) Scalar of the same type as @var{c_ptr_1}.
+@item @var{CPTR1} @tab Scalar of the type @code{C_PTR} or @code{C_FUNPTR}.
+@item @var{CPTR2} @tab (Optional) Scalar of the same type as @var{CPTR1}.
 @end multitable
 
 @item @emph{Return value}:
 The return value is of type @code{LOGICAL}; it is @code{.false.} if either
-@var{c_ptr_1} is a C NULL pointer or if @var{c_ptr1} and @var{c_ptr_2}
+@var{CPTR1} is a C NULL pointer or if @var{CPTR1} and @var{CPTR2}
 point to different addresses.
 
 @item @emph{Example}:
@@ -3178,7 +3178,7 @@ Fortran 2003 and later
 
 @table @asis
 @item @emph{Synopsis}:
-@code{CALL C_F_PROCPOINTER(cptr, fptr)}
+@code{CALL C_F_PROCPOINTER(CPTR, FPTR)}
 
 @item @emph{Description}:
 @code{C_F_PROCPOINTER(CPTR, FPTR)} Assign the target of the C function pointer
@@ -3236,17 +3236,17 @@

[PATCH] Fortran: reject empty derived type with bind(C) attribute [PR101577]

2025-03-02 Thread Harald Anlauf

Dear all,

due to an oversight in the Fortran standard before 2018,
empty derived types with bind(C) attribute were explicitly
(deliberately?) accepted by gfortran, giving a warning that
the companion processor might not provide an interoperating
entity.

In the PR, Tobias pointed to a discussion on the J3 ML that
there was a defect in older standards.  The attached patch
now generates an error when -std=f20xx is specified, and
continues to generate a warning otherwise.
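
The behaviour described above can be sketched as a small decision function. This is only an illustration, not the gfortran code: the enum, the function name and the two boolean parameters are hypothetical, and the treatment of the pedantic case is an assumption read off the `!pedantic` test in the patch.

```c
#include <stdbool.h>

/* Hypothetical outcome labels; these are not GCC identifiers.  */
enum empty_bind_c_diag
{
  DIAG_ERROR,           /* Rejected: the type has no components.  */
  DIAG_WARNING,         /* Accepted with the "may be inaccessible" warning.  */
  DIAG_EXTENSION_ONLY   /* Accepted; assumed to be reported as an extension.  */
};

/* STRICT_STD models -std=f20xx; PEDANTIC models -pedantic.  */
static enum empty_bind_c_diag
classify_empty_bind_c_type (bool strict_std, bool pedantic)
{
  if (strict_std)
    return DIAG_ERROR;
  if (!pedantic)
    return DIAG_WARNING;
  /* Assumption: in pedantic GNU mode the extension check reports on its
     own, so the extra warning is skipped.  */
  return DIAG_EXTENSION_ONLY;
}
```

With the default options this yields DIAG_WARNING, which matches the existing test after `dg-options ""` is added to it.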

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 5c38ce50ed7cca905401f6fa6506b47fd79a7739 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 2 Mar 2025 22:20:28 +0100
Subject: [PATCH] Fortran: reject empty derived type with bind(C) attribute
 [PR101577]

	PR fortran/101577

gcc/fortran/ChangeLog:

	* symbol.cc (verify_bind_c_derived_type): Generate error message
	for derived type with no components in standard conformance mode,
	indicating that this is a GNU extension.

gcc/testsuite/ChangeLog:

	* gfortran.dg/empty_derived_type.f90: Adjust dg-options.
	* gfortran.dg/empty_derived_type_2.f90: New test.
---
 gcc/fortran/symbol.cc | 22 ---
 .../gfortran.dg/empty_derived_type.f90|  1 +
 .../gfortran.dg/empty_derived_type_2.f90  | 11 ++
 3 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/empty_derived_type_2.f90

diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
index c6894810bce..9ddf13b3f0d 100644
--- a/gcc/fortran/symbol.cc
+++ b/gcc/fortran/symbol.cc
@@ -4624,12 +4624,28 @@ verify_bind_c_derived_type (gfc_symbol *derived_sym)
  entity may be defined by means of C and the Fortran entity is said
  to be interoperable with the C entity.  There does not have to be such
  an interoperating C entity."
+
+ However, later discussion on the J3 mailing list
+ (https://mailman.j3-fortran.org/pipermail/j3/2021-July/013190.html)
+ found this to be a defect, and Fortran 2018 added in section 18.3.4
+ the following constraint:
+ "C1805: A derived type with the BIND attribute shall have at least one
+ component."
+
+ We thus allow empty derived types only as GNU extension while giving a
+ warning by default, or reject empty types in standard conformance mode.
   */
   if (curr_comp == NULL)
 {
-  gfc_warning (0, "Derived type %qs with BIND(C) attribute at %L is empty, "
-		   "and may be inaccessible by the C companion processor",
-		   derived_sym->name, &(derived_sym->declared_at));
+  if (!gfc_notify_std (GFC_STD_GNU, "Derived type %qs with BIND(C) "
+			   "attribute at %L has no components",
+			   derived_sym->name, &(derived_sym->declared_at)))
+	return false;
+  else if (!pedantic)
+	gfc_warning (0, "Derived type %qs with BIND(C) attribute at %L "
+		 "is empty, and may be inaccessible by the C "
+		 "companion processor",
+		 derived_sym->name, &(derived_sym->declared_at));
   derived_sym->ts.is_c_interop = 1;
   derived_sym->attr.is_bind_c = 1;
   return true;
diff --git a/gcc/testsuite/gfortran.dg/empty_derived_type.f90 b/gcc/testsuite/gfortran.dg/empty_derived_type.f90
index 6bf616c2c6a..496262de2cd 100644
--- a/gcc/testsuite/gfortran.dg/empty_derived_type.f90
+++ b/gcc/testsuite/gfortran.dg/empty_derived_type.f90
@@ -1,4 +1,5 @@
 ! { dg-do compile }
+! { dg-options "" }
 module stuff
implicit none
type, bind(C) :: junk ! { dg-warning "may be inaccessible by the C companion" }
diff --git a/gcc/testsuite/gfortran.dg/empty_derived_type_2.f90 b/gcc/testsuite/gfortran.dg/empty_derived_type_2.f90
new file mode 100644
index 000..1ef56da4c25
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/empty_derived_type_2.f90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! { dg-additional-options "-std=f2018" }
+!
+! PR fortran/101577
+!
+! Contributed by Tobias Burnus
+
+type, bind(C) :: t ! { dg-error "has no components" }
+   ! Empty!
+end type t
+end
-- 
2.43.0



Re: [PATCH] rtl: Remove invalid compare simplification [PR117186]

2025-03-02 Thread Levi Zim

On 2025-01-13 17:48, Tobias Burnus wrote:

Andreas Schwab wrote:

This breaks m68k:

Same issue on GCN, hence I filed https://gcc.gnu.org/PR118418

This also breaks gcc bootstrap on riscv64 under some specific configuration:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119012#c12

Although it doesn't seem to be the same issue here.



If I look at the debugging output (see the PR), it seems as if
the self-test function test_comparisons contains the assumption:
 FALSE < TRUE
but if TRUE is -1, that assumption does not hold
(for signed variables).

And both GCN and m68k '#define STORE_FLAG_VALUE -1',
as Andreas noted.

Tobias
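
The ordering problem is easy to reproduce outside the compiler. The following is a minimal sketch in plain C, not the self-test itself; the function names are hypothetical and `true_value` merely plays the role of STORE_FLAG_VALUE.

```c
#include <stdbool.h>

/* Does FALSE < TRUE hold when "false" is 0 and "true" is TRUE_VALUE,
   with both compared as signed integers?  */
static bool
false_less_than_true_signed (int true_value)
{
  const int false_value = 0;
  return false_value < true_value;
}

/* The same question with both values compared as unsigned integers.  */
static bool
false_less_than_true_unsigned (int true_value)
{
  const unsigned int false_value = 0u;
  return false_value < (unsigned int) true_value;
}
```

For a true value of 1 both functions return true. For a true value of -1 the signed comparison returns false, while the unsigned one still returns true because -1 converts to the largest unsigned value.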


[PATCH] OpenMP: Integrate dynamic selectors with dispatch argument handling [PR118457]

2025-03-02 Thread Sandra Loosemore
Support for dynamic selectors in "declare variant" was developed in
parallel with support for the adjust_args/append_args clauses and the
dispatch construct; they collided in a bad way.  This patch fixes the
"sorry" for calls that need both by removing the adjust_args/append_args
code from gimplify_call_expr and invoking it from the new variant
substitution code instead.  It's handled as a tree -> tree transformation
rather than tree -> gimple because eventually this code may end up being
invoked from the front ends instead of the gimplifier (see PR115076).

gcc/ChangeLog
PR middle-end/118457
* gimplify.cc (modify_call_for_omp_dispatch): New, containing
code split from gimplify_call_expr and modified to emit tree
instead of gimple.  Remove the error for falling through to a call
to the base function.
(expand_variant_call_expr): New, split from gimplify_variant_call_expr.
Call modify_call_for_omp_dispatch on calls to
variants in a dispatch construct context.
(gimplify_variant_call_expr): Make it call expand_variant_call_expr
to do the actual work.
(gimplify_call_expr): Remove sorry for calls involving both
dynamic/late selectors and adjust_args/append_args, and adjust
for new interface.  Move adjust_args/append_args code to
modify_call_for_omp_dispatch.
(gimplify_omp_dispatch): Add some comments.

gcc/testsuite/ChangeLog
PR middle-end/118457
* c-c++-common/gomp/adjust-args-6.c: Remove xfails and adjust
expected output.
* c-c++-common/gomp/append-args-5.c: Adjust expected output.
* c-c++-common/gomp/append-args-dynamic.c: New.
* c-c++-common/gomp/dispatch-11.c: Adjust expected output.
* gfortran.dg/gomp/dispatch-11.f90: Likewise.
---
 gcc/gimplify.cc   | 815 +-
 .../c-c++-common/gomp/adjust-args-6.c |  13 +-
 .../c-c++-common/gomp/append-args-5.c |  19 +-
 .../c-c++-common/gomp/append-args-dynamic.c   |  80 ++
 gcc/testsuite/c-c++-common/gomp/dispatch-11.c |  22 +-
 .../gfortran.dg/gomp/dispatch-11.f90  |   5 -
 6 files changed, 487 insertions(+), 467 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/append-args-dynamic.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 160e7fc9df6..5852c618b05 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3872,29 +3872,331 @@ find_supercontext (void)
   return NULL_TREE;
 }
 
+/* OpenMP: Handle the append_args and adjust_args clauses of
+   declare_variant for EXPR, which is a CALL_EXPR whose CALL_EXPR_FN
+   is the variant, within a dispatch construct with clauses DISPATCH_CLAUSES
+   and location DISPATCH_LOC.
+
+   'append_args' causes interop objects are added after the last regular
+   (nonhidden, nonvariadic) arguments of the variant function.
+   'adjust_args' with need_device_{addr,ptr} converts the pointer target of
+   a pointer from a host to a device address. This uses either the default
+   device or the passed device number, which then sets the default device
+   address.  */
+static tree
+modify_call_for_omp_dispatch (tree expr, tree dispatch_clauses,
+ location_t dispatch_loc)
+{
+  tree fndecl = get_callee_fndecl (expr);
+
+  /* Skip processing if we don't get the expected call form.  */
+  if (!fndecl)
+return expr;
+
+  int nargs = call_expr_nargs (expr);
+  tree dispatch_device_num = NULL_TREE;
+  tree dispatch_device_num_init = NULL_TREE;
+  tree dispatch_interop = NULL_TREE;
+  tree dispatch_append_args = NULL_TREE;
+  int nfirst_args = 0;
+  tree dispatch_adjust_args_list
+= lookup_attribute ("omp declare variant variant args",
+   DECL_ATTRIBUTES (fndecl));
+
+  if (dispatch_adjust_args_list)
+{
+  dispatch_adjust_args_list = TREE_VALUE (dispatch_adjust_args_list);
+  dispatch_append_args = TREE_CHAIN (dispatch_adjust_args_list);
+  if (TREE_PURPOSE (dispatch_adjust_args_list) == NULL_TREE
+ && TREE_VALUE (dispatch_adjust_args_list) == NULL_TREE)
+   dispatch_adjust_args_list = NULL_TREE;
+}
+  if (dispatch_append_args)
+{
+  nfirst_args = tree_to_shwi (TREE_PURPOSE (dispatch_append_args));
+  dispatch_append_args = TREE_VALUE (dispatch_append_args);
+}
+  dispatch_device_num = omp_find_clause (dispatch_clauses, OMP_CLAUSE_DEVICE);
+  if (dispatch_device_num)
+dispatch_device_num = OMP_CLAUSE_DEVICE_ID (dispatch_device_num);
+  dispatch_interop = omp_find_clause (dispatch_clauses, OMP_CLAUSE_INTEROP);
+  int nappend = 0, ninterop = 0;
+  for (tree t = dispatch_append_args; t; t = TREE_CHAIN (t))
+nappend++;
+
+  /* FIXME: error checking should be taken out of this function and
+ handled before any attempt at filtering or resolution happens.
+ Otherwise whether or not diagnostics appear is determined by
+ GCC internals, how good the front ends are at constant-fo

[PATCH 01/17] LoongArch: (NFC) Remove atomic_optab and use amop instead

2025-03-02 Thread Xi Ruoyao
They are the same.

gcc/ChangeLog:

* config/loongarch/sync.md (atomic_optab): Remove.
(atomic_): Change atomic_optab to amop.
(atomic_fetch_): Likewise.
---
 gcc/config/loongarch/sync.md | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index fd8d732dd67..75b134cd853 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -35,8 +35,6 @@ (define_c_enum "unspec" [
 ])
 
 (define_code_iterator any_atomic [plus ior xor and])
-(define_code_attr atomic_optab
-  [(plus "add") (ior "or") (xor "xor") (and "and")])
 
 ;; This attribute gives the format suffix for atomic memory operations.
 (define_mode_attr amo [(QI "b") (HI "h") (SI "w") (DI "d")])
@@ -175,7 +173,7 @@ (define_insn "atomic_store"
 }
   [(set (attr "length") (const_int 12))])
 
-(define_insn "atomic_"
+(define_insn "atomic_"
   [(set (match_operand:GPR 0 "memory_operand" "+ZB")
(unspec_volatile:GPR
  [(any_atomic:GPR (match_dup 0)
@@ -197,7 +195,7 @@ (define_insn "atomic_add"
   "amadd%A2.\t$zero,%z1,%0"
   [(set (attr "length") (const_int 4))])
 
-(define_insn "atomic_fetch_"
+(define_insn "atomic_fetch_"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
(match_operand:GPR 1 "memory_operand" "+ZB"))
(set (match_dup 1)
-- 
2.48.1



[to-be-committed][RISC-V][PR target/118934] Fix ICE in RISC-V long branch support

2025-03-02 Thread Jeff Law
I'm not sure if I goof'd this or if I merely upstreamed someone else's 
goof.  Either way the long branch code isn't working correctly.


We were using 'n' as the output modifier to negate the condition.  But 
'n' has a special meaning elsewhere, so when presented with a condition 
rather than what was expected, boom, the compiler ICE'd.


Thankfully there are only a few places where we were using %n, which I
turned into %r.
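
For readers unfamiliar with the long-branch sequence, here is a rough sketch of what the inverted-condition modifier produces. It is standalone C, not the riscv.cc code: both function names are hypothetical, and only the six integer branch conditions are covered.

```c
#include <stdio.h>
#include <string.h>

/* Map an integer branch condition mnemonic to its logical inverse.
   Returns NULL for a name that is not in the table.  */
static const char *
inverse_condition (const char *cond)
{
  static const char *const pairs[][2] = {
    {"eq", "ne"}, {"ne", "eq"},
    {"lt", "ge"}, {"ge", "lt"},
    {"ltu", "geu"}, {"geu", "ltu"},
  };
  for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
    if (strcmp (cond, pairs[i][0]) == 0)
      return pairs[i][1];
  return NULL;
}

/* Format the out-of-range form: branch around an unconditional jump
   using the inverted condition.  Returns the snprintf result, or -1
   if COND is unknown.  */
static int
format_long_branch (char *buf, size_t size, const char *cond,
                    const char *rs1, const char *rs2, const char *target)
{
  const char *inv = inverse_condition (cond);
  if (!inv)
    return -1;
  return snprintf (buf, size, "b%s\t%s,%s,1f; jump\t%s,ra; 1:",
                   inv, rs1, rs2, target);
}
```

A "blt" whose target is out of range thus becomes a "bge" over a "jump", which is the shape of the templates touched by the patch.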


The BZ entry includes a good testcase, it just takes a long time to 
compile as it's trying to create the out-of-range scenario.  I'm not 
including the testcase due to how long it takes, but I did test it 
locally to ensure it's working properly now.


I'm sure that with a little bit of work I could create a testcase that
worked before and fails with the trunk (by taking advantage of the
fuzziness in length computations).  So I'm going to consider this a
regression.


Will push to the trunk after pre-commit testing does its thing.


Jeff

PR target/118934
gcc/
* config/riscv/corev.md (cv_branch): Adjust output template.
(branch): Likewise.
* config/riscv/riscv.md (branch): Likewise.
* config/riscv/riscv.cc (riscv_asm_output_opcode): Handle 'r' rather
than 'n'.

diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md
index e44fdc1129d..d1c3aaa973e 100644
--- a/gcc/config/riscv/corev.md
+++ b/gcc/config/riscv/corev.md
@@ -2627,7 +2627,7 @@ (define_insn "*cv_branch"
   "TARGET_XCVBI"
 {
   if (get_attr_length (insn) == 12)
-return "cv.b%n1\t%2,%z3,1f; jump\t%l0,ra; 1:";
+return "cv.b%r1\t%2,%z3,1f; jump\t%l0,ra; 1:";
 
   return "cv.b%C1imm\t%2,%3,%0";
 }
@@ -2645,7 +2645,7 @@ (define_insn "*branch"
   "TARGET_XCVBI"
 {
   if (get_attr_length (insn) == 12)
-return "b%n1\t%2,%z3,1f; jump\t%l0,ra; 1:";
+return "b%r1\t%2,%z3,1f; jump\t%l0,ra; 1:";
 
   return "b%C1\t%2,%z3,%l0";
 }
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 89aa25d5da9..38f3ae7cd84 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6868,7 +6868,7 @@ riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
  any outermost HIGH.
'R' Print the low-part relocation associated with OP.
'C' Print the integer branch condition for comparison OP.
-   'n' Print the inverse of the integer branch condition for comparison OP.
+   'r' Print the inverse of the integer branch condition for comparison OP.
'A' Print the atomic operation suffix for memory model OP.
'I' Print the LR suffix for memory model OP.
'J' Print the SC suffix for memory model OP.
@@ -7027,7 +7027,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   fputs (GET_RTX_NAME (code), file);
   break;
 
-case 'n':
+case 'r':
   /* The RTL names match the instruction names. */
   fputs (GET_RTX_NAME (reverse_condition (code)), file);
   break;
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index f7070766783..95951605fb4 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3252,7 +3252,7 @@ (define_insn "*branch"
   "!TARGET_XCVBI"
 {
   if (get_attr_length (insn) == 12)
-return "b%n1\t%2,%z3,1f; jump\t%l0,ra; 1:";
+return "b%r1\t%2,%z3,1f; jump\t%l0,ra; 1:";
 
   return "b%C1\t%2,%z3,%l0";
 }


New Swedish PO file for 'gcc' (version 15-b20250216)

2025-03-02 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-15-b20250216.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[patch,avr] texi: Add new subsubsection "AVR Optimization Options"

2025-03-02 Thread Georg-Johann Lay

This patch adds a new section "AVR Optimization Options"
in the texi documentation.

Ok for trunk?

Johann

--

AVR: Add texi @subsubsection "AVR Optimization Options".

gcc/
* doc/invoke.texi (AVR Optimization Options): New @subsubsection
for pure optimization options.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0c7adc039b5..eaf1727f88c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -24349,33 +24349,6 @@ instructions.  This option has only an effect on reduced Tiny devices like
 ATtiny40.  See also the @code{absdata}
 @ref{AVR Variable Attributes,variable attribute}.
 
-@opindex maccumulate-args
-@item -maccumulate-args
-Accumulate outgoing function arguments and acquire/release the needed
-stack space for outgoing function arguments once in function
-prologue/epilogue.  Without this option, outgoing arguments are pushed
-before calling a function and popped afterwards.
-
-Popping the arguments after the function call can be expensive on
-AVR so that accumulating the stack space might lead to smaller
-executables because arguments need not be removed from the
-stack after such a function call.
-
-This option can lead to reduced code size for functions that perform
-several calls to functions that get their arguments on the stack like
-calls to printf-like functions.
-
-@opindex mbranch-cost
-@item -mbranch-cost=@var{cost}
-Set the branch costs for conditional branch instructions to
-@var{cost}.  Reasonable values for @var{cost} are small, non-negative
-integers. The default branch cost is 0.
-
-@opindex mcall-prologues
-@item -mcall-prologues
-Functions prologues/epilogues are expanded as calls to appropriate
-subroutines.  Code size is smaller.
-
 @opindex mcvt
 @item -mcvt
 Use a @emph{compact vector table}.  Some devices support a CVT
@@ -24393,27 +24366,6 @@ For example, you can link with @code{-Wl,--defsym,__init_cvt=0}.
 The CVT startup code is available since
 @w{@uref{https://github.com/avrdudes/avr-libc/issues/1010,AVR-LibC v2.3}}.
 
-@opindex mfuse-add
-@item -mfuse-add
-@itemx -mno-fuse-add
-@itemx -mfuse-add=@var{level}
-Optimize indirect memory accesses on reduced Tiny devices.
-The default uses @code{@var{level}=1} for optimizations @option{-Og}
-and @option{-O1}, and @code{@var{level}=2} for higher optimizations.
-Valid values for @var{level} are @code{0}, @code{1} and @code{2}.
-
-@opindex mfuse-move
-@item -mfuse-move
-@itemx -mno-fuse-move
-@itemx -mfuse-move=@var{level}
-Run a post reload optimization pass that tries to fuse move instructions
-and to split multi-byte instructions into 8-bit operations.
-The default uses @code{@var{level}=3} for optimization @option{-O1},
-and @code{@var{level}=23} for higher optimizations.
-Valid values for @var{level} are in the range @code{0} @dots{} @code{23}
-which is a 3:2:2:2 mixed radix value.  Each digit controls some
-aspect of the optimization.
-
 @opindex mdouble
 @opindex mlong-double
 @item -mdouble=@var{bits}
@@ -24502,39 +24454,6 @@ support (@w{@uref{https://sourceware.org/PR31124,PR31124}}) is available.
 In that case, @option{-mrodata-in-ram} can be used to return to the old
 layout with @code{.rodata} in RAM.
 
-@opindex mstrict-X
-@item -mstrict-X
-Use address register @code{X} in a way proposed by the hardware.  This means
-that @code{X} is only used in indirect, post-increment or
-pre-decrement addressing.
-
-Without this option, the @code{X} register may be used in the same way
-as @code{Y} or @code{Z} which then is emulated by additional
-instructions.
-For example, loading a value with @code{X+const} addressing with a
-small non-negative @code{const < 64} to a register @var{Rn} is
-performed as
-
-@example
-adiw r26, const   ; X += const
-ld   @var{Rn}, X; @var{Rn} = *X
-sbiw r26, const   ; X -= const
-@end example
-
-@opindex msplit-bit-shift
-@item -msplit-bit-shift
-Split multi-byte shifts with a constant offset into a shift with
-a byte offset and a residual shift with a non-byte offset.
-This optimization is turned on per default for @option{-O2} and higher,
-including @option{-Os} but excluding @option{-Oz}.
-Splitting of shifts with a constant offset that is
-a multiple of 8 is controlled by @option{-mfuse-move}.
-
-@opindex msplit-ldst
-@item -msplit-ldst
-Split multi-byte loads and stores into several byte loads and stores.
-This optimization is turned on per default for @option{-O2} and higher.
-
 @opindex mtiny-stack
 @item -mtiny-stack
 Only change the lower 8@tie{}bits of the stack pointer.
@@ -24586,6 +24505,98 @@ Warn if the ISR is misspelled, i.e.@: without __vector prefix.
 Enabled by default.
 @end table
 
+
+@subsubsection AVR Optimization Options
+The following options are pure optimization options.
+Options @option{-mgas-isr-prologues}, @option{-mmain-is-OS_task},
+@option{-mno-call-main} and @option{-mrelax} from above are only
+@emph{almost} optimization options, since there are rare occasions
+where their different code generation matters.
+
+@table 

[PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]

2025-03-02 Thread Xi Ruoyao
They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression.  The incorrect reorder has been observed in openh264 LTO
build.

Expand them to a (mem) expression instead of unspec to fix the issue.
Then we need to make loongarch_address_insns return 1 for
ADDRESS_REG_REG because the constraint "R" expects this behavior;
otherwise the vldx instruction will be considered invalid by the register
allocation pass and turned into add.d + vld.  Apply the ADDRESS_REG_REG
penalty in loongarch_address_cost instead; loongarch_rtx_costs should
then also call loongarch_address_cost instead of
loongarch_address_insns.
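
The split between the instruction count and the cost can be pictured with a small standalone sketch. All names here are hypothetical simplifications, not the loongarch.cc functions: a shared helper takes the reg+reg multiplier as a parameter, the instruction-count query always passes 1, and the cost query passes the tunable penalty.

```c
/* Hypothetical, simplified address classification.  */
enum addr_kind { ADDR_REG, ADDR_REG_REG, ADDR_INVALID };

/* Shared helper: the reg+reg multiplier is supplied by the caller.  */
static int
address_insns_1 (enum addr_kind kind, int factor, int reg_reg_cost)
{
  switch (kind)
    {
    case ADDR_REG:
      return factor;
    case ADDR_REG_REG:
      return factor * reg_reg_cost;
    default:
      return 0;   /* Not a valid address.  */
    }
}

/* Instruction-count query: reg+reg counts as a single instruction.  */
static int
address_insns (enum addr_kind kind, int factor)
{
  return address_insns_1 (kind, factor, 1);
}

/* Cost query: the tunable reg+reg penalty is applied only here.  */
static int
address_cost (enum addr_kind kind, int factor, int tuned_reg_reg_cost)
{
  return address_insns_1 (kind, factor, tuned_reg_reg_cost);
}
```

Keeping the penalty out of the count means a consumer that expects exactly one instruction for a reg+reg address still sees 1, while cost-driven decisions can continue to discourage that form.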

Closes: https://github.com/cisco/openh264/issues/3857

gcc/ChangeLog:

PR target/119084
* config/loongarch/lasx.md (UNSPEC_LASX_XVLDX): Remove.
(lasx_xvldx): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_VLDX): Remove.
(lsx_vldx): Remove.
* config/loongarch/simd.md (QIVEC): New define_mode_iterator.
(_vldx): New define_expand.
* config/loongarch/loongarch.cc (loongarch_address_insns_1): New
static function with most logic factored out from ...
(loongarch_address_insns): ... here.  Call
loongarch_address_insns_1 with reg_reg_cost = 1.
(loongarch_address_cost): Call loongarch_address_insns_1 with
reg_reg_cost = la_addr_reg_reg_cost.

gcc/testsuite/ChangeLog:

PR target/119084
* gcc.target/loongarch/pr119084.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk and
gcc-14 branch?

 gcc/config/loongarch/lasx.md  | 13 -
 gcc/config/loongarch/loongarch.cc | 48 +++
 gcc/config/loongarch/lsx.md   | 13 -
 gcc/config/loongarch/simd.md  |  9 
 gcc/testsuite/gcc.target/loongarch/pr119084.c | 24 ++
 5 files changed, 61 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr119084.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e4505c1660d..43e3ab0026a 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -119,7 +119,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVSSRLRN
   UNSPEC_LASX_XVEXTL_QU_DU
   UNSPEC_LASX_XVLDI
-  UNSPEC_LASX_XVLDX
   UNSPEC_LASX_XVSTX
   UNSPEC_LASX_VECINIT_MERGE
   UNSPEC_LASX_VEC_SET_INTERNAL
@@ -3579,18 +3578,6 @@ (define_insn "lasx_xvldi"
   [(set_attr "type" "simd_load")
(set_attr "mode" "V4DI")])
 
-(define_insn "lasx_xvldx"
-  [(set (match_operand:V32QI 0 "register_operand" "=f")
-   (unspec:V32QI [(match_operand:DI 1 "register_operand" "r")
-  (match_operand:DI 2 "reg_or_0_operand" "rJ")]
- UNSPEC_LASX_XVLDX))]
-  "ISA_HAS_LASX"
-{
-  return "xvldx\t%u0,%1,%z2";
-}
-  [(set_attr "type" "simd_load")
-   (set_attr "mode" "V32QI")])
-
 (define_insn "lasx_xvstx"
   [(set (mem:V32QI (plus:DI (match_operand:DI 1 "register_operand" "r")
(match_operand:DI 2 "reg_or_0_operand" "rJ")))
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index eb3baac7019..3779e283f8d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2363,14 +2363,9 @@ loongarch_index_address_p (rtx addr, machine_mode mode ATTRIBUTE_UNUSED)
   return true;
 }
 
-/* Return the number of instructions needed to load or store a value
-   of mode MODE at address X.  Return 0 if X isn't valid for MODE.
-   Assume that multiword moves may need to be split into word moves
-   if MIGHT_SPLIT_P, otherwise assume that a single load or store is
-   enough.  */
-
-int
-loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
+static int
+loongarch_address_insns_1 (rtx x, machine_mode mode, bool might_split_p,
+  int reg_reg_cost)
 {
   struct loongarch_address_info addr;
   int factor;
@@ -2405,7 +2400,7 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
return factor;
 
   case ADDRESS_REG_REG:
-   return factor * la_addr_reg_reg_cost;
+   return factor * reg_reg_cost;
 
   case ADDRESS_CONST_INT:
return lsx_p ? 0 : factor;
@@ -2420,6 +2415,18 @@ loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
   return 0;
 }
 
+/* Return the number of instructions needed to load or store a value
+   of mode MODE at address X.  Return 0 if X isn't valid for MODE.
+   Assume that multiword moves may need to be split into word moves
+   if MIGHT_SPLIT_P, otherwise assume that a single load or store is
+   enough.  */
+
+int
+loongarch_address_insns (rtx x, machine_mode mode, bool might_split_p)
+{
+  return loongarch_address_insns_1 (x, mode, might_split_p, 1);
+}
+
 /* Return true if X fits within an unsigned field of BITS bits that is
shifted left SHIFT bits before being used.  */
 

[to-be-committed][RISC-V][PR target/116256] Fix minor code quality regression in reassociated arithmetic

2025-03-02 Thread Jeff Law


The patch for target/116256 significantly simplified the condition and, 
I guess not too surprisingly, exposed a minor code quality regression.


Specifically the split part of the define_insn_and_split only splits 
after reload (because we use a match_scratch).  So there's nothing to 
combine the load-immediate with the subsequent add into an addi when the 
immediate fits into a simm12 field.


This patch adjusts the split code to handle that scenario directly and 
generate the more efficient code.  We can squeeze out the slli in this 
test with a bit more work, but that's out of scope right now since that 
isn't a regression.
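
As a rough illustration of the check involved, the sketch below tests whether a constant fits a signed 12-bit immediate and counts the instructions needed for the addition step. The function names are hypothetical; the two-instruction case assumes the constant can be loaded by a single instruction, as the pattern's condition requires.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if VALUE fits a signed 12-bit immediate (simm12).  */
static bool
fits_simm12 (int64_t value)
{
  return value >= -2048 && value <= 2047;
}

/* Instructions needed for the "add constant" step: one addi when the
   constant fits in simm12, otherwise a constant load followed by add.  */
static int
add_constant_insns (int64_t value)
{
  return fits_simm12 (value) ? 1 : 2;
}
```

A small constant such as 128 takes the single-addi path, whereas 4096 still needs a load followed by an add.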


Tested in my tester.  Waiting on pre-commit testing to render a verdict.

jeff

PR target/116256
gcc/
* config/riscv/riscv.md (reassociating constant addition): Adjust
split code to generate addi directly when possible.

gcc/testsuite

* gcc.target/riscv/pr116256-1.c: New test.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 95951605fb4..84bce409bc7 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4684,10 +4684,22 @@ (define_insn_and_split ""
   "(TARGET_64BIT && riscv_const_insns (operands[3], false) == 1)"
   "#"
   "&& reload_completed"
-  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
-   (set (match_dup 4) (match_dup 3))
-   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 4)))]
-  ""
+  [(const_int 0)]
+  "{
+ rtx x = gen_rtx_ASHIFT (DImode, operands[1], operands[2]);
+ emit_insn (gen_rtx_SET (operands[0], x));
+
+ /* If the constant fits in a simm12, use it directly as we do not
+   get another good chance to optimize things again.  */
+ if (!SMALL_OPERAND (INTVAL (operands[3])))
+   emit_move_insn (operands[4], operands[3]);
+ else
+   operands[4] = operands[3];
+
+ x = gen_rtx_PLUS (DImode, operands[0], operands[4]);
+ emit_insn (gen_rtx_SET (operands[0], x));
+ DONE;
+   }"
   [(set_attr "type" "arith")])
 
 (define_insn_and_split ""
@@ -4700,13 +4712,26 @@ (define_insn_and_split ""
   "(TARGET_64BIT && riscv_const_insns (operands[3], false) == 1)"
   "#"
   "&& reload_completed"
-  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 2)))
-   (set (match_dup 4) (match_dup 3))
-   (set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 5) (match_dup 6))))]
+  [(const_int 0)]
   "{
  operands[1] = gen_lowpart (DImode, operands[1]);
  operands[5] = gen_lowpart (SImode, operands[0]);
  operands[6] = gen_lowpart (SImode, operands[4]);
+
+ rtx x = gen_rtx_ASHIFT (DImode, operands[1], operands[2]);
+ emit_insn (gen_rtx_SET (operands[0], x));
+
+ /* If the constant fits in a simm12, use it directly as we do not
+   get another good chance to optimize things again.  */
+ if (!SMALL_OPERAND (INTVAL (operands[3])))
+   emit_move_insn (operands[4], operands[3]);
+ else
+   operands[6] = operands[3];
+
+ x = gen_rtx_PLUS (SImode, operands[5], operands[6]);
+ x = gen_rtx_SIGN_EXTEND (DImode, x);
+ emit_insn (gen_rtx_SET (operands[0], x));
+ DONE;
}"
   [(set_attr "type" "arith")])
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr116256-1.c b/gcc/testsuite/gcc.target/riscv/pr116256-1.c
new file mode 100644
index 000..9543716cd68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr116256-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcb -mabi=lp64d" { target { rv64 } } } */
+
+
+bool f1(long a)
+{
+long b = a << 4;
+return b == -128;
+}
+
+/* We want to verify that we have generated addi
+   rather than li+add.  */
+/* { dg-final { scan-assembler-not "add\t" } } */
+/* { dg-final { scan-assembler "addi\t" } } */
+