from:"Xi Ruoyao"

[PATCH] LoongArch: Replace UNSPEC_FCOPYSIGN with copysign RTL

2023-10-02 Thread Xi Ruoyao

When I added copysign support for LoongArch (r13-3702), we did not have
a copysign RTL insn, so I had to use UNSPEC to represent the copysign
instruction. Now the copysign RTX code has been added in r14-1586, so
this patch removes those UNSPECs, and it uses the native RTL copysign
insn.

Inspired by rs6000 patch "Cleanup: Replace UNSPEC_COPYSIGN with copysign
RTL" [1] from Michael Meissner.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631701.html

gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_FCOPYSIGN): Delete.
(copysign<mode>3): Use copysign RTL instead of UNSPEC.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
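
For readers less familiar with the operation: copysign (x, y) yields the
magnitude of x with the sign of y, and that is the semantics the copysign
RTX code carries.  A minimal sketch in plain C++, assuming IEEE 754
binary64 doubles; the helper name copysign_model is made up for
illustration and is not part of the patch or of GCC:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not in GCC): bit-level model of copysign for
// double.  The result keeps the exponent and mantissa of x and takes
// the sign bit of y.  Assumes a 64-bit IEEE 754 double.
static double copysign_model (double x, double y)
{
  std::uint64_t xb, yb;
  std::memcpy (&xb, &x, sizeof xb);
  std::memcpy (&yb, &y, sizeof yb);

  const std::uint64_t sign = std::uint64_t (1) << 63;
  const std::uint64_t rb = (xb & ~sign) | (yb & sign);

  double r;
  std::memcpy (&r, &rb, sizeof r);
  return r;
}
```

The model should agree with std::copysign from <cmath>, including for
signed zeros.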

 gcc/config/loongarch/loongarch.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 2b09209945b..9916c741641 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -37,7 +37,6 @@ (define_c_enum "unspec" [
   UNSPEC_FCLASS
   UNSPEC_FMAX
   UNSPEC_FMIN
-  UNSPEC_FCOPYSIGN
   UNSPEC_FTINT
   UNSPEC_FTINTRM
   UNSPEC_FTINTRP
@@ -1130,9 +1129,8 @@ (define_insn "abs<mode>2"
 
 (define_insn "copysign<mode>3"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
-        (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
-                      (match_operand:ANYF 2 "register_operand" "f")]
-                     UNSPEC_FCOPYSIGN))]
+        (copysign:ANYF (match_operand:ANYF 1 "register_operand" "f")
+                       (match_operand:ANYF 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT"
   "fcopysign.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fcopysign")
-- 
2.42.0

Re: [PATCH] Support g++ 4.8 as a host compiler.

2023-10-04 Thread Xi Ruoyao

On Wed, 2023-10-04 at 23:19 +0100, Roger Sayle wrote:
> 
> The recent patch to remove poly_int_pod triggers a bug in g++ 4.8.5's
> C++ 11 support which mistakenly believes poly_uint16 has a non-trivial
> constructor.  This in turn prohibits it from being used as a member in
> a union (rtxunion) that is constructed statically, resulting in a (fatal)
> error during stage 1.  A workaround is to add an explicit constructor
> to the problematic union, which allows mainline to be bootstrapped with
> the system compiler on older RedHat 7 systems.
> 
> This patch has been tested on x86_64-pc-linux-gnu where it allows a
> bootstrap to complete when using g++ 4.8.5 as the host compiler.
> Ok for mainline?
> 
> 
> 2023-10-04  Roger Sayle  
> 
> gcc/ChangeLog
>   * rtl.h (rtx_def::u): Add explicit constructor to workaround
>   issue using g++ 4.8 as a host compiler.

AFAIK G++ 5.1 also has a bug (https://gcc.gnu.org/PR65801) that breaks
building recent GCC.  I don't think it's really "maintainable" to ensure
that current GCC can be built with a buggy host compiler.
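
For context, the workaround quoted above relies on a standard C++11 rule
rather than on anything GCC-specific: a union with a member whose default
constructor is non-trivial gets a deleted default constructor unless the
union provides one itself.  A minimal sketch with made-up names (Coeff
and Payload stand in for poly_uint16 and the rtx union); it shows the
rule only and does not reproduce the g++ 4.8.5 bug:

```cpp
#include <cassert>

// Hypothetical stand-in for a type whose default constructor is
// user-provided and therefore non-trivial.
struct Coeff
{
  Coeff () : value (0) {}
  unsigned short value;
};

// Without the explicit constructor below, Payload's implicit default
// constructor would be deleted in C++11 because of the Coeff member,
// and the static object could not be default-constructed.
union Payload
{
  int i;
  Coeff c;
  Payload () : i (0) {}
};

// A statically constructed instance, as in the case described above.
static Payload global_payload;
```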

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Reimplement multilib build option handling.

2023-10-07 Thread Xi Ruoyao

On Sat, 2023-10-07 at 11:41 +0800, Yang Yujie wrote:
> Thanks for the testing!
> 
> This error seems to be difficult to reproduce since it is a makefile dependency
> problem.  I think appending loongarch-multilib.h to $(GTM_H) instead of $(TM_H)
> could help.

FWIW such issues are easier to reproduce with a high -j number.  I can
easily reproduce it with -j32 on a 3C5000-based server.

> > And when this is fixed, it might be a nice idea to have a
> > --with-multilib-list config in ./contrib/config-list.mk .
> 
> Thanks, will add this later too.
> 
> P.S. Currently support for "f32" is not active, and it should probably be
> avoided if you want to build a working rootfs.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

2023-10-17 Thread Xi Ruoyao

During the review of an LLVM change [1], on LA464 we found that zeroing
an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing a fcc.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 68897799505..743e75907a6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2151,7 +2151,7 @@ (define_insn "movfcc"
   [(set (match_operand:FCC 0 "register_operand" "=z")
(const_int 0))]
   ""
-  "movgr2cf\t%0,$r0")
+  "fcmp.caf.s\t%0,$f0,$f0")
 
 ;; Conditional move instructions.
 
-- 
2.42.0

Pushed: [PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

2023-10-18 Thread Xi Ruoyao

On Wed, 2023-10-18 at 09:34 +0800, chenglulu wrote:
> 
> On 2023/10/17 at 10:24 PM, WANG Xuerui wrote:
> > 
> > On 10/17/23 22:06, Xi Ruoyao wrote:
> > > During the review of a LLVM change [1], on LA464 we found that zeroing
> > "an" LLVM change (because the word LLVM is pronounced letter-by-letter)
> > > a fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.
> > Similarly, "an" fcc
> > > 
> > > [1]: https://github.com/llvm/llvm-project/pull/69300
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
> > > zeroing a fcc.
> > > ---
> > > 
> > > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> Ok!

Pushed r14-4712 with the commit message modified following Xuerui's
suggestion.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH 1/5] LoongArch: Add enum-style -mexplicit-relocs= option

2023-10-19 Thread Xi Ruoyao

To strike a better balance between scheduling and relaxation when -flto is
enabled, add a three-way -mexplicit-relocs={auto,none,always} option.
The old -mexplicit-relocs and -mno-explicit-relocs options are still
supported; they are mapped to -mexplicit-relocs=always and
-mexplicit-relocs=none respectively.

The default choice is determined by probing assembler capabilities at
build time.  If the assembler does not support explicit relocs at all,
the default will be none; if it supports explicit relocs but not
relaxation, the default will be always; if both explicit relocs and
relaxation are supported, the default will be auto.

Currently auto is the same as none.  We will make auto smarter in the
following changes.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings: Add strings for
-mexplicit-relocs={auto,none,always}.
* config/loongarch/genopts/loongarch.opt.in: Add options for
-mexplicit-relocs={auto,none,always}.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-def.h
(EXPLICIT_RELOCS_AUTO): Define.
(EXPLICIT_RELOCS_NONE): Define.
(EXPLICIT_RELOCS_ALWAYS): Define.
(N_EXPLICIT_RELOCS_TYPES): Define.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Error out if the old-style
-m[no-]explicit-relocs option is used with
-mexplicit-relocs={auto,none,always} together.  Map
-mno-explicit-relocs to -mexplicit-relocs=none and
-mexplicit-relocs to -mexplicit-relocs=always for backward
compatibility.  Set a proper default for -mexplicit-relocs=
based on configure-time probed linker capability.  Update a
diagnostic message to mention -mexplicit-relocs=always instead
of the old-style -mexplicit-relocs.
(loongarch_handle_model_attribute): Update a diagnostic message
to mention -mexplicit-relocs=always instead of the old-style
-mexplicit-relocs.
* config/loongarch/loongarch.h (TARGET_EXPLICIT_RELOCS): Define.
---
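The option handling described in the commit message can be sketched as
the standalone C++ below.  This is only a model of the stated behaviour,
not the code in loongarch.cc: the Style enum, default_style and
resolve_style are names invented for this note, and the two bool
parameters stand for the assembler capabilities probed at build time.

```cpp
#include <cassert>

// Illustrative stand-ins only; not the GCC definitions.
enum Style { STYLE_UNSET = -1, STYLE_AUTO, STYLE_NONE, STYLE_ALWAYS };

// Default when the user gives no option at all, chosen from what the
// assembler was found to support.
static Style default_style (bool as_has_explicit_relocs, bool as_has_relax)
{
  if (!as_has_explicit_relocs)
    return STYLE_NONE;
  return as_has_relax ? STYLE_AUTO : STYLE_ALWAYS;
}

// new_style: value of -mexplicit-relocs=..., or STYLE_UNSET if absent.
// old_style: 1 for -mexplicit-relocs, 0 for -mno-explicit-relocs,
//            -1 if neither was given.
// Returns false (an error) when old and new spellings are combined.
static bool resolve_style (Style new_style, int old_style,
                           bool as_has_explicit_relocs, bool as_has_relax,
                           Style *result)
{
  if (new_style != STYLE_UNSET && old_style != -1)
    return false;

  if (old_style != -1)
    *result = old_style ? STYLE_ALWAYS : STYLE_NONE;
  else if (new_style != STYLE_UNSET)
    *result = new_style;
  else
    *result = default_style (as_has_explicit_relocs, as_has_relax);
  return true;
}
```

With an assembler supporting both features and no option given, the
model resolves to auto; adding the old -mno-explicit-relocs resolves to
none.
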
 .../loongarch/genopts/loongarch-strings   |  6 +
 gcc/config/loongarch/genopts/loongarch.opt.in | 21 ++--
 gcc/config/loongarch/loongarch-def.h  |  6 +
 gcc/config/loongarch/loongarch-str.h  |  5 
 gcc/config/loongarch/loongarch.cc | 24 +--
 gcc/config/loongarch/loongarch.h  |  3 +++
 gcc/config/loongarch/loongarch.opt| 21 ++--
 7 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings b/gcc/config/loongarch/genopts/loongarch-strings
index adecaec3eda..8e412f7536e 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -63,3 +63,9 @@ STR_CMODEL_TS   tiny-static
 STR_CMODEL_MEDIUM medium
 STR_CMODEL_LARGE  large
 STR_CMODEL_EXTREME extreme
+
+# -mexplicit-relocs
+OPTSTR_EXPLICIT_RELOCS explicit-relocs
+STR_EXPLICIT_RELOCS_AUTO   auto
+STR_EXPLICIT_RELOCS_NONE   none
+STR_EXPLICIT_RELOCS_ALWAYS always
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 4a2d7438f1b..e1fe0c7086e 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -170,10 +170,27 @@ mmax-inline-memcpy-size=
 Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init(1024)
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default is 1024.
 
-mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
+Enum
+Name(explicit_relocs) Type(int)
+The code model option names for -mexplicit-relocs:
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_AUTO@@) Value(EXPLICIT_RELOCS_AUTO)
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_NONE@@) Value(EXPLICIT_RELOCS_NONE)
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_ALWAYS@@) Value(EXPLICIT_RELOCS_ALWAYS)
+
+mexplicit-relocs=
+Target RejectNegative Joined Enum(explicit_relocs) Var(la_opt_explicit_relocs) Init(M_OPT_UNSET)
 Use %reloc() assembly operators.
 
+mexplicit-relocs
+Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
+Use %reloc() assembly operators (for backward compatibility).
+
 ; The code model option names for -mcmodel.
 Enum
 Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h
index 769efcb70fb..6e2a6987910 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -99,6 +99,12 @@ extern const char* loongarch_cmodel_strings[];
 #define CMODEL_EXTREME   5
 #define N_CMODEL_TYPES   6
 
+/* enum explicit_relocs */
+#define EXPLICIT_RELOCS_AUTO   0
+#define EXPLICIT_RELOCS_NONE   1
+#de

[PATCH 3/5] LoongArch: Use explicit relocs for TLS access with -mexplicit-relocs=auto

2023-10-19 Thread Xi Ruoyao

The linker does not know how to relax TLS access for LoongArch, so let's
emit machine instructions with explicit relocs for TLS.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
Return true for TLS symbol types if -mexplicit-relocs=auto.
(loongarch_call_tls_get_addr): Replace TARGET_EXPLICIT_RELOCS
with la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@tls_low): Remove
TARGET_EXPLICIT_RELOCS from insn condition.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: New
test.
* gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c: New
test.
---
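As a reading aid, the decision that loongarch_explicit_relocs_p makes
after this patch can be modelled outside of GCC as below.  This is a
sketch under stated assumptions: Style, SymbolType and LinkContext are
invented for this note and only mirror the GCC globals and enumerators
used in the diff (have_linker_plugin models HAVE_LTO_PLUGIN == 2, the
two plugin_flag fields model -fuse-linker-plugin being given and its
value).

```cpp
#include <cassert>

// Illustrative stand-ins only; not the GCC definitions.
enum Style { STYLE_AUTO, STYLE_NONE, STYLE_ALWAYS };
enum SymbolType { SYM_PCREL, SYM_GOT_DISP, SYM_TLS_IE, SYM_TLS_LE,
                  SYM_TLSGD, SYM_TLSLDM };

struct LinkContext
{
  bool in_lto;              // compiling LTO bytecode at link time
  bool incremental_link;    // -flinker-output=nolto-rel and similar
  bool have_linker_plugin;  // linker plugin support is configured
  bool plugin_flag_given;   // -f[no-]use-linker-plugin given explicitly
  bool plugin_flag_value;   // its value when given
};

static bool explicit_relocs_p (Style style, SymbolType type,
                               const LinkContext &ctx)
{
  // none and always are unconditional; only auto looks at the symbol.
  if (style != STYLE_AUTO)
    return style == STYLE_ALWAYS;

  switch (type)
    {
    case SYM_TLS_IE:
    case SYM_TLS_LE:
    case SYM_TLSGD:
    case SYM_TLSLDM:
      // TLS accesses are not relaxed by the linker.
      return true;
    case SYM_GOT_DISP:
      // Final LTO link with the plugin in use: GOT references are
      // known to be external or preemptable, so no relaxation.
      return (ctx.in_lto
              && !ctx.incremental_link
              && ctx.have_linker_plugin
              && (!ctx.plugin_flag_given || ctx.plugin_flag_value));
    default:
      return false;
    }
}
```
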
 gcc/config/loongarch/loongarch.cc | 37 ---
 gcc/config/loongarch/loongarch.md |  2 +-
 .../explicit-relocs-auto-tls-ld-gd.c  |  9 +
 .../explicit-relocs-auto-tls-le-ie.c  |  6 +++
 4 files changed, 40 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index c12d77ea144..c782f571abc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1936,16 +1936,27 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
   if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO)
 return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS;
 
-  /* If we are performing LTO for a final link, and we have the linker
- plugin so we know the resolution of the symbols, then all GOT
- references are binding to external symbols or preemptable symbols.
- So the linker cannot relax them.  */
-  return (in_lto_p
- && !flag_incremental_link
- && HAVE_LTO_PLUGIN == 2
- && (!global_options_set.x_flag_use_linker_plugin
- || global_options.x_flag_use_linker_plugin)
- && type == SYMBOL_GOT_DISP);
+  switch (type)
+{
+  case SYMBOL_TLS_IE:
+  case SYMBOL_TLS_LE:
+  case SYMBOL_TLSGD:
+  case SYMBOL_TLSLDM:
+   /* The linker don't know how to relax TLS accesses.  */
+   return true;
+  case SYMBOL_GOT_DISP:
+   /* If we are performing LTO for a final link, and we have the
+  linker plugin so we know the resolution of the symbols, then
+  all GOT references are binding to external symbols or
+  preemptable symbols.  So the linker cannot relax them.  */
+   return (in_lto_p
+   && !flag_incremental_link
+   && HAVE_LTO_PLUGIN == 2
+   && (!global_options_set.x_flag_use_linker_plugin
+   || global_options.x_flag_use_linker_plugin));
+  default:
+   return false;
+}
 }
 
 /* Returns the number of instructions necessary to reference a symbol.  */
@@ -2753,7 +2764,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   start_sequence ();
 
-  if (TARGET_EXPLICIT_RELOCS)
+  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
 {
   /* Split tls symbol to high and low.  */
   rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
@@ -2918,7 +2929,7 @@ loongarch_legitimize_tls_address (rtx loc)
  tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM);
  tmp1 = gen_reg_rtx (Pmode);
  dest = gen_reg_rtx (Pmode);
- if (TARGET_EXPLICIT_RELOCS)
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
{
  tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_IE);
  tmp3 = gen_reg_rtx (Pmode);
@@ -2955,7 +2966,7 @@ loongarch_legitimize_tls_address (rtx loc)
  tmp1 = gen_reg_rtx (Pmode);
  dest = gen_reg_rtx (Pmode);
 
- if (TARGET_EXPLICIT_RELOCS)
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
{
  tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_LE);
  tmp3 = gen_reg_rtx (Pmode);
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index bec73f1bc91..695c8eb9a6f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2257,7 +2257,7 @@ (define_insn "@tls_low"
(unspec:P [(mem:P (lo_sum:P (match_operand:P 1 "register_operand" "r")
(match_operand:P 2 "symbolic_operand" "")))]
UNSPEC_TLS_LOW))]
-  "TARGET_EXPLICIT_RELOCS"
+  ""
   "addi.\t%0,%1,%L2"
   [(set_attr "type" "arith")
(set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
new file mode 100644
index 00000000000..957ff98df62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd

[PATCH 5/5] LoongArch: Document -mexplicit-relocs={auto,none,always}

2023-10-19 Thread Xi Ruoyao

gcc/ChangeLog:

* doc/invoke.texi (-mexplicit-relocs=style): Document.
(-mexplicit-relocs): Document as an alias of
-mexplicit-relocs=always.
(-mno-explicit-relocs): Document as an alias of
-mexplicit-relocs=none.
(-mcmodel=extreme): Mention -mexplicit-relocs=always instead of
-mexplicit-relocs.
---
 gcc/doc/invoke.texi | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16c45843123..f4633715e2b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1038,7 +1038,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcond-move-float  -mno-cond-move-float
 -memcpy  -mno-memcpy -mstrict-align -mno-strict-align
 -mmax-inline-memcpy-size=@var{n}
--mexplicit-relocs -mno-explicit-relocs
+-mexplicit-relocs=@var{style} -mexplicit-relocs -mno-explicit-relocs
 -mdirect-extern-access -mno-direct-extern-access
 -mcmodel=@var{code-model}}
 
@@ -26194,26 +26194,39 @@ The text segment and data segment must be within 2GB addressing space.
 
 @item extreme
 This mode does not limit the size of the code segment and data segment.
-The @option{-mcmodel=extreme} option is incompatible with @option{-fplt} and
-@option{-mno-explicit-relocs}.
+The @option{-mcmodel=extreme} option is incompatible with @option{-fplt},
+and it requires @option{-mexplicit-relocs=always}.
 @end table
 The default code model is @code{normal}.
 
-@opindex mexplicit-relocs
-@opindex mno-explicit-relocs
-@item -mexplicit-relocs
-@itemx -mno-explicit-relocs
-Use or do not use assembler relocation operators when dealing with symbolic
+@item -mexplicit-relocs=@var{style}
+Set when to use assembler relocation operators when dealing with symbolic
 addresses.  The alternative is to use assembler macros instead, which may
-limit instruction scheduling but allow linker relaxation.  The default
+limit instruction scheduling but allow linker relaxation.
+with @option{-mexplicit-relocs=none} the assembler macros are always used,
+with @option{-mexplicit-relocs=always} the assembler relocation operators
+are always used, with @option{-mexplicit-relocs=auto} the compiler will
+use the relocation operators where the linker relaxation is impossible to
+improve the code quality, and macros elsewhere.  The default
 value for the option is determined during GCC build-time by detecting
 corresponding assembler support:
-@code{-mno-explicit-relocs} if the assembler supports relaxation or it
-does not support relocation operators at all,
-@code{-mexplicit-relocs} otherwise.  This option is mostly useful for
+@option{-mexplicit-relocs=none} if the assembler does not support
+relocation operators at all,
+@option{-mexplicit-relocs=always} if the assembler supports relocation
+operators but does not support relaxation,
+@option{-mexplicit-relocs=auto} if the assembler supports both relocation
+operators and relaxation.  This option is mostly useful for
 debugging, or interoperation with assemblers different from the build-time
 one.
 
+@opindex mexplicit-relocs
+@item -mexplicit-relocs
+An alias of @option{-mexplicit-relocs=always} for backward compatibility.
+
+@opindex mno-explicit-relocs
+@item -mno-explicit-relocs
+An alias of @option{-mexplicit-relocs=none} for backward compatibility.
+
 @opindex mdirect-extern-access
 @item -mdirect-extern-access
 @itemx -mno-direct-extern-access
-- 
2.42.0

[PATCH 0/5] LoongArch: Better balance between relaxation and scheduling

2023-10-19 Thread Xi Ruoyao

For relaxation we currently generate assembler macros for symbolic
addresses everywhere, but this limits scheduling, and there are known
situations where relaxation cannot improve the code:

1. When we are performing LTO during a final link and the linker plugin
is used, la.global won't be relaxed because it references an
external or preemptable symbol.
2. The linker currently does not relax la.tls.*.
3. For la.local + ld/st pairs, if the address is only used once,
emitting pcalau12i + ld/st is never worse than relying on linker
relaxation.

Add -mexplicit-relocs=auto to allow the compiler to use explicit relocs
for these cases, but assembler macros for other cases.  Use it as the
default if the assembler supports both explicit relocs and relaxation.

LTO-bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (5):
  LoongArch: Add enum-style -mexplicit-relocs= option
  LoongArch: Use explicit relocs for GOT access when
-mexplicit-relocs=auto and LTO during a final link with linker
plugin
  LoongArch: Use explicit relocs for TLS access with
-mexplicit-relocs=auto
  LoongArch: Use explicit relocs for addresses only used for one load or
store with -mexplicit-relocs=auto and -mcmodel={normal,medium}
  LoongArch: Document -mexplicit-relocs={auto,none,always}

 .../loongarch/genopts/loongarch-strings   |   6 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  21 ++-
 gcc/config/loongarch/loongarch-def.h  |   6 +
 gcc/config/loongarch/loongarch-protos.h   |   1 +
 gcc/config/loongarch/loongarch-str.h  |   5 +
 gcc/config/loongarch/loongarch.cc |  75 --
 gcc/config/loongarch/loongarch.h  |   3 +
 gcc/config/loongarch/loongarch.md | 128 +-
 gcc/config/loongarch/loongarch.opt|  21 ++-
 gcc/config/loongarch/predicates.md|  15 +-
 gcc/doc/invoke.texi   |  37 +++--
 .../loongarch/explicit-relocs-auto-lto.c  |  26 
 ...-relocs-auto-single-load-store-no-anchor.c |   6 +
 .../explicit-relocs-auto-single-load-store.c  |  14 ++
 .../explicit-relocs-auto-tls-ld-gd.c  |   9 ++
 .../explicit-relocs-auto-tls-le-ie.c  |   6 +
 16 files changed, 343 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c

-- 
2.42.0

[PATCH 2/5] LoongArch: Use explicit relocs for GOT access when -mexplicit-relocs=auto and LTO during a final link with linker plugin

2023-10-19 Thread Xi Ruoyao

If we are performing LTO for a final link and linker plugin is enabled,
then we are sure any GOT access may resolve to a symbol out of the link
unit (otherwise the linker plugin will tell us the symbol should be
resolved locally and we'll use PC-relative access instead).

Produce machine instructions with explicit relocs instead of la.global
for better scheduling.

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_explicit_relocs_p): Declare new function.
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
Implement.
(loongarch_symbol_insns): Call loongarch_explicit_relocs_p for
SYMBOL_GOT_DISP, instead of using TARGET_EXPLICIT_RELOCS.
(loongarch_split_symbol): Call loongarch_explicit_relocs_p for
deciding if return early, instead of using
TARGET_EXPLICIT_RELOCS.
(loongarch_output_move): Call loongarch_explicit_relocs_p
instead of using TARGET_EXPLICIT_RELOCS.
* config/loongarch/loongarch.md (*low): Remove
TARGET_EXPLICIT_RELOCS from insn condition.
(@ld_from_got): Likewise.
* config/loongarch/predicates.md (move_operand): Call
loongarch_explicit_relocs_p instead of using
TARGET_EXPLICIT_RELOCS.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-lto.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h   |  1 +
 gcc/config/loongarch/loongarch.cc | 34 +++
 gcc/config/loongarch/loongarch.md |  4 +--
 gcc/config/loongarch/predicates.md|  8 ++---
 .../loongarch/explicit-relocs-auto-lto.c  | 26 ++
 5 files changed, 59 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index 72ae9918b09..cb8fc36b086 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -220,4 +220,5 @@ extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int);
 extern tree loongarch_build_builtin_va_list (void);
 
 extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool);
+extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type);
 #endif /* ! GCC_LOONGARCH_PROTOS_H */
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 5df8b12ed92..c12d77ea144 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1925,6 +1925,29 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type)
   gcc_unreachable ();
 }
 
+/* If -mexplicit-relocs=auto, we use machine operations with reloc hints
+   for cases where the linker is unable to relax so we can schedule the
+   machine operations, otherwise use an assembler pseudo-op so the
+   assembler will generate R_LARCH_RELAX.  */
+
+bool
+loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
+{
+  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO)
+return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS;
+
+  /* If we are performing LTO for a final link, and we have the linker
+ plugin so we know the resolution of the symbols, then all GOT
+ references are binding to external symbols or preemptable symbols.
+ So the linker cannot relax them.  */
+  return (in_lto_p
+ && !flag_incremental_link
+ && HAVE_LTO_PLUGIN == 2
+ && (!global_options_set.x_flag_use_linker_plugin
+ || global_options.x_flag_use_linker_plugin)
+ && type == SYMBOL_GOT_DISP);
+}
+
 /* Returns the number of instructions necessary to reference a symbol.  */
 
 static int
@@ -1940,7 +1963,7 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, machine_mode mode)
 case SYMBOL_GOT_DISP:
   /* The constant will have to be loaded from the GOT before it
 is used in an address.  */
-  if (!TARGET_EXPLICIT_RELOCS && mode != MAX_MACHINE_MODE)
+  if (!loongarch_explicit_relocs_p (type) && mode != MAX_MACHINE_MODE)
return 0;
 
   return 3;
@@ -3038,7 +3061,7 @@ loongarch_symbol_extreme_p (enum loongarch_symbol_type type)
If so, and if LOW_OUT is nonnull, emit the high part and store the
low part in *LOW_OUT.  Leave *LOW_OUT unchanged otherwise.
 
-   Return false if build with '-mno-explicit-relocs'.
+   Return false if build with '-mexplicit-relocs=none'.
 
TEMP is as for loongarch_force_temporary and is used to load the high
part into a register.
@@ -3052,12 +3075,9 @@ loongarch_split_symbol (rtx temp, rtx addr, machine_mode mode, rtx *low_out)
 {
   enum loongarch_symbol_type symbol_type;
 
-  /* If build with '-mno-explicit-relocs', don't split symbol.  */
-  if (!TARGET_EXPLICIT_RELOCS)
-return false;
-
   if ((GET_CODE (addr) == HIGH && mode == MAX_MACHINE_MODE)
   || !loongarch_symbolic_constant_p (addr, &symbol_type)
+  ||

[PATCH 4/5] LoongArch: Use explicit relocs for addresses only used for one load or store with -mexplicit-relocs=auto and -mcmodel={normal, medium}

2023-10-19 Thread Xi Ruoyao

In these cases, if we use explicit relocs, we end up with 2
instructions:

pcalau12i t0, %pc_hi20(x)
ld.d t0, t0, %pc_lo12(x)

If we use la.local pseudo-op, in the best scenario (x is in +/- 2MiB
range) we still have 2 instructions:

pcaddi   t0, %pcrel_20(x)
ld.d t0, t0, 0

If x is out of the range we'll have 3 instructions.  So for these cases
just emit machine instructions with explicit relocs.

gcc/ChangeLog:

* config/loongarch/predicates.md (symbolic_pcrel_operand): New
predicate.
* config/loongarch/loongarch.md (define_peephole2): Optimize
la.local + ld/st to pcalau12i + ld/st if the address is only used
once if -mexplicit-relocs=auto and -mcmodel=normal or medium.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/explicit-relocs-auto-single-load-store.c:
New test.
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c:
New test.
---
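The instruction counts in the commit message can be restated as a small
model.  It is a sketch only: the function names are invented for this
note, and it assumes that the relaxed form needs the target to be
4-byte aligned and within the reach of pcaddi (a sign-extended 20-bit
immediate shifted left by 2, i.e. [-2 MiB, 2 MiB) from the instruction).

```cpp
#include <cassert>
#include <cstdint>

// Whether a byte offset from the pcaddi instruction can be encoded:
// 20-bit signed immediate, scaled by 4.
static bool pcaddi_reachable (std::int64_t delta)
{
  const std::int64_t limit = std::int64_t (1) << 21;   // 2 MiB
  return delta % 4 == 0 && delta >= -limit && delta < limit;
}

// la.local + ld: 2 instructions if the linker can relax la.local to a
// single pcaddi, otherwise 3 (two for la.local, plus the load).
static int insns_via_la_local (std::int64_t delta)
{
  return pcaddi_reachable (delta) ? 2 : 3;
}

// pcalau12i + ld with %pc_hi20/%pc_lo12: always 2 instructions.
static int insns_via_explicit_relocs ()
{
  return 2;
}
```

For any offset the explicit-relocs count is less than or equal to the
la.local count, which is the point made above.
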
 gcc/config/loongarch/loongarch.md | 122 ++
 gcc/config/loongarch/predicates.md|   7 +
 ...-relocs-auto-single-load-store-no-anchor.c |   6 +
 .../explicit-relocs-auto-single-load-store.c  |  14 ++
 4 files changed, 149 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 695c8eb9a6f..13473472171 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -65,6 +65,7 @@ (define_c_enum "unspec" [
 
   UNSPEC_LOAD_FROM_GOT
   UNSPEC_PCALAU12I
+  UNSPEC_PCALAU12I_GR
   UNSPEC_ORI_L_LO12
   UNSPEC_LUI_L_HI20
   UNSPEC_LUI_H_LO20
@@ -2297,6 +2298,16 @@ (define_insn "@pcalau12i"
   "pcalau12i\t%0,%%pc_hi20(%1)"
   [(set_attr "type" "move")])
 
+;; @pcalau12i may be used for sibcall so it has a strict constraint.  This
+;; allows any general register as the operand.
+(define_insn "@pcalau12i_gr"
+  [(set (match_operand:P 0 "register_operand" "=r")
+   (unspec:P [(match_operand:P 1 "symbolic_operand" "")]
+   UNSPEC_PCALAU12I_GR))]
+  ""
+  "pcalau12i\t%0,%%pc_hi20(%1)"
+  [(set_attr "type" "move")])
+
 (define_insn "@ori_l_lo12"
   [(set (match_operand:P 0 "register_operand" "=r")
(unspec:P [(match_operand:P 1 "register_operand" "r")
@@ -3748,6 +3759,117 @@ (define_insn "loongarch_crcc_w__w"
   [(set_attr "type" "unknown")
(set_attr "mode" "")])
 
+;; With normal or medium code models, if the only use of a pc-relative
+;; address is for loading or storing a value, then relying on linker
+;; relaxation is not better than emitting the machine instruction directly.
+;; Even if the la.local pseudo op can be relaxed, we get:
+;;
+;; pcaddi $t0, %pcrel_20(x)
+;; ld.d   $t0, $t0, 0
+;;
+;; There are still two instructions, same as using the machine instructions
+;; and explicit relocs:
+;;
+;; pcalau12i  $t0, %pc_hi20(x)
+;; ld.d   $t0, $t0, %pc_lo12(x)
+;;
+;; And if the pseudo op cannot be relaxed, we'll get a worse result (with
+;; 3 instructions).
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+   (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+   (mem:GPR (match_dup 0)))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+   || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1))))]
+  {
+emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+  })
+
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+   (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+   (mem:GPR (plus (match_dup 0)
+  (match_operand 3 "const_int_operand"))))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+   || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1))))]
+  {
+operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
+emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+  })
+
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+   (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+   (any_extend:GPR (mem:SUBDI (match_dup 0))))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+   || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2)
+   (any_extend:GPR (mem:SUBDI (lo_sum:P

Re: [PATCH 2/5] LoongArch: Use explicit relocs for GOT access when -mexplicit-relocs=auto and LTO during a final link with linker plugin

2023-10-21 Thread Xi Ruoyao

On Sat, 2023-10-21 at 15:32 +0800, chenglulu wrote:
> > +  /* If we are performing LTO for a final link, and we have the linker
> > + plugin so we know the resolution of the symbols, then all GOT
> > + references are binding to external symbols or preemptable symbols.
> > + So the linker cannot relax them.  */
> > +  return (in_lto_p
> > +     && !flag_incremental_link
> 
> I don’t quite understand this condition "!flag_incremental_link". Can 
> you explain it? Others LGTM.
> 
> Thanks.

If we have two (or several) .o files containing LTO bytecode, GCC
supports "LTO incremental linking" with:

gcc a.o b.o -o ab.o -O2 -flto -flinker-output=nolto-rel

The resulting ab.o will include the data and code from a.o and b.o, but it
contains machine code instead of LTO bytecode.  Now if ab.o refers to an
external symbol c, the linker may or may not relax "la.global c" to
"la.local c" (it can if ab.o is linked together with another file c.o
which contains the definition of c).  As we cannot exclude the possibility
of a relaxation on la.global for incremental linking, just emit la.global
and let the linker do the right thing.
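
To make the condition concrete, here is a minimal sketch in C of the
decision described above.  It is not the code from the patch: the struct
and the function name are hypothetical, and the real check in
loongarch.cc consults GCC's own option state (in_lto_p,
flag_incremental_link and the linker plugin settings).

```c
#include <stdbool.h>

/* Hypothetical summary of the state the compiler looks at; the field
   names are made up for this sketch and do not exist in GCC.  */
struct link_state
{
  bool in_lto;             /* compiling from LTO bytecode (cf. in_lto_p) */
  bool incremental_link;   /* e.g. -flinker-output=nolto-rel
                              (cf. flag_incremental_link) */
  bool have_linker_plugin; /* symbol resolutions are known */
};

/* Return true if a GOT access may use explicit relocs because the
   linker is known to be unable to relax la.global for it.  */
static bool
got_access_cannot_be_relaxed (const struct link_state *s)
{
  return s->in_lto && !s->incremental_link && s->have_linker_plugin;
}
```

With incremental_link set the function returns false, which corresponds
to "just emit la.global" for the ab.o case.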

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH 0/5] LoongArch: Better balance between relaxation and scheduling

2023-10-23 Thread Xi Ruoyao

Pushed r14-{4848..4852}.

On Thu, 2023-10-19 at 22:02 +0800, Xi Ruoyao wrote:
> For relaxation we are now generating assembler macros for symbolic
> addresses everywhere, but this is limiting scheduling and there are
> known situations where the relaxation cannot improve the code.
> 
> 1. When we are performing LTO during a final link and the linker plugin
> is used, la.global won't be relaxed because it references an
> external or preemptable symbol.
> 2. The linker currently does not relax la.tls.*.
> 3. For la.local + ld/st pairs, if the address is only used once,
> emitting pcalau12i + ld/st is never worse than relying on linker
> relaxation.
> 
> Add -mexplicit-relocs=auto to allow the compiler to use explicit relocs
> for these cases, but assembler macros for other cases.  Use it as the
> default if the assembler supports both explicit relocs and relaxation.
> 
> LTO-bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> Xi Ruoyao (5):
>   LoongArch: Add enum-style -mexplicit-relocs= option
>   LoongArch: Use explicit relocs for GOT access when
>     -mexplicit-relocs=auto and LTO during a final link with linker
>     plugin
>   LoongArch: Use explicit relocs for TLS access with
>     -mexplicit-relocs=auto
>   LoongArch: Use explicit relocs for addresses only used for one load or
>     store with -mexplicit-relocs=auto and -mcmodel={normal,medium}
>   LoongArch: Document -mexplicit-relocs={auto,none,always}
> 
>  .../loongarch/genopts/loongarch-strings   |   6 +
>  gcc/config/loongarch/genopts/loongarch.opt.in |  21 ++-
>  gcc/config/loongarch/loongarch-def.h  |   6 +
>  gcc/config/loongarch/loongarch-protos.h   |   1 +
>  gcc/config/loongarch/loongarch-str.h  |   5 +
>  gcc/config/loongarch/loongarch.cc |  75 --
>  gcc/config/loongarch/loongarch.h  |   3 +
>  gcc/config/loongarch/loongarch.md | 128 +-
>  gcc/config/loongarch/loongarch.opt    |  21 ++-
>  gcc/config/loongarch/predicates.md    |  15 +-
>  gcc/doc/invoke.texi   |  37 +++--
>  .../loongarch/explicit-relocs-auto-lto.c  |  26 
>  ...-relocs-auto-single-load-store-no-anchor.c |   6 +
>  .../explicit-relocs-auto-single-load-store.c  |  14 ++
>  .../explicit-relocs-auto-tls-ld-gd.c  |   9 ++
>  .../explicit-relocs-auto-tls-le-ie.c  |   6 +
>  16 files changed, 343 insertions(+), 36 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c
> 


[PATCH] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined

2023-10-30 Thread Xi Ruoyao

Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the build
failure of a cross compiler when the cross assembler is not installed yet.

gcc/ChangeLog:

* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
if not defined yet.
---

Ok for trunk?

 gcc/config/loongarch/loongarch-opts.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch-opts.h 
b/gcc/config/loongarch/loongarch-opts.h
index 2756939b05d..f204828015e 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -101,4 +101,8 @@ loongarch_update_gcc_opt_status (struct loongarch_target 
*target,
 #define HAVE_AS_MRELAX_OPTION 0
 #endif
 
+#ifndef HAVE_AS_TLS
+#define HAVE_AS_TLS 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
-- 
2.42.0

Re: [PATCH] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined

2023-10-30 Thread Xi Ruoyao

On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:
> On 2023/10/30 at 7:42 PM, Xi Ruoyao wrote:
> > Now loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
> > building a cross compiler if the cross assembler is not installed yet.
> > 
> > gcc/ChangeLog:
> > 
> >     * config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
> >     if not defined yet.
> > ---
> > 
> > Ok for trunk?
> I have no problem with this submission, but I don't understand the 
> circumstances surrounding the error.

When developers hack on GCC they sometimes build a cross compiler with
no cross assembler, and then HAVE_AS_TLS will just be undefined.  And in
the future we may have an assembler w/o TLS support (for example a tiny
assembler for a bare-metal target), in which case HAVE_AS_TLS will be
undefined too.

The error message is:

g++ -c   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -I. 
-Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include  \
-o build/gencondmd.o build/gencondmd.cc
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
 3655 |   "HAVE_AS_TLS"
  |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
 3655 |   "HAVE_AS_TLS"
  |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
 3655 |   "HAVE_AS_TLS"
  |  ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was 
not declared in this scope
 3655 |   "HAVE_AS_TLS"
  |  ^~~
make[1]: *** [Makefile:2962: build/gencondmd.o] Error 1
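
The failure above boils down to a macro name being used as a C
expression while it was never defined.  A minimal sketch of the fallback
pattern, outside of GCC (the helper function is hypothetical and only
stands in for an insn condition such as "HAVE_AS_TLS"):

```c
/* Mirror of the fallback added to loongarch-opts.h: if configure did
   not define the macro (no assembler was probed), treat the feature
   as absent instead of leaving the name undeclared.  */
#ifndef HAVE_AS_TLS
#define HAVE_AS_TLS 0
#endif

/* Hypothetical helper standing in for an insn condition; it compiles
   only because the macro is now always defined.  */
static int
tls_insn_enabled (void)
{
  return HAVE_AS_TLS != 0;
}
```

Without the #ifndef block, a build where configure never defined the
macro would fail to compile this function in the same way gencondmd.cc
does.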


Pushed: [PATCH v2] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

2023-10-30 Thread Xi Ruoyao

Pushed r14-5030.  The subject and ChangeLog are updated to include the
PR number.  The code change is the same as v1.

On Mon, 2023-10-30 at 20:44 +0800, chenglulu wrote:
> 
> On 2023/10/30 at 8:26 PM, Xi Ruoyao wrote:
> > On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:
> > > On 2023/10/30 at 7:42 PM, Xi Ruoyao wrote:
> > > > Now loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
> > > > building a cross compiler if the cross assembler is not installed yet.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > >     * config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
> > > >     if not defined yet.
> > > > ---
> > > > 
> > > > Ok for trunk?
> > > I have no problem with this submission, but I don't understand the
> > > circumstances surrounding the error.
> > When developers hack on GCC they sometimes build a cross compiler with
> > no cross assembler, and then HAVE_AS_TLS will just be undefined.  And in
> > the future we may have an assembler w/o TLS support (for example a tiny
> > assembler for a bare-metal target), in which case HAVE_AS_TLS will be
> > undefined too.
> 
> Ok!
> 
> Thanks!
> 
> > 
> > The error message is:
> > 
> > g++ -c   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions 
> > -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
> > -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
> > -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long 
> > -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  
> > -DGENERATOR_FILE -I. -Ibuild -I../../gcc/gcc -I../../gcc/gcc/build 
> > -I../../gcc/gcc/../include  -I../../gcc/gcc/../libcpp/include  \
> > -o build/gencondmd.o build/gencondmd.cc
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' 
> > was not declared in this scope
> >   3655 |   "HAVE_AS_TLS"
> >    |  ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' 
> > was not declared in this scope
> >   3655 |   "HAVE_AS_TLS"
> >    |  ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' 
> > was not declared in this scope
> >   3655 |   "HAVE_AS_TLS"
> >    |  ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' 
> > was not declared in this scope
> >   3655 |   "HAVE_AS_TLS"
> >    |  ^~~
> > make[1]: *** [Makefile:2962: build/gencondmd.o] Error 1
> > 
> 


[PATCH] LoongArch: Disable relaxation if the assembler don't support conditional branch relaxation [PR112330]

2023-11-05 Thread Xi Ruoyao

As the commit message of r14-4674 has indicated, if the assembler does
not support conditional branch relaxation, a relocation overflow may
happen on conditional branches when relaxation is enabled because the
number of NOP instructions inserted by the assembler will be more than
the number estimated by GCC.

To work around this issue, disable relaxation by default if the
assembler is found at GCC build time to be incapable of conditional
branch relaxation.  We also need to pass -mno-relax to the assembler to
really disable relaxation.  But if the assembler does not support the
-mrelax option at all, we must not pass -mno-relax to it or it will
immediately error out.  Handle this with the build-time assembler
capability probing as well, and add a pair of options
-m[no-]pass-mrelax-to-as to allow using a different assembler from the
build-time one.

With this change, if GCC is built with GAS 2.41, relaxation will be
disabled by default.  So the default value of -mexplicit-relocs= is also
changed to 'always' if -mno-relax is specified or implied by the
build-time default, because using assembler macros for symbol addresses
produces no benefit when relaxation is disabled.
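
The resulting defaults can be summarized with a small C sketch.  It is
only a model of the description above, not the option machinery itself:
struct as_caps and the three functions are hypothetical, and the
"none" fallback for an assembler without explicit relocs support is an
assumption carried over from the earlier -mexplicit-relocs= series.

```c
#include <stdbool.h>

/* The three settings of -mexplicit-relocs=.  */
enum explicit_relocs
{
  EXPLICIT_RELOCS_NONE,
  EXPLICIT_RELOCS_AUTO,
  EXPLICIT_RELOCS_ALWAYS
};

/* Hypothetical record of what configure found out about the assembler.  */
struct as_caps
{
  bool mrelax_option;      /* cf. HAVE_AS_MRELAX_OPTION */
  bool cond_branch_relax;  /* cf. HAVE_AS_COND_BRANCH_RELAXATION */
  bool explicit_relocs;    /* assembler understands explicit relocs */
};

/* Default of -mrelax: needs both the option and branch relaxation.  */
static bool
default_relax (const struct as_caps *as)
{
  return as->mrelax_option && as->cond_branch_relax;
}

/* Default of -mpass-mrelax-to-as: only if the assembler knows -mrelax.  */
static bool
default_pass_mrelax_to_as (const struct as_caps *as)
{
  return as->mrelax_option;
}

/* Default of -mexplicit-relocs= given the effective -mrelax setting.  */
static enum explicit_relocs
default_explicit_relocs (const struct as_caps *as, bool relax)
{
  if (!as->explicit_relocs)
    return EXPLICIT_RELOCS_NONE;
  return relax ? EXPLICIT_RELOCS_AUTO : EXPLICIT_RELOCS_ALWAYS;
}
```

For an assembler like GAS 2.41 (it has -mrelax but no conditional branch
relaxation) this model gives: relaxation off, -mno-relax passed to the
assembler, and -mexplicit-relocs=always.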

gcc/ChangeLog:

PR target/112330
* config/loongarch/genopts/loongarch.opt.in: Add
-m[no-]pass-mrelax-to-as.  Change the default of -m[no-]relax to
account for conditional branch relaxation support status.
* config/loongarch/loongarch.opt: Regenerate.
* configure.ac (gcc_cv_as_loongarch_cond_branch_relax): Check if
the assembler supports conditional branch relaxation.
* configure: Regenerate.
* config.in: Regenerate.
* config/loongarch/loongarch-opts.h
(HAVE_AS_COND_BRANCH_RELAXATION): Define to 0 if not defined.
* config/loongarch/loongarch-driver.h (ASM_MRELAX_DEFAULT):
Define.
(ASM_MRELAX_SPEC): Define.
(ASM_SPEC): Use ASM_MRELAX_SPEC instead of "%{mno-relax}".
* config/loongarch/loongarch.cc: Take the setting of
-m[no-]relax into account when determining the default of
-mexplicit-relocs=.
* doc/invoke.texi: Document -m[no-]relax and
-m[no-]pass-mrelax-to-as for LoongArch.  Update the default
value of -mexplicit-relocs=.
---

Bootstrapped and regtested on loongarch64-linux-gnu twice: once with
Binutils 2.41, and once with Binutils 2.41.50.20231105.  With Binutils
2.41.50.20231105 there is a regression: the compilation of
c-c++-common/asan/pr59063-2.c times out.  My diagnosis shows that
the timeout is caused by the linker (it seems to run indefinitely),
so it's more likely a Binutils regression than a GCC regression,
and I'll leave this for Qinggang.

Ok for trunk?

 gcc/config.in |  6 +++
 gcc/config/loongarch/genopts/loongarch.opt.in |  6 ++-
 gcc/config/loongarch/loongarch-driver.h   | 16 +++-
 gcc/config/loongarch/loongarch-opts.h |  4 ++
 gcc/config/loongarch/loongarch.cc |  2 +-
 gcc/config/loongarch/loongarch.opt|  6 ++-
 gcc/configure | 39 ++-
 gcc/configure.ac  | 10 +
 gcc/doc/invoke.texi   | 36 +
 9 files changed, 111 insertions(+), 14 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 03faee1c6ac..7728e53ca1f 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -386,6 +386,12 @@
 #endif
 
 
+/* Define if your assembler supports conditional branch relaxation. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_COND_BRANCH_RELAXATION
+#endif
+
+
 /* Define if your assembler supports the --debug-prefix-map option. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_DEBUG_PREFIX_MAP
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index e1fe0c7086e..158701d327a 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -223,10 +223,14 @@ Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0)
 Avoid using the GOT to access external symbols.
 
 mrelax
-Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION && 
HAVE_AS_COND_BRANCH_RELAXATION)
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses.
 
+mpass-mrelax-to-as
+Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
+Pass -mrelax or -mno-relax option to the assembler.
+
 -param=loongarch-vect-unroll-limit=
 Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) 
IntegerRange(1, 64) Param
 Used to limit unroll factor which indicates how much the autovectorizer may
diff --git a/gcc/config/loongarch/loongarch-driver.h 
b/gcc/config/loongarch/loongarch-driver.h
index d859afcc9fe..20d233cc938 100644
--- a/gcc/config/loongarch/loongarch-driver.h
+++ b/gcc/config/loongarch/loo

[PATCH] LoongArch: Optimize single-used address with -mexplicit-relocs=auto for fld/fst

2023-11-05 Thread Xi Ruoyao

fld and fst have the same addressing mode as ld.w and st.w, so the same
optimization as r14-4851 should be applied to them too.
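
As an illustration of the kind of source this targets (a sketch written
for this note, not one of the tests in the patch): a file-local
floating-point variable whose address is used for exactly one load or
one store per function.  The names are made up, and the instruction
sequences mentioned in the comments are what one would hope for with
-O2 -mexplicit-relocs=auto, not verified output.

```c
/* File-local, so its address is a PC-relative symbolic address.  The
   setter keeps the compiler from treating it as a constant.  */
static double scale = 1.5;

/* Single store through the address: a candidate for
   pcalau12i + fst.d instead of la.local + fst.d.  */
void
set_scale (double s)
{
  scale = s;
}

/* Single load through the address: a candidate for
   pcalau12i + fld.d instead of la.local + fld.d.  */
double
scaled (double x)
{
  return x * scale;
}
```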

gcc/ChangeLog:

* config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode
iterator.
(ST_ANY): New mode iterator.
(define_peephole2): Use LD_AT_LEAST_32_BIT instead of GPR and
ST_ANY instead of QHWD for applicable patterns.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 46 +--
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 4dd716e1941..9c247242215 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -400,6 +400,22 @@ (define_mode_iterator SPLITF
(DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
(TF "TARGET_64BIT && TARGET_DOUBLE_FLOAT")])
 
+;; A mode for anything with 32 bits or more, and able to be loaded with
+;; the same addressing mode as ld.w.
+(define_mode_iterator LD_AT_LEAST_32_BIT
+  [SI
+   (DI "TARGET_64BIT")
+   (SF "TARGET_SINGLE_FLOAT || TARGET_DOUBLE_FLOAT")
+   (DF "TARGET_DOUBLE_FLOAT")])
+
+;; A mode for anything able to be stored with the same addressing mode as
+;; st.w.
+(define_mode_iterator ST_ANY
+  [QI HI SI
+   (DI "TARGET_64BIT")
+   (SF "TARGET_SINGLE_FLOAT || TARGET_DOUBLE_FLOAT")
+   (DF "TARGET_DOUBLE_FLOAT")])
+
 ;; In GPR templates, a string like "mul." will expand to "mul.w" in the
 ;; 32-bit version and "mul.d" in the 64-bit version.
 (define_mode_attr d [(SI "w") (DI "d")])
@@ -3785,13 +3801,14 @@ (define_insn "loongarch_crcc_w__w"
 (define_peephole2
   [(set (match_operand:P 0 "register_operand")
(match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (match_operand:GPR 2 "register_operand")
-   (mem:GPR (match_dup 0)))]
+   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
+   (mem:LD_AT_LEAST_32_BIT (match_dup 0)))]
   "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
&& (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
&& (peep2_reg_dead_p (2, operands[0]) \
|| REGNO (operands[0]) == REGNO (operands[2]))"
-  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1))))]
+  [(set (match_dup 2)
+   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1))))]
   {
 emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
   })
@@ -3799,14 +3816,15 @@ (define_peephole2
 (define_peephole2
   [(set (match_operand:P 0 "register_operand")
(match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (match_operand:GPR 2 "register_operand")
-   (mem:GPR (plus (match_dup 0)
-  (match_operand 3 "const_int_operand"))))]
+   (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand")
+   (mem:LD_AT_LEAST_32_BIT (plus (match_dup 0)
+   (match_operand 3 "const_int_operand"))))]
   "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
&& (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
&& (peep2_reg_dead_p (2, operands[0]) \
|| REGNO (operands[0]) == REGNO (operands[2]))"
-  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1))))]
+  [(set (match_dup 2)
+   (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1))))]
   {
 operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
 emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
@@ -3850,13 +3868,13 @@ (define_peephole2
 (define_peephole2
   [(set (match_operand:P 0 "register_operand")
(match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (mem:QHWD (match_dup 0))
-   (match_operand:QHWD 2 "register_operand"))]
+   (set (mem:ST_ANY (match_dup 0))
+   (match_operand:ST_ANY 2 "register_operand"))]
   "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
&& (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
&& (peep2_reg_dead_p (2, operands[0])) \
&& REGNO (operands[0]) != REGNO (operands[2])"
-  [(set (mem:QHWD (lo_sum:P (match_dup 0) (match_dup 1))) (match_dup 2))]
+  [(set (mem:ST_ANY (lo_sum:P (match_dup 0) (match_dup 1))) (match_dup 2))]
   {
 emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
   })
@@ -3864,14 +3882,14 @@ (define_peephole2
 (define_peephole2
   [(set (match_operand:P 0 "register_operand")
(match_operand:P 1 "symbolic_pcrel_operand"))
-   (set (mem:QHWD (plus (match_dup 0)
-   (match_operand 3 "const_int_operand")))
-   (match_operand:QHWD 2 "register_operand"))]
+   (set (mem:ST_ANY (plus (match_dup 0)
+ (match_operand 3 "const_int_operand")))
+   (match_operand:ST_ANY 2 "register_operand"))]
   "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
&& (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
&& (peep2_reg_dead_p (2, operands[0])) \
&& REGNO (operands[0]) != REGNO (operands[2])"
-  [(set (mem:QHWD (lo_sum:P (match_dup 0) (match_dup 1))) (ma

[PATCH] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-06 Thread Xi Ruoyao

This is isomorphic to the LLVM changes [1-2].

On LoongArch, the LL and SC instructions have memory barrier semantics:

- LL: <memory-barrier> + <load-exclusive>
- SC: <store-conditional> + <memory-barrier>

But the compare and swap operation is allowed to fail, and if it fails
the SC instruction is not executed, thus the guarantee of acquiring
semantics cannot be ensured. Therefore, an acquire barrier needs to be
generated when failure_memorder includes an acquire operation.

On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an
acquire barrier; on CPUs implementing LoongArch v1.00, it is a full
barrier.  So it's always enough for acquire semantics.  OTOH if an
acquire semantic is not needed, we still need the "dbar 0x700" as the
load-load barrier like all LL-SC loops.

[1]:https://github.com/llvm/llvm-project/pull/67391
[2]:https://github.com/llvm/llvm-project/pull/69339
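
For readers less familiar with failure_memorder, here is a small C11
sketch of a compare-and-swap whose failure ordering is acquire.  The
function name is hypothetical; on LoongArch, this is the kind of
operation for which the failure path needs the acquire barrier
described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to change *p from EXPECTED to DESIRED.  On success the operation
   is acq_rel; on failure the value that was observed is loaded with
   acquire semantics (the failure_memorder) and stored in *SEEN.  */
static bool
try_claim (atomic_int *p, int expected, int desired, int *seen)
{
  bool ok = atomic_compare_exchange_strong_explicit (p, &expected, desired,
						     memory_order_acq_rel,
						     memory_order_acquire);
  *seen = expected;
  return ok;
}
```

If the failure ordering were memory_order_relaxed instead, only the
plain load-load barrier would be required on the failure path.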

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_memmodel_needs_release_fence): Remove.
(loongarch_cas_failure_memorder_needs_acquire): New static
function.
(loongarch_print_operand): Redefine 'G' for the barrier on CAS
failure.
* config/loongarch/sync.md (atomic_cas_value_strong):
Remove the redundant barrier before the LL instruction, and
emit an acquire barrier on failure if needed by
failure_memorder.
(atomic_cas_value_cmp_and_7_): Likewise.
(atomic_cas_value_add_7_): Remove the unnecessary barrier
before the LL instruction.
(atomic_cas_value_sub_7_): Likewise.
(atomic_cas_value_and_7_): Likewise.
(atomic_cas_value_xor_7_): Likewise.
(atomic_cas_value_or_7_): Likewise.
(atomic_cas_value_nand_7_): Likewise.
(atomic_cas_value_exchange_7_): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/cas-acquire.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
and/or GCC 12/13 (for fixing the acquire semantics in failure_memorder)?

 gcc/config/loongarch/loongarch.cc | 27 +++---
 gcc/config/loongarch/sync.md  | 49 +--
 .../gcc.target/loongarch/cas-acquire.c| 84 +++
 3 files changed, 118 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cas-acquire.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 9b63f0dc322..d9b7a1076a2 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5833,25 +5833,22 @@ loongarch_memmodel_needs_rel_acq_fence (enum memmodel 
model)
 }
 }
 
-/* Return true if a FENCE should be emitted to before a memory access to
-   implement the release portion of memory model MODEL.  */
+/* Return true if a FENCE should be emitted after a failed CAS to
+   implement the acquire semantic of failure_memorder.  */
 
 static bool
-loongarch_memmodel_needs_release_fence (enum memmodel model)
+loongarch_cas_failure_memorder_needs_acquire (enum memmodel model)
 {
-  switch (model)
+  switch (memmodel_base (model))
 {
+case MEMMODEL_ACQUIRE:
 case MEMMODEL_ACQ_REL:
+case MEMMODEL_CONSUME:
 case MEMMODEL_SEQ_CST:
-case MEMMODEL_SYNC_SEQ_CST:
-case MEMMODEL_RELEASE:
-case MEMMODEL_SYNC_RELEASE:
   return true;
 
-case MEMMODEL_ACQUIRE:
-case MEMMODEL_CONSUME:
-case MEMMODEL_SYNC_ACQUIRE:
 case MEMMODEL_RELAXED:
+case MEMMODEL_RELEASE:
   return false;
 
 default:
@@ -5966,7 +5963,8 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool 
hi64_part,
'd' Print CONST_INT OP in decimal.
'E' Print CONST_INT OP element 0 of a replicated CONST_VECTOR in decimal.
'F' Print the FPU branch condition for comparison OP.
-   'G' Print a DBAR insn if the memory model requires a release.
+   'G' Print a DBAR insn for CAS failure (with an acquire semantic if
+   needed, otherwise a simple load-load barrier).
'H'  Print address 52-61bit relocation associated with OP.
'h'  Print the high-part relocation associated with OP.
'i' Print i if the operand is not a register.
@@ -6057,8 +6055,11 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
   break;
 
 case 'G':
-  if (loongarch_memmodel_needs_release_fence ((enum memmodel) INTVAL (op)))
-   fputs ("dbar\t0", file);
+  if (loongarch_cas_failure_memorder_needs_acquire (
+   memmodel_from_int (INTVAL (op))))
+   fputs ("dbar\t0b10100", file);
+  else
+   fputs ("dbar\t0x700", file);
   break;
 
 case 'h':
diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 9924d522bcd..db3a21690b8 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -129,19 +129,18 @@ (define_insn "atomic_cas_value_strong"
(clobber (match_scratch:GPR 6 "=&r"))]
   ""
 {
-  return "%G5\\n\\t"
-"1:\\n\\t"
+  return "1:\\n\\t"
 "ll.\\t%0,%1\\n\\t"
 "bne\\t%0,%z2,2f\\n\\t"
 "or%i3\\t

Re: [PATCH] MIPS: Fix PR target/98491 (ChangeLog)

2021-02-12 Thread Xi Ruoyao

Well, it just dislikes my mail server :(.  Switching to the mail server of my
university.

On 2021-02-12 22:54 +0800, Xi Ruoyao wrote:
> Resend the mail.  I had to fill in a form to send mail to Robert.
> 
> On 2021-02-12 22:17 +0800, Xi Ruoyao wrote:
> > On 2021-01-11 01:01 +0800, Xi Ruoyao wrote:
> > > Hi Jeff and Jakub,
> > > 
> > > On 2021-01-04 14:19 -0700, Jeff Law wrote:
> > > > On 1/4/21 2:00 PM, Jakub Jelinek wrote:
> > > > > On Mon, Jan 04, 2021 at 01:51:59PM -0700, Jeff Law via Gcc-patches
> > > > > wrote:
> > > > > > > Sorry, I forgot to include the ChangeLog:
> > > > > > > 
> > > > > > >     gcc/ChangeLog:
> > > > > > >     
> > > > > > >     2020-12-31  Xi Ruoyao 
> > > > > > >     
> > > > > > >     PR target/98491
> > > > > > >     * config/mips/mips.c (mips_symbol_insns): Do not use
> > > > > > >   MSA_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE.
> > > > > > So I absolutely agree the current code is wrong as it does an out of
> > > > > > bounds array access.
> > > > > > 
> > > > > > 
> > > > > > Would it be better to instead to change MSA_SUPPORTED_MODE_P to
> > > > > > evaluate
> > > > > > to zero if MODE is MAX_MACHINE_MODE?  That would protect all the
> > > > > > uses
> > > > > > of
> > > > > > MSA_SUPPORTED_MODE_P.    Something like this perhaps?
> > > > > But MAX_MACHINE_MODE is the one past last valid mode, I'm not aware of
> > > > > any target that would protect all macros that deal with modes that
> > > > > way.
> > > > > 
> > > > > So, perhaps best would be stop using the MAX_MACHINE_MODE as magic
> > > > > value
> > > > > for that function and instead use say VOIDmode that shouldn't normally
> > > > > appear either?
> > > > I think we have to allow VOIDmode because constants don't necessarily
> > > > have modes.   And I certainly agree that using MAX_MACHINE_MODE like
> > > > this is ugly and error prone (as we can see from the BZ).
> > > > 
> > > > I also couldn't convince myself that the code and comments were actually
> > > > consistent, particularly for MSA targets which the comment claims can
> > > > never handle constants for ld/st (and thus should be returning 0 for
> > > > MAX_MACHINE_MODE).  Though maybe mips_symbol_insns_1 ultimately handles
> > > > that correctly.
> > > > 
> > > > 
> > > > > 
> > > > > But I don't really see anything wrong on the mips_symbol_insns above
> > > > > change either.
> > > > Me neither.  I'm just questioning if bullet-proofing in the
> > > > MSA_SUPPORTED_MODE_P would be a better option.  While I've worked in the
> > > > MIPS port in the past, I don't really have any significant experience
> > > > with the MSA support.
> > > 
> > > I can't understand the comment either.  To me it looks like it's possible
> > > to
> > > remove this "if (MSA_SUPPORTED_P (mode)) return 0;"
> > > 
> > > CC Robert to get some help.
> > 
> > Happy new lunar year folks.
> > 
> > I found a newer email address of Robert.  Hope it is still being used.
> > 
> > Could someone update MAINTAINERS file by the way?
>
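
The out-of-bounds access discussed in the quoted thread can be pictured
with a toy example.  This is only a sketch: the enum, the table and the
function are hypothetical stand-ins, not the MIPS port's definitions.
It shows why a "one past the last valid mode" sentinel must be filtered
out before it is used as an array index.

```c
#include <stdbool.h>

/* Toy stand-in for GCC's machine_mode enum; MAX_MODE_SKETCH plays the
   role of MAX_MACHINE_MODE, i.e. one past the last valid mode.  */
enum mode_sketch { MODE_SI, MODE_DI, MODE_V4SI, MAX_MODE_SKETCH };

/* Per-mode table: true for modes handled by the vector unit.  */
static const bool mode_is_vector[MAX_MODE_SKETCH] = { false, false, true };

/* Guarded lookup: the sentinel is rejected before the array is read,
   so it never indexes past the end of the table.  */
static bool
vector_mode_p (enum mode_sketch mode)
{
  return mode != MAX_MODE_SKETCH && mode_is_vector[mode];
}
```

The guard can live either in the caller (as in the posted patch) or in
the predicate itself (as Jeff suggested); the sketch uses the latter.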

[PATCH] Fix symver attribute with LTO

2019-12-16 Thread Xi Ruoyao

Hi,
with Jan's patch committed in r278878 we can use the symver attribute for
functions and variables.  The symver attribute is designed to replace toplevel
asm statements containing ".symver", which may be removed by LTO.
Unfortunately, a quick test showed that GCC still generates a buggy .so file
with LTO and the new symver attribute.

Two issues:

1. The symver node in symtab is marked as PREVAILING_DEF_IRONLY (no EXP) and
   then removed by LTO.
2. The actual function body implementing the symver-ed function is also marked
   as PREVAILING_DEF_IRONLY and then removed or marked as local.  So no ".globl"
   directive is emitted for it.

Both issues cause symbols with symver to be missing from the DSO (with LTO enabled).

I modified the fuse-3.9.0 code to use the new symver attribute and tried to
build it with GCC trunk and LTO.  The result is a buggy DSO.  With this patch
applied, fuse-3.9.0 can be built with LTO enabled without problems.

I'll test the symver patch and this patch with more packages.

Bootstrapped/regtested x86_64-linux.  I'm not a maintainer.

gcc/ChangeLog:

2019-12-17  Xi Ruoyao  

* cgraph.h (symtab_node::used_from_object_file_p): Symver nodes are
part of DSO ABI so always used by non-LTO object files.
* ipa-visibility.c (cgraph_externally_visible_p): Functions with symver
attributes should always be visible.

Index: gcc/cgraph.h
===
--- gcc/cgraph.h(revision 279452)
+++ gcc/cgraph.h(working copy)
@@ -2682,7 +2682,7 @@ symtab_node::used_from_object_file_p (vo
 {
   if (!TREE_PUBLIC (decl) || DECL_EXTERNAL (decl))
 return false;
-  if (resolution_used_from_other_file_p (resolution))
+  if (symver || resolution_used_from_other_file_p (resolution))
 return true;
   return false;
 }
Index: gcc/ipa-visibility.c
===
--- gcc/ipa-visibility.c(revision 279452)
+++ gcc/ipa-visibility.c(working copy)
@@ -216,6 +216,8 @@ cgraph_externally_visible_p (struct cgra
 return true;
   if (lookup_attribute ("noipa", DECL_ATTRIBUTES (node->decl)))
 return true;
+  if (lookup_attribute ("symver", DECL_ATTRIBUTES (node->decl)))
+return true;
   if (TARGET_DLLIMPORT_DECL_ATTRIBUTES
   && lookup_attribute ("dllexport",
   DECL_ATTRIBUTES (node->decl)))

Re: [PATCH] Fix symver attribute with LTO

2019-12-17 Thread Xi Ruoyao

On 2019-12-17 09:32 +0100, Jan Hubicka wrote:
> > Hi,
> > with Jan's patch committed in r278878 we can use the symver attribute for
> > functions
> > and variables.  The symver attribute is designed to replace toplevel asm
> > statements containing ".symver", which may be removed by LTO.  Unfortunately,
> > a quick test showed that GCC still generates a buggy .so file with LTO and the
> > new symver attribute.
> 
> Thanks for looking into this.  It was on my TODO list to actually
> convert some packages, so it is great you did that.
> > Two issues:
> > 
> > 1. The symver node in symtab is marked as PREVAILING_DEF_IRONLY (no EXP) and
> >then removed by LTO.
> 
> This is however wrong - linker should not mark it as
> PREVAILING_DEF_IRONLY if it is used externally.  What linker do you use?
> On my testcases this was working with
>  GNU ld (GNU Binutils) 2.31.51.20181222
> I could easily imagine that some linkers get it wrong which should be
> reported to binutils bugzilla but it is also easy to work around as done
> in your patch.

Hi Jan,

I'm using GNU ld 2.33.1.

I'll attach a testcase simplified from fuse-3.9 code.  "local: *;" in the
versioning script triggers the issue.  Without it there would be no problem.

> > 2. The actual function body implementing the symver-ed function is also
> > marked
> >as PREVAILING_DEF_IRONLY and then removed or marked as local.  So no
> > ".globl"
> >directive is emitted for it.
> 
> Here is the symver-ed function exported from the DSO (or is it set
> to have hidden attribute)?
> Again this was working for me, so it would be good to understand this
> issue.

It's also triggered by "local: *;".

Untar the attachment and use "make" to build it, then "make show-dynamic-syms"
to dump the dynamic symbol table.  I believe (with 99% chance) you'll see only
foo (VERS_1) and foo_v1 (because foo_v1 is marked as global in the version
script).  And foo (VERS_2) would be missing.  With this patch foo (VERS_2) would
show up.

We can't mark "foo_v2" to be "global" because it should not be a part of DSO
ABI.

The other 1% chance would be a regression in Binutils.


pr48200.tar.gz
Description: application/compressed-tar

Re: [PATCH] Fix symver attribute with LTO

2019-12-18 Thread Xi Ruoyao

On 2019-12-17 18:47 +0100, Jan Hubicka wrote:
> > Would it be equivalent to:
> > 1) output foo_v2 local
> > 2) producing static alias with local name (.L1)
> > 3) do .symver .L1,foo@@@VERS_2
> > That is somewhat more systematic and would not lead to false
> > visibilities.
> 
> I spent some time playing with this.  In order to
> 1) be able to handle foo_v2 according to the resolution info
>(so it behaves like a regular symbol and can be called directly,
> localized and optimized)
> 2) get the intended objdump -T relocations
> 3) not pollute global symbol tables
> 
> I ended up with the following codegen:
> 
>   .type   foo_v2, @function
> foo_v2:
> .LFB1:
>   .cfi_startproc
>   movl    $2, %eax
>   ret
>   .cfi_endproc
> .LFE1:
>   .size   foo_v2, .-foo_v2
>   .globl  .LSYMVER0
>   .set    .LSYMVER0,foo_v2
>   .symver .LSYMVER0, foo@@@VERS_2
> 
> This uses the @@@ symver form of gas, which seems to have the odd semantics
> of requiring a global symbol name, which it then takes away and
> produces foo@@VERS_2.
> 
> So the nm output of the ltrans unit is:
>  T foo_v1
> 0010 t foo_v2
>  T foo@VERS_1
> 0010 T foo@@VERS_2
> 
> So the difference to your patch is that foo_v2 is static which enables
> normal optimizations.
> 
> Since additional symbol alias is produced this would also make it
> possible to attach multiple symver attributes with @@ string.
> 
> Does something like this make sense to you? Modulo the obvious buffer
> overflow issue?
> Honza

Unfortunately, I got an ICE with my testcase with the patch applied to trunk.

lto1: internal compiler error: tree check: expected tree that contains ‘decl
minimal’ structure, have ‘identifier_node’ in do_assemble_symver, at
varasm.c:5986
0x6fa648 tree_contains_struct_check_failed(tree_node const*,
tree_node_structure_enum, char const*, int, char const*)
../../gcc/gcc/tree.c:9859
0x71466e contains_struct_check(tree_node*, tree_node_structure_enum, char
const*, int, char const*)
../../gcc/gcc/tree.h:3387
0x71466e do_assemble_symver(tree_node*, tree_node*)
../../gcc/gcc/varasm.c:5986
0x89e409 cgraph_node::assemble_thunks_and_aliases()
../../gcc/gcc/cgraphunit.c:2225
0x89e698 cgraph_node::expand()
../../gcc/gcc/cgraphunit.c:2351
0x89f62f expand_all_functions
../../gcc/gcc/cgraphunit.c:2456
0x89f62f symbol_table::compile()
../../gcc/gcc/cgraphunit.c:2806
0x7fb589 lto_main()
../../gcc/gcc/lto/lto.c:658
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: /home/xry111/gcc-test/bin/gcc returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make: *** [Makefile:4: obj/test.so] Error 1

The change to lto/lto-common.c makes sense.  I tried it instead of my change to
cgraph.h and everything is OK.  I'll investigate the change to varasm.c a
little.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Fix symver attribute with LTO

2019-12-18 Thread Xi Ruoyao

IDENTIFIER_POINTER
> +  (DECL_ASSEMBLER_NAME (tmpdecl)),
> +buf);
> +}
>  #else
>error ("symver is only supported on ELF platforms");
>  #endif
> Index: lto/lto-common.c
> ===
> --- lto/lto-common.c  (revision 279467)
> +++ lto/lto-common.c  (working copy)
> @@ -2818,6 +2818,10 @@ read_cgraph_and_symbols (unsigned nfiles
>  IDENTIFIER_POINTER
>(DECL_ASSEMBLER_NAME (snode->decl)));
> }
> + /* Symbol versions are always used externally, but linker does not
> +report that correctly.  */
> + else if (snode->symver && *res == LDPR_PREVAILING_DEF_IRONLY)
> +   snode->resolution = LDPR_PREVAILING_DEF_IRONLY_EXP;

This is absolutely correct.

>   else
> snode->resolution = *res;
>}

I still believe we should consider symver targets to be externally visible in
cgraph_externally_visible_p.  There is a comment saying "if linker counts on us,
we must preserve the function".  That's true in our case.

And, I think

.globl  .LSYMVER0
.set    .LSYMVER0, foo_v2
.symver .LSYMVER0, foo@@VERS_2

is exactly the same as

.globl  foo_v2
.symver foo_v2, foo@@VERS_2

except there is an unnecessary ".LSYMVER0".  Adding ".globl foo_v2" or ".globl
foo_v1" won't cause them to be "global" in the final DSO because the linker will
hide them according to the version script.

So if it's safe we can force a ".globl foo_v2" before ".symver foo_v2,
foo@@VERS_2".  But I can't prove it's safe so I think it's better to consider
this case in cgraph_externally_visible_p.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Fix symver attribute with LTO

2019-12-18 Thread Xi Ruoyao

 in our case.
> > 
> > And, I think
> > 
> > .globl  .LSYMVER0
> > .set    .LSYMVER0, foo_v2
> > .symver .LSYMVER0, foo@@VERS_2
> I produce
>   .symver .LSYMVER0, foo@@@VERS_2
> 
> > is exactly same as
> > 
> > .globl  foo_v2
> > .symver foo_v2, foo@@VERS_2
> > 
> > except there is an unnecessary ".LSYMVER0".  Adding ".globl foo_v2" or
> > ".globl
> > foo_v1" won't cause them to be "global" in the final DSO because the linker
> > will
> > hide them according to the version script.
> 
> The difference is that in first case compiler can fully control foo_v2
> symbol (with LTO it will turn it into static symbol, it will inline
> calls to it and do other things), while in the second case we need to
> treat foo_v2 specially.
> > So if it's safe we can force a ".globl foo_v2" before ".symver foo_v2,
> > foo@@VERS_2".  But I can't prove it's safe so I think it's better to
> > consider
> > this case in cgraph_externally_visible_p.
> 
> It sort of makes things work, but for example it will prevent gcc from
> inlining calls to foo_v2.  I think we will still need to do something
> about -fvisibility=hidden.
> 
> It is sad that we do not have a way to produce a symbol version without a
> corresponding symbol of global visibility.  If we had, we could support
> multiple symver aliases from one symbol and avoid the need to explicitly
> hide the unnecessary symbols in the map files...

Explicitly hiding the unnecessary symbols in map files is how we handle this now
[with __asm__(".symver foo_v2, foo@@VERS_2")].  We can continue to do it this
way and leave the rest as a future enhancement.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Fix symver attribute with LTO

2019-12-18 Thread Xi Ruoyao

On 2019-12-18 14:19 +0100, Jan Hubicka wrote:
> The problem here is that we lie to the compiler (by pretending that
> foo_v2 is exported from DSO while it is not) and force user to do the
> same.
> 
> We support two ways to hide symbol - either at compile time via
> attribute((visibility("hidden"))) or at link-time via map file.  The
> first produces better code because compiler can do more optimizations
> knowing that the symbol can not be interposed.

I just got your point: if the library calls foo_v2 it won't be interposed.  If
it expects a call to be interposable it should call foo() [foo@@VER_2] instead of
foo_v2().

But it seems there is no way we can do this [even with the traditional
__asm__("symver foo, foo@@VER_2")].  To achieve this we should either:

1. Change GAS (introducing some new syntax like '' or '.symver_export')

or

2. Add some mangled symbol name in GCC (like ".LSYMVERx" or
"foo_v2.symver_export").
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Fix symver attribute with LTO

2019-12-19 Thread Xi Ruoyao

On 2019-12-19 11:06 +0100, Jan Hubicka wrote:
> This is a variant of your patch that I committed.  It also adds verification so
> we get an ICE rather than wrong code.  In addition I moved the checks away
> from used_from_object_file.  This function is about non-LTO objects
> linked into the DSO and thus does not really fit the check.
> Lastly we cannot rely on the symver attribute to still be present here.
> 
> Regtested x86_64-linux and committed.
> Honza
>   * cgraph.c (cgraph_node_cannot_be_local_p_1): Prevent targets of
>   symver attributes to be localized.
>   * ipa-visibility.c (cgraph_externally_visible_p,
>   varpool_node::externally_visible_p): Likewise.
>   * symtab.c (symtab_node::verify_base): Check visibility of symbol
>   versions.
> 
>   * lto-common.c (read_cgraph_and_symbols): Work around binutils
>   PR25424

/* snip */

> Index: ipa-visibility.c
> ===
> --- ipa-visibility.c  (revision 279523)
> +++ ipa-visibility.c  (working copy)
> @@ -220,6 +220,14 @@ cgraph_externally_visible_p (struct cgra
>&& lookup_attribute ("dllexport",
>  DECL_ATTRIBUTES (node->decl)))
>  return true;
> +
> +  /* Limitation of gas requires us to output targets of symver aliases as
> + global symbols.  This is binutils PR 25295.  */
> +  ipa_ref *ref;
> +  FOR_EACH_ALIAS (node, ref)
> +if (ref->referring->symver)
> +  return true;
> +
>if (node->resolution == LDPR_PREVAILING_DEF_IRONLY)
>  return false;
>/* When doing LTO or whole program, we can bring COMDAT functoins static.
> @@ -284,14 +292,13 @@ varpool_node::externally_visible_p (void
>  DECL_ATTRIBUTES (decl)))
>  return true;
>  
> -  /* See if we have linker information about symbol not being used or
> - if we need to make guess based on the declaration.
> +  /* Limitation of gas requires us to output targets of symver aliases as
> + global symbols.  This is binutils PR 25295.  */
> +  ipa_ref *ref;
> +  FOR_EACH_ALIAS (this, ref)
> +if (ref->referring->symver)
> +  return true;
>  
> - Even if the linker clams the symbol is unused, never bring internal
> - symbols that are declared by user as used or externally visible.
> - This is needed for i.e. references from asm statements.   */
> -  if (used_from_object_file_p ())
> -return true;

Are these two lines removed intentionally?

>if (resolution == LDPR_PREVAILING_DEF_IRONLY)
>  return false;
>  
> Index: lto/lto-common.c
> ===
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Fix symver attribute with LTO

2019-12-19 Thread Xi Ruoyao

On 2019-12-19 19:12 +0800, Xi Ruoyao wrote:
> On 2019-12-19 11:06 +0100, Jan Hubicka wrote:
> > -  /* See if we have linker information about symbol not being used or
> > - if we need to make guess based on the declaration.
> > +  /* Limitation of gas requires us to output targets of symver aliases as
> > + global symbols.  This is binutils PR 25295.  */
> > +  ipa_ref *ref;
> > +  FOR_EACH_ALIAS (this, ref)
> > +if (ref->referring->symver)
> > +  return true;
> >  
> > - Even if the linker clams the symbol is unused, never bring internal
> > - symbols that are declared by user as used or externally visible.
> > - This is needed for i.e. references from asm statements.   */
> > -  if (used_from_object_file_p ())
> > -return true;
> 
> Are these two lines removed intentionally?

Oh I see, it was a duplicated branch.

Sorry for the noise.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Fix inconsistent description in *sge_

2024-03-04 Thread Xi Ruoyao

On Mon, 2024-03-04 at 11:03 +0800, Guo Jie wrote:
> The constraint of op[1] is inconsistent with the output template.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.md
>   (define_insn "*sge_"): Fix inconsistency
>   error.
>
> ---
>  gcc/config/loongarch/loongarch.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md
> b/gcc/config/loongarch/loongarch.md
> index f3b5c641fce..2d25374bdc9 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -3357,10 +3357,10 @@ (define_insn "*sgt_"
>  
>  (define_insn "*sge_"
>    [(set (match_operand:GPR 0 "register_operand" "=r")
> - (any_ge:GPR (match_operand:X 1 "register_operand" "r")
> + (any_ge:GPR (match_operand:X 1 "arith_operand" "rI")
>    (const_int 1)))]

No, arith_operand is just register_operand or const_imm12_operand, but
comparing a const_imm12_operand with (const_int 1) should be folded into
a constant (even at -O0, AFAIK).  So allowing const_imm12_operand here
brings no benefit.

>    ""
> -  "slti\t%0,%.,%1"
> +  "slt%i1\t%0,%.,%1"
>    [(set_attr "type" "slt")
>     (set_attr "mode" "")])
>  

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: Fix inconsistent description in *sge_

2024-03-05 Thread Xi Ruoyao

On Tue, 2024-03-05 at 16:05 +0800, Guo Jie wrote:
> The constraint of op[1] is inconsistent with the output template.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.md
>   (define_insn "*sge_"): Fix inconsistency
>   error.
> 
> ---
> Update in v2:
>     Remove useless support for op[1] is const_imm12_operand.
> 
> ---
>  gcc/config/loongarch/loongarch.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md 
> b/gcc/config/loongarch/loongarch.md
> index f3b5c641fce..e35a001e0ed 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -3360,7 +3360,7 @@ (define_insn "*sge_"
>   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
>    (const_int 1)))]
>    ""
> -  "slti\t%0,%.,%1"
> +  "slt\t%0,%.,%1"
>    [(set_attr "type" "slt")
>     (set_attr "mode" "")])

Hmm, this define_insn seems to never really be used; otherwise it would generate
something like "sltui $r4,$r0,$r4" and trigger an assembler failure.
The generic path already seems to convert "x >= 1" to "x > 0".

So it seems we should just remove this define_insn?
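
The equivalence that the generic path relies on can be checked at the source
level with a small C sketch.  The function names below are hypothetical and
only illustrate the comparisons; they are not taken from the GCC sources.

```c
/* Signed case: for integers, x >= 1 holds exactly when x > 0.  */
int sge_one (long x) { return x >= 1; }
int sgt_zero (long x) { return x > 0; }

/* Unsigned case: x >= 1 holds exactly when x > 0, i.e. when x != 0.  */
int sgeu_one (unsigned long x) { return x >= 1; }
int sgtu_zero (unsigned long x) { return x > 0; }
```

Since both forms agree for every input, a pattern matching the ">= 1" form
would only be reached if nothing canonicalized the comparison first.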


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH v3] testsuite: Add a test case for negating FP vectors containing zeros

2024-03-05 Thread Xi Ruoyao

Recently I fixed two wrong FP vector negate implementations in targets,
which produced zeros with wrong sign bits (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

- v1 -> v2: Remove { dg-do run } which may cause SIGILL.
- v2 -> v3: Add -fno-associative-math to fix an excessive warning on
  arm.
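
For readers unfamiliar with the sign-bit issue, here is a minimal scalar
sketch.  It assumes, purely for illustration, that a wrong negate is written
as a subtraction from +0.0; this is not necessarily what the fixed
implementations did, and the function names are hypothetical.

```c
/* Returns 1 if the sign bits of a and b differ, 0 otherwise.  */
int sign_differs (double a, double b)
{
  return (__builtin_signbit (a) != 0) != (__builtin_signbit (b) != 0);
}

/* IEEE negation: flips the sign bit, even when x is a zero.  */
double negate (double x) { return -x; }

/* Not a valid negation with signed zeros: 0.0 - 0.0 is +0.0 under the
   default rounding mode, so the sign bit of +0.0 is not flipped.  */
double sub_from_zero (double x) { return 0.0 - x; }
```

With default options (signed zeros honored), negate (0.0) has the opposite
sign bit of its input while sub_from_zero (0.0) does not, which is the kind
of difference the new test aborts on.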

Ok for trunk?

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..21fa00cfa15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,38 @@
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fno-associative-math -fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.44.0

[PATCH v2] LoongArch: Allow s9 as a register alias

2024-03-05 Thread Xi Ruoyao

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

v1 -> v2: Add a test case.

Ok for trunk?

 gcc/config/loongarch/loongarch.h   | 1 +
 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 3 +++
 2 files changed, 4 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
   { "t8",  20 + GP_REG_FIRST },\
   { "x",   21 + GP_REG_FIRST },\
   { "fp",  22 + GP_REG_FIRST },\
+  { "s9",  22 + GP_REG_FIRST },\
   { "s0",  23 + GP_REG_FIRST },\
   { "s1",  24 + GP_REG_FIRST },\
   { "s2",  25 + GP_REG_FIRST },\
diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c 
b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
new file mode 100644
index 000..d2e3b80f83c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */
+register long fp asm("fp"); /* { dg-warning "register of 'fp' used for 
multiple global register variables" } */
-- 
2.44.0

[PATCH] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers

2024-03-05 Thread Xi Ruoyao

Loops on named vector registers are not vectorized (see comment 11 of
PR113622), so these test cases have been failing for a while.
Rewrite them using check-function-bodies to avoid hard-coding register
names.  A barrier is needed to always load the first operand before the
second operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.
---

Tested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.target/loongarch/vfcmp-d.c  | 202 --
 gcc/testsuite/gcc.target/loongarch/vfcmp-f.c  | 347 ++
 gcc/testsuite/gcc.target/loongarch/xvfcmp-d.c | 202 --
 gcc/testsuite/gcc.target/loongarch/xvfcmp-f.c | 204 --
 4 files changed, 816 insertions(+), 139 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c 
b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
index 8b870ef38a0..87e4ed19e96 100644
--- a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
+++ b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c
@@ -1,28 +1,188 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx -ffixed-f0 -ffixed-f1 -ffixed-f2 
-fno-vect-cost-model" } */
+/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #define F double
 #define I long long
 
 #include "vfcmp-f.c"
 
-/* { dg-final { scan-assembler 
"compare_quiet_equal:.*\tvfcmp\\.ceq\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_equal:.*\tvfcmp\\.cune\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_greater:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_less_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_not_less:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_signaling_greater_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_equal\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_less:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_not_less\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_greater_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_not_greater:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_greater\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_less_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_unordered:.*\tvfcmp\\.cun\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_unordered\n"
 } } */
-/* { dg-final { scan-assembler 
"compare_quiet_ordered:.*\tvfcmp\\.cor\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_ordered\n"
 } } */
+/*
+** compare_quiet_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.ceq.d (\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_quiet_not_equal:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.cune.d (\$vr[0-9]+),(\1,\2|\2,\1)
+** vst \3,\$r6,0
+** jr  \$r1
+*/
+
+/*
+** compare_signaling_greater:
+** vld (\$vr[0-9]+),\$r4,0
+** vld (\$vr[0-9]+),\$r5,0
+** vfcmp.slt.d (\$vr[0-9]+),\2,\1
+** vst \3,\$r6,0
+**

Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-03-06 Thread Xi Ruoyao

On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote:
> Hi,
> 
> Whether to add an option to control the generation of R_LARCH_RELAX,
> similar to as -mrelax/-mno-relax.

There are already -mrelax and -mno-relax; they can be checked in the
compiler code with TARGET_LINKER_RELAXATION.

/* snip */

> > +    case 'Q':
> > +  if (!TARGET_LINKER_RELAXATION)
> > +break;

So with -mno-relax we'll break early here, then no R_LARCH_RELAX will be
printed.

> > +  if (code == HIGH)
> > +op = XEXP (op, 0);
> > +
> > +  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
> > +fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
> > +
> > +  break;

The tls-ie-norelax.c test case also checks for -mno-relax:

> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
> > +/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } 
> > } */

i.e. -mno-relax is used when compiling this test case, and the compiled
assembly code should not contain R_LARCH_RELAX.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread Xi Ruoyao

On Thu, 2024-03-07 at 09:12 +0800, Lulu Cheng wrote:

> +  output_asm_insn ("1:", operands);
> +  output_asm_insn ("ll.\t%0,%1", operands);
> +
> +  /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the
> + return value of the val_without_const_folding will not be truncated and
> + will be passed directly to the function compare_exchange_strong.
> + However, the instruction 'bne' does not distinguish between 32-bit and
> + 64-bit operations.  so if the upper 32 bits of the register are not
> + extended by the 32nd bit symbol, then the comparison may not be valid
> + here.  This will affect the result of the operation.  */
> +
> +  if (TARGET_64BIT && REG_P (operands[2])
> +  && GET_MODE (operands[2]) == SImode)
> +    {
> +  output_asm_insn ("addi.w\t%5,%2,0", operands);
> +  output_asm_insn ("bne\t%0,%5,2f", operands);

It should be better to extend the expected value before the ll/sc loop
(like what LLVM does), instead of repeating the extending in each
iteration.  Something like:

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 8f35a5b48d2..c21781947fd 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -234,11 +234,11 @@ (define_insn "atomic_exchange_short"
   "amswap%A3.\t%0,%z2,%1"
   [(set (attr "length") (const_int 4))])
 
-(define_insn "atomic_cas_value_strong"
+(define_insn "atomic_cas_value_strong"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
(match_operand:GPR 1 "memory_operand" "+ZC"))
(set (match_dup 1)
-   (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
+   (unspec_volatile:GPR [(match_operand:X 2 "reg_or_0_operand" "rJ")
  (match_operand:GPR 3 "reg_or_0_operand" "rJ")
  (match_operand:SI 4 "const_int_operand")]  ;; 
mod_s
 UNSPEC_COMPARE_AND_SWAP))
@@ -246,10 +246,10 @@ (define_insn "atomic_cas_value_strong"
   ""
 {
   return "1:\\n\\t"
-"ll.\\t%0,%1\\n\\t"
+"ll.\\t%0,%1\\n\\t"
 "bne\\t%0,%z2,2f\\n\\t"
 "or%i3\\t%5,$zero,%3\\n\\t"
-"sc.\\t%5,%1\\n\\t"
+"sc.\\t%5,%1\\n\\t"
 "beqz\\t%5,1b\\n\\t"
 "b\\t3f\\n\\t"
 "2:\\n\\t"
@@ -301,9 +301,23 @@ (define_expand "atomic_compare_and_swap"
 operands[3], 
operands[4],
 operands[6]));
   else
-emit_insn (gen_atomic_cas_value_strong (operands[1], operands[2],
- operands[3], operands[4],
- operands[6]));
+{
+  rtx (*cas)(rtx, rtx, rtx, rtx, rtx) =
+   TARGET_64BIT ? gen_atomic_cas_value_strongdi
+: gen_atomic_cas_value_strongsi;
+  rtx expect = operands[3];
+
+  if (mode == SImode
+ && TARGET_64BIT
+ && operands[3] != const0_rtx)
+   {
+ expect = gen_reg_rtx (DImode);
+ emit_insn (gen_extendsidi2 (expect, operands[3]));
+   }
+
+  emit_insn (cas (operands[1], operands[2], expect, operands[4],
+ operands[6]));
+}
 
   rtx compare = operands[1];
   if (operands[3] != const0_rtx)

It produces:

slli.w  $r4,$r4,0
1:
ll.w    $r14,$r3,0
bne     $r14,$r4,2f
or      $r15,$zero,$r12
sc.w    $r15,$r3,0
beqz    $r15,1b
b       3f
2:
dbar    0b10100
3:

for the test case and the compiled test case runs successfully.  I've
not done a full bootstrap yet though.
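
To show why the expected value has to be sign-extended at all, here is a
small C model of the comparison.  It is only a sketch: the function names are
hypothetical, and it assumes (as on LoongArch64) that ll.w sign-extends the
loaded word and that bne compares the full 64-bit registers.

```c
#include <stdint.h>

/* Model of ll.w: the loaded 32-bit word is sign-extended to 64 bits.  */
int64_t model_ll_w (int32_t word) { return (int64_t) word; }

/* Model of "slli.w rd,rj,0" or "addi.w rd,rj,0": sign-extend the low
   32 bits of a register.  The narrowing conversion to int32_t wraps
   modulo 2^32 with GCC.  */
int64_t model_sext_w (uint64_t reg)
{
  return (int64_t) (int32_t) (uint32_t) reg;
}

/* bne on the raw register: the upper 32 bits take part in the compare.  */
int compare_raw (int32_t word, uint64_t expected)
{
  return (uint64_t) model_ll_w (word) == expected;
}

/* bne after sign-extending the expected value.  */
int compare_extended (int32_t word, uint64_t expected)
{
  return model_ll_w (word) == model_sext_w (expected);
}
```

With the memory word -1 and an expected register holding 0x00000000ffffffff,
compare_raw reports a mismatch although the low 32 bits are equal, while
compare_extended reports a match.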

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.

2024-03-07 Thread Xi Ruoyao

On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote:
> 
> On 2024/3/7 at 8:52 PM, Xi Ruoyao wrote:
> > It should be better to extend the expected value before the ll/sc loop
> > (like what LLVM does), instead of repeating the extending in each
> > iteration.  Something like:
> 
> I wanted to do this at first, but it didn't work out.
> 
> But then I thought about it, and there are two benefits to putting it in 
> the middle of ll/sc:
> 
> 1. If there is an operation that uses the $r4 register after this atomic
> operation, another register is required to store $r4.
> 
> 2. ll.w requires long cycles, so putting an addi.w command after ll.w 
> won't make a difference.
> 
> So based on the above, I didn't try again, but directly made a 
> modification like a patch.

Ah, the explanation makes sense to me.  Ok with the original patch then.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao

On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote:
> +(define_insn "@got_load_tls_desc"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> + (unspec:P
> +     [(match_operand:P 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> +  "TARGET_TLS_DESC"
> +{
> +  return TARGET_EXPLICIT_RELOCS
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> +  \tjirl\t$r1,$r1,%%desc_call(%1)"

Use something like

? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
  "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
  "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
  "jirl\t$r1,$r1,%%desc_call(%1)"
: "la.tls.desc\t%0,%1";

to prevent additional white spaces in the output asm before tabs.
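
The difference between the two spellings is plain C string-literal
behaviour, which this small sketch demonstrates.  The function names and the
placeholder instruction names are hypothetical.

```c
/* Backslash-newline inside a literal: the indentation of the next
   source line becomes part of the string, before the tab.  */
const char *with_continuation (void)
{
  return "first\n\
    \tsecond";
}

/* Adjacent literals are concatenated; the source indentation between
   them is not part of the string.  */
const char *with_concatenation (void)
{
  return "first\n\t"
         "second";
}
```

Here with_continuation () yields four extra spaces between the newline and
the tab, while with_concatenation () yields exactly "first\n\tsecond".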

> +    : "la.tls.desc\t%0,%1";
> +}
> +  [(set_attr "got" "load")
> +   (set_attr "mode" "")
> +   (set_attr "length" "16")])
> +
> +(define_insn "got_load_tls_desc_off64"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (unspec:DI
> +     [(match_operand:DI 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC_OFF64))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))
> +    (clobber (match_operand:DI 2 "register_operand" "=&r"))]
> +  "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
> +{
> +  return TARGET_EXPLICIT_RELOCS
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> +  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> +  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> +  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> +  \tadd.d\t$r4,$r4,%2\n\
> +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> +    : "la.tls.desc\t%0,%2,%1";

Likewise.

> +}
> +  [(set_attr "got" "load")
> +   (set_attr "length" "28")])

Otherwise OK.

It would be better to allow splitting these two instructions, but we can do it
in another patch.  And IMO it would be better to enable TLS desc by default if
it is supported by both the assembler and the libc, but we'll have to defer that
until the Glibc 2.40 release.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao

On Wed, 2024-03-13 at 06:15 +0800, Xi Ruoyao wrote:
> > +(define_insn "@got_load_tls_desc"
> > +  [(set (match_operand:P 0 "register_operand" "=r")

Hmm, and it looks like we should use (reg:P 4) instead of match_operand
here, because the instruction does not work for a different register:
with TARGET_EXPLICIT_RELOCS we are hard coding r4, and without
TARGET_EXPLICIT_RELOCS the TLS desc function still only puts the return
value in r4.

> > +   (unspec:P
> > +       [(match_operand:P 1 "symbolic_operand" "")]
> > +       UNSPEC_TLS_DESC))
> > +    (clobber (reg:SI FCC0_REGNUM))
> > +    (clobber (reg:SI FCC1_REGNUM))
> > +    (clobber (reg:SI FCC2_REGNUM))
> > +    (clobber (reg:SI FCC3_REGNUM))
> > +    (clobber (reg:SI FCC4_REGNUM))
> > +    (clobber (reg:SI FCC5_REGNUM))
> > +    (clobber (reg:SI FCC6_REGNUM))
> > +    (clobber (reg:SI FCC7_REGNUM))
> > +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> > +  "TARGET_TLS_DESC"
> > +{
> > +  return TARGET_EXPLICIT_RELOCS
> > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> 
> Use something like
> 
>     ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
>   "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
>   "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
>   "jirl\t$r1,$r1,%%desc_call(%1)"
>     : "la.tls.desc\t%0,%1";
> 
> to prevent additional white spaces in the output asm before tabs.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-12 Thread Xi Ruoyao

On Wed, 2024-03-13 at 06:56 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-13 at 06:15 +0800, Xi Ruoyao wrote:
> > > +(define_insn "@got_load_tls_desc"
> > > +  [(set (match_operand:P 0 "register_operand" "=r")
> 
> Hmm, and it looks like we should use (reg:P 4) instead of match_operand
> here, because the instruction does not work for a different register:
> with TARGET_EXPLICIT_RELOCS we are hard coding r4, and without
> TARGET_EXPLICIT_RELOCS the TLS desc function still only puts the return
> value in r4.

Suggested changes:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 303666bf6d5..8f4d3f36c26 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2954,10 +2954,10 @@ loongarch_legitimize_tls_address (rtx loc)
  tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM);
 
  if (TARGET_CMODEL_EXTREME)
-   emit_insn (gen_got_load_tls_desc_off64 (a0, loc,
+   emit_insn (gen_got_load_tls_desc_off64 (loc,
gen_reg_rtx (DImode)));
  else
-   emit_insn (gen_got_load_tls_desc (Pmode, a0, loc));
+   emit_insn (gen_got_load_tls_desc (Pmode, loc));
 
  emit_insn (gen_add3_insn (dest, a0, tp));
}
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 0a1a6a24f61..8e8f1012344 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2772,9 +2772,9 @@ (define_insn "store_word"
 ;; Thread-Local Storage
 
 (define_insn "@got_load_tls_desc"
-  [(set (match_operand:P 0 "register_operand" "=r")
+  [(set (reg:P 4)
(unspec:P
-   [(match_operand:P 1 "symbolic_operand" "")]
+   [(match_operand:P 0 "symbolic_operand" "")]
UNSPEC_TLS_DESC))
 (clobber (reg:SI FCC0_REGNUM))
 (clobber (reg:SI FCC1_REGNUM))
@@ -2788,20 +2788,20 @@ (define_insn "@got_load_tls_desc"
   "TARGET_TLS_DESC"
 {
   return TARGET_EXPLICIT_RELOCS
-? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
-  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
-  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
-  \tjirl\t$r1,$r1,%%desc_call(%1)"
-: "la.tls.desc\t%0,%1";
+? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
+  "addi.d\t$r4,$r4,%%desc_pc_lo12(%0)\n\t"
+  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
+  "jirl\t$r1,$r1,%%desc_call(%0)"
+: "la.tls.desc\t$r4,%0";
 }
   [(set_attr "got" "load")
(set_attr "mode" "")
(set_attr "length" "16")])
 
 (define_insn "got_load_tls_desc_off64"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (reg:DI 4)
(unspec:DI
-   [(match_operand:DI 1 "symbolic_operand" "")]
+   [(match_operand:DI 0 "symbolic_operand" "")]
UNSPEC_TLS_DESC_OFF64))
 (clobber (reg:SI FCC0_REGNUM))
 (clobber (reg:SI FCC1_REGNUM))
@@ -2812,18 +2812,18 @@ (define_insn "got_load_tls_desc_off64"
 (clobber (reg:SI FCC6_REGNUM))
 (clobber (reg:SI FCC7_REGNUM))
 (clobber (reg:SI RETURN_ADDR_REGNUM))
-(clobber (match_operand:DI 2 "register_operand" "=&r"))]
+(clobber (match_operand:DI 1 "register_operand" "=&r"))]
   "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
 {
   return TARGET_EXPLICIT_RELOCS
-? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
-  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
-  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
-  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
-  \tadd.d\t$r4,$r4,%2\n\
-  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
-  \tjirl\t$r1,$r1,%%desc_call(%1)"
-: "la.tls.desc\t%0,%2,%1";
+? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
+  "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t"
+  "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t"
+  "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t"
+  "add.d\t$r4,$r4,%1\n\t"
+  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
+  "jirl\t$r1,$r1,%%desc_call(%0)"
+: "la.tls.desc\t$r4,%1,%0";
 }
   [(set_attr "got" "load")
(set_attr "length" "28")])

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-13 Thread Xi Ruoyao

On Wed, 2024-03-13 at 11:06 +0800, mengqinggang wrote:
> 
> On 2024/3/13 at 6:15 AM, Xi Ruoyao wrote:
> > On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote:
> > > +(define_insn "@got_load_tls_desc"
> > > +  [(set (match_operand:P 0 "register_operand" "=r")
> > > + (unspec:P
> > > +     [(match_operand:P 1 "symbolic_operand" "")]
> > > +     UNSPEC_TLS_DESC))
> > > +    (clobber (reg:SI FCC0_REGNUM))
> > > +    (clobber (reg:SI FCC1_REGNUM))
> > > +    (clobber (reg:SI FCC2_REGNUM))
> > > +    (clobber (reg:SI FCC3_REGNUM))
> > > +    (clobber (reg:SI FCC4_REGNUM))
> > > +    (clobber (reg:SI FCC5_REGNUM))
> > > +    (clobber (reg:SI FCC6_REGNUM))
> > > +    (clobber (reg:SI FCC7_REGNUM))
> > > +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> > > +  "TARGET_TLS_DESC"
> > > +{
> > > +  return TARGET_EXPLICIT_RELOCS
> > > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > > +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> > > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> > Use something like
> > 
> >  ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
> >    "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
> >    "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
> >    "jirl\t$r1,$r1,%%desc_call(%1)"
> >  : "la.tls.desc\t%0,%1";
> > 
> > to prevent additional white spaces in the output asm before tabs.
> > 
> > > +    : "la.tls.desc\t%0,%1";
> > > +}
> > > +  [(set_attr "got" "load")
> > > +   (set_attr "mode" "")
> > > +   (set_attr "length" "16")])
> > > +
> > > +(define_insn "got_load_tls_desc_off64"
> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > > + (unspec:DI
> > > +     [(match_operand:DI 1 "symbolic_operand" "")]
> > > +     UNSPEC_TLS_DESC_OFF64))
> > > +    (clobber (reg:SI FCC0_REGNUM))
> > > +    (clobber (reg:SI FCC1_REGNUM))
> > > +    (clobber (reg:SI FCC2_REGNUM))
> > > +    (clobber (reg:SI FCC3_REGNUM))
> > > +    (clobber (reg:SI FCC4_REGNUM))
> > > +    (clobber (reg:SI FCC5_REGNUM))
> > > +    (clobber (reg:SI FCC6_REGNUM))
> > > +    (clobber (reg:SI FCC7_REGNUM))
> > > +    (clobber (reg:SI RETURN_ADDR_REGNUM))
> > > +    (clobber (match_operand:DI 2 "register_operand" "=&r"))]
> > > +  "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
> > > +{
> > > +  return TARGET_EXPLICIT_RELOCS
> > > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > > +  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> > > +  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> > > +  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> > > +  \tadd.d\t$r4,$r4,%2\n\
> > > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> > > +    : "la.tls.desc\t%0,%2,%1";
> > Likewise.
> > 
> > > +}
> > > +  [(set_attr "got" "load")
> > > +   (set_attr "length" "28")])
> > Otherwise OK.
> > 
> > It's better to allow splitting these two instructions but we can do it
> > in another patch.  And IMO it's better to enable TLS desc by default if
> > supported by both the assembler and the libc, but we'll have to defer it
> > until Glibc 2.40 release.
> 
> 
> Do we need to wait until LLVM also supports TLS DESC  before setting it 
> as default?

Hmm, maybe...  I remember that lld was broken for a while when we added
R_LARCH_ALIGN.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-13 Thread Xi Ruoyao

On Wed, 2024-03-13 at 10:24 +0800, Xi Ruoyao wrote:
>    return TARGET_EXPLICIT_RELOCS
> -    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> -  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> -  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> -  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> -  \tadd.d\t$r4,$r4,%2\n\
> -  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> -  \tjirl\t$r1,$r1,%%desc_call(%1)"
> -    : "la.tls.desc\t%0,%2,%1";
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
> +  "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t"
> +  "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t"
> +  "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t"

Oops, the "%2" in the above line should be "%1".

> +  "add.d\t$r4,$r4,%1\n\t"
> +  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
> +  "jirl\t$r1,$r1,%%desc_call(%0)"
> +    : "la.tls.desc\t$r4,%1,%0";

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] testsuite: Fix vfprintf-chk-1.c with -fhardened

2024-03-13 Thread Xi Ruoyao

On Tue, 2024-03-12 at 17:19 +0100, Jakub Jelinek wrote:
> On Thu, Feb 15, 2024 at 10:53:08PM +, Sam James wrote:
> > With _FORTIFY_SOURCE >= 2 (enabled by -fhardened), vfprintf-chk-1.c's
> > __vfprintf_chk ends up calling __vprintf_chk rather than vprintf.

Do we really want to support adding random CFLAGS when running the test
suite?  AFAIK adding random CFLAGS will just cause test failures here and
there.  We are adjusting the test suite for -fPIE -pie and
-fstack-protector-strong, but that is because they can be implicitly
enabled with --enable-default-* options, and we don't have
--enable-default-hardened as of now.

If we need to bootstrap a hardened GCC and test it, pass -fhardened as
"info gccinstall" suggests:

make BOOT_CFLAGS="-O2 -g -fhardened"

instead of

env C{,XX}FLAGS="-O2 -g -fhardened" /path/to/gcc/configure ...

which will taint the test suite with -fhardened.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Remove masking process for operand 3 of xvpermi.q.

2024-03-13 Thread Xi Ruoyao

On Tue, 2024-03-12 at 09:56 +0800, Chenghui Pan wrote:
> The behavior of non-zero unused bits in xvpermi.q instruction's
> third operand is undefined on LoongArch, according to our
> discussion (https://github.com/llvm/llvm-project/pull/83540),
> we think that keeping original insn operand as unmodified
> state is better solution.
> 
> This patch partially reverts 7b158e036a95b1ab40793dd53bed7dbd770ffdaf.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/lasx.md: Remove masking of operand 3.

Add (lasx_xvpermi_q_) before ":".

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c:
>     Reposition operand 3's value into instruction's defined accept range.
^^

Remove these two white spaces.

Should be OK with these ChangeLog style issues fixed.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Remove unused and incorrect "sge_" define_insn

2024-03-13 Thread Xi Ruoyao

If this insn were really used, we would have something like

slti $r4,$r0,$r5

in the code.  The assembler would reject it because slti takes two
register operands and one immediate operand.  But we have not received
any bug report about this, which indicates that this define_insn is not
used at all.

Note that do_store_flag (in expr.cc) already converts x >= 1 to
x > 0 unconditionally, so this define_insn is indeed unused and we can
just remove it.

gcc/ChangeLog:

* config/loongarch/loongarch.md (any_ge): Remove.
(sge_): Remove.
---

Not fully tested but should be obvious.  Ok for trunk?
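
For reference, the equivalence that the do_store_flag conversion relies on
can be written out as below.  This is only a sketch with hypothetical
helper names, not code from GCC: for integers, x >= 1 and x > 0 always
agree, in both the signed and the unsigned case.

```c
#include <stdbool.h>

/* Signed case: x >= 1 holds exactly when x > 0.  */
static bool sge_one (long x) { return x >= 1; }
static bool sgt_zero (long x) { return x > 0; }

/* Unsigned case: x >= 1 holds exactly when x > 0, i.e. when x != 0.  */
static bool sgeu_one (unsigned long x) { return x >= 1; }
static bool sgtu_zero (unsigned long x) { return x > 0; }
```

So a "greater than zero" pattern is enough and no separate
"greater or equal to one" pattern is needed.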

 gcc/config/loongarch/loongarch.md | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 525e1e82183..18fd9c1e7d5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -517,7 +517,6 @@ (define_code_iterator equality_op [eq ne])
 ;; These code iterators allow the signed and unsigned scc operations to use
 ;; the same template.
 (define_code_iterator any_gt [gt gtu])
-(define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
 
@@ -3355,15 +3354,6 @@ (define_insn "*sgt_"
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
-(define_insn "*sge_"
-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
-(const_int 1)))]
-  ""
-  "slti\t%0,%.,%1"
-  [(set_attr "type" "slt")
-   (set_attr "mode" "")])
-
 (define_insn "*slt_"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(any_lt:GPR (match_operand:X 1 "register_operand" "r")
-- 
2.44.0

[PATCH] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-18 Thread Xi Ruoyao

We were assuming that TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any
named arguments and that there is nothing to advance, but that is not the
case for (...) functions returning by hidden reference, which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Only skip
loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
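
The patched condition can be modelled in isolation as below.  This is a
simplified sketch, not the GCC code: should_advance is a hypothetical
name, the bool stands in for TYPE_NO_NAMED_ARGS_STDARG_P and the pointer
stands in for arg.type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the local copy of CUM should be advanced past one
   argument.  NO_NAMED_ARGS_STDARG models a C23 (...) function;
   ARG_TYPE is non-NULL when there is an artificial argument for the
   hidden return slot.  */
static bool
should_advance (bool no_named_args_stdarg, const void *arg_type)
{
  return !no_named_args_stdarg || arg_type != NULL;
}
```

Only the combination "(...) function and arg.type is NULL" skips the
advance; before the patch every (...) function skipped it.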

 gcc/config/loongarch/loongarch.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..57de8ef7d20 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0

Pushed: [PATCH v2] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-19 Thread Xi Ruoyao

On Tue, 2024-03-19 at 11:19 +0800, chenglulu wrote:
> 
> On 2024/3/18 at 5:34 PM, Xi Ruoyao wrote:
> > We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
> > arguments and there is nothing to advance, but that is not the case
> > for (...) functions returning by hidden reference which have one
> > such
> > artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
> > gcc.dg/c23-stdarg-8.c to fail.
> > 
> > Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/114175
> > * config/loongarch/loongarch.cc
> > (loongarch_setup_incoming_varargs): Only skip
> > loongarch_function_arg_advance for
> > TYPE_NO_NAMED_ARGS_STDARG_P
> > functions if arg.type is NULL.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> > 
> >   gcc/config/loongarch/loongarch.cc | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index 70e31bb831c..57de8ef7d20 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs
> > (cumulative_args_t cum,
> >    argument.  Advance a local copy of CUM past the last "real"
> > named
> >    argument, to find out how many registers are left over.  */
> >     local_cum = *get_cumulative_args (cum);
> I think it's important to add a comment here:
>  /* When there is no hidden return argument passed, arg.type
>   is always NULL.  */
> 
> Otherwise LGTM.
> 
> Thanks!

Pushed v2 with a comment added as attached.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From c1fd4589c2bf9fd8409d51b94df219cb75107762 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Mon, 18 Mar 2024 17:18:34 +0800
Subject: [PATCH v2] LoongArch: Fix C23 (...) functions returning large
 aggregates [PR114175]

We were assuming that TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any
named arguments and that there is nothing to advance, but that is not the
case for (...) functions returning by hidden reference, which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-6.c and
gcc.dg/c23-stdarg-8.c to fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

	PR target/114175
	* config/loongarch/loongarch.cc
	(loongarch_setup_incoming_varargs): Only skip
	loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
	functions if arg.type is NULL.
---
 gcc/config/loongarch/loongarch.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..5344f2a6987 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -767,7 +767,13 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artificial argument for large return value, skip advancing args.
+ There is such an artificial argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0

[PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-20 Thread Xi Ruoyao

We were assuming that TYPE_NO_NAMED_ARGS_STDARG_P functions don't have any
named arguments and that there is nothing to advance, but that is not the
case for (...) functions returning by hidden reference, which have one such
artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
fail.

Fix the issue by checking if arg.type is NULL, as r14-9503 explains.

gcc/ChangeLog:

PR target/114175
* config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
functions if arg.type is NULL.
---

Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk?

 gcc/config/mips/mips.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 68e2ae8d8fa..ce764a5cb35 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
  argument.  Advance a local copy of CUM past the last "real" named
  argument, to find out how many registers are left over.  */
   local_cum = *get_cumulative_args (cum);
-  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
+
+  /* For a C23 variadic function w/o any named argument, and w/o an
+ artificial argument for large return value, skip advancing args.
+ There is such an artificial argument iff. arg.type is non-NULL
+ (PR 114175).  */
+  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
+  || arg.type != NULL_TREE)
 mips_function_arg_advance (pack_cumulative_args (&local_cum), arg);
 
   /* Found out how many registers we need to save.  */
-- 
2.44.0

Pushed: [PATCH] LoongArch: Fix a typo [PR 114407]

2024-03-20 Thread Xi Ruoyao

gcc/ChangeLog:

PR target/114407
* config/loongarch/loongarch-opts.cc (loongarch_config_target):
Fix typo in diagnostic message, enabing -> enabling.
---

Pushed r14-9582 as obvious.

 gcc/config/loongarch/loongarch-opts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index 7eeac43ed2f..627f9148adf 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -362,7 +362,7 @@ config_target_isa:
  gcc_assert (constrained.simd);
 
  inform (UNKNOWN_LOCATION,
- "enabing %qs promotes %<%s%s%> to %<%s%s%>",
+ "enabling %qs promotes %<%s%s%> to %<%s%s%>",
  loongarch_isa_ext_strings[t.isa.simd],
  OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[t.isa.fpu],
  OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[ISA_EXT_FPU64]);
-- 
2.44.0

Re: [PATCH] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-21 Thread Xi Ruoyao

On Thu, 2024-03-21 at 10:14 +0800, Jie Mei wrote:
> diff --git a/gcc/testsuite/gcc.target/mips/mips-minmax.c 
> b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> new file mode 100644
> index 000..2d234ac4b1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mhard-float -ffinite-math-only -march=mips32r6" } */

You may want to add fmin3 and fmax3 in addition to
smin3 and smax3 so it will work without -ffinite-math-only.

‘fminM3’, ‘fmaxM3’
 IEEE-conformant minimum and maximum operations.  If one operand is
 a quiet ‘NaN’, then the other operand is returned.  If both
 operands are quiet ‘NaN’, then a quiet ‘NaN’ is returned.  In the
 case when gcc supports signaling ‘NaN’ (-fsignaling-nans) an
 invalid floating point exception is raised and a quiet ‘NaN’ is
 returned.

And the MIPS 6.06 manual says:

Numbers are preferred to NaNs: if one input is a NaN, but not both, the
value of the numeric input is returned. If both are NaNs, the NaN in fs
is returned.

for MAX.fmt and MIN.fmt, so they match fmin3 and fmax3.

> +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Xi Ruoyao

On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:
> > +/* Costs to use when optimizing for xiangshan nanhu.  */
> > +static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_add */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_mul */
> > +  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},/* fp_div */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* int_mul */
> > +  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
> > +  6,   /* issue_rate */
> > +  3,   /* branch_cost */
> > +  3,   /* memory_cost */
> > +  3,   /* fmv_cost */
> > +  true,/* slow_unaligned_access */
> > +  false,   /* use_divmod_expansion */
> > +  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
> > +  NULL,/* vector cost */

> Is your integer division really that fast?  The table above essentially 
> says that your cpu can do integer division in 6 cycles.

Hmm, I've just noticed that I coded some even smaller values for LoongArch
CPUs, so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number
of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
for some inputs and 39 cycles for other inputs.

So should we use the minimum value, the maximum value, or something
in-between for TARGET_RTX_COSTS and pipeline descriptions?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Increase division costs

2024-03-26 Thread Xi Ruoyao

The latency of LA464 and LA664 division instructions depends on the
input.  When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead.

Use the average of the minimum and maximum latency observed instead.
This enables the reduction of division by a constant to a
multiplication by its reciprocal and speeds up the following test case
by about 30%:

int
main (void)
{
  unsigned long stat = 0xdeadbeef;
  for (int i = 0; i < 1; i++)
    stat = (stat * stat + stat * 114514 + 1919810) % 17;
  asm(""::"r"(stat));
}

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

gcc/ChangeLog:

* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Increase
default division cost to the average of the best case and worst
case scenarios observed.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/div-const-reduction.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
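
For readers unfamiliar with the reduction mentioned above, a hand-written
sketch of the idea for a 32-bit unsigned x % 17 follows.  It is not the
exact sequence GCC emits; RECIP_17 and mod17_by_reciprocal are
hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* ceil (2^36 / 17).  17 * 0xF0F0F0F1 == 2^36 + 1, which is accurate
   enough to give the exact quotient for every 32-bit x.  */
#define RECIP_17 UINT64_C (0xF0F0F0F1)

/* Compute x % 17 with a multiplication and a shift instead of a
   division instruction.  */
static uint32_t
mod17_by_reciprocal (uint32_t x)
{
  uint32_t q = (uint32_t) ((x * RECIP_17) >> 36);  /* q == x / 17 */
  return x - q * 17u;
}
```

Whether the expander picks such a sequence depends on comparing its cost
with the division cost, which is why the cost values matter here.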

 gcc/config/loongarch/loongarch-def.cc| 8 
 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c | 9 +
 2 files changed, 13 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c

diff --git a/gcc/config/loongarch/loongarch-def.cc 
b/gcc/config/loongarch/loongarch-def.cc
index e8c129ce643..93e72a520d5 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -95,12 +95,12 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
   : fp_add (COSTS_N_INSNS (5)),
 fp_mult_sf (COSTS_N_INSNS (5)),
 fp_mult_df (COSTS_N_INSNS (5)),
-fp_div_sf (COSTS_N_INSNS (8)),
-fp_div_df (COSTS_N_INSNS (8)),
+fp_div_sf (COSTS_N_INSNS (12)),
+fp_div_df (COSTS_N_INSNS (15)),
 int_mult_si (COSTS_N_INSNS (4)),
 int_mult_di (COSTS_N_INSNS (4)),
-int_div_si (COSTS_N_INSNS (5)),
-int_div_di (COSTS_N_INSNS (5)),
+int_div_si (COSTS_N_INSNS (14)),
+int_div_di (COSTS_N_INSNS (22)),
 movcf2gr (COSTS_N_INSNS (7)),
 movgr2cf (COSTS_N_INSNS (15)),
 branch_cost (6),
diff --git a/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c 
b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
new file mode 100644
index 000..0ee86410dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=la464" } */
+/* { dg-final { scan-assembler-not "div\.\[dw\]" } } */
+
+int
+test (int a)
+{
+  return a % 17;
+}
-- 
2.44.0

Re: [PATCH v2] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-26 Thread Xi Ruoyao

On Tue, 2024-03-26 at 11:15 +0800, YunQiang Su wrote:

/* snip */

> With -ffinite-math-only -fno-signed-zeros, it does work with
>     x >= y ? x : y
> while without `-ffinite-math-only -fno-signed-zeros`, it cannot.
> @Xi Ruoyao Is it expected by IEEE?

When y is a (quiet) NaN and x is not, fmax(x, y) should produce x but
x >= y ? x : y should produce y.  Thus -ffinite-math-only is needed.

When x is +0.0 and y is -0.0, x >= y ? x : y should produce +0.0 but
fmax(x, y) may produce +0.0 or -0.0 (IEEE allows both and I don't see a
stricter requirement in the MIPS 6.06 manual either).  Thus
-fno-signed-zeros is needed.
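
A small sketch of the NaN case, assuming a hand-written model of the
"numbers are preferred to NaNs" rule rather than the real fmax or the
MAX.fmt instruction (select_max and number_preferring_max are
hypothetical names):

```c
#include <math.h>

/* The conditional form.  Any comparison with a NaN is false, so a NaN
   in Y is returned as-is.  */
static double
select_max (double x, double y)
{
  return x >= y ? x : y;
}

/* A model of the rule quoted from the manual: if exactly one input is
   a NaN, return the other one.  */
static double
number_preferring_max (double x, double y)
{
  if (isnan (x))
    return y;
  if (isnan (y))
    return x;
  return x >= y ? x : y;
}
```

With x = 1.0 and y = NaN the first function returns NaN and the second
returns 1.0, so the two forms only agree when NaNs are excluded.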

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao

On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> 
> On 2024/3/26 at 5:48 PM, Xi Ruoyao wrote:
> > The latency of LA464 and LA664 division instructions depends on the
> > input.  When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> > 
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> > 
> >  int
> >  main (void)
> >  {
> >    unsigned long stat = 0xdeadbeef;
> >    for (int i = 0; i < 1; i++)
> >  stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >    asm(""::"r"(stat));
> >  }
> > 
> > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> 
> The test case div-const-reduction.c is modified to assemble the instruction
> sequence as follows:
>   lu12i.w $r12,97440>>12  # 0x3b9ac000
>   ori $r12,$r12,2567
>   mod.w   $r13,$r13,$r12
> 
> This sequence of instructions takes 5 clock cycles.

Hmm indeed, it seems a waste to do this reduction for int / 17.
I'll try to make a better heuristic as Richard suggests...


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao

On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao  wrote:
> > 
> > The latency of LA464 and LA664 division instructions depends on the
> > input.  When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> > 
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> > 
> >     int
> >     main (void)
> >     {
> >   unsigned long stat = 0xdeadbeef;
> >   for (int i = 0; i < 1; i++)
> >     stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >   asm(""::"r"(stat));
> >     }
> 
> I think you should be able to see a constant divisor and thus could do
> better than return the same latency for everything.  For non-constant
> divisors using the best-case latency shouldn't be a problem.

Hmm, it does not seem really possible as of now.  expand_divmod does
something like:

  max_cost = (unsignedp
  ? udiv_cost (speed, compute_mode)
  : sdiv_cost (speed, compute_mode));

which reads the pre-calculated costs from a table.  Thus we don't
really know the denominator and cannot estimate the cost based on it :(.

CSE does invoke the cost hook with the actual (mod (a, (const_int
17)) RTX, but that is less important.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Xi Ruoyao

On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> > 
> > On 2024/3/26 at 5:48 PM, Xi Ruoyao wrote:
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input.  When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > > 
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > > 
> > >  int
> > >  main (void)
> > >  {
> > >    unsigned long stat = 0xdeadbeef;
> > >    for (int i = 0; i < 1; i++)
> > >  stat = (stat * stat + stat * 114514 + 1919810) % 17;
> > >    asm(""::"r"(stat));
> > >  }
> > > 
> > > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> > 
> > The test case div-const-reduction.c is modified to assemble the instruction
> > sequence as follows:
> > lu12i.w $r12,97440>>12  # 0x3b9ac000
> > ori $r12,$r12,2567
> > mod.w   $r13,$r13,$r12
> > 
> > This sequence of instructions takes 5 clock cycles.

It may actually take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still give better throughput.

> Hmm indeed, it seems a waste to do this reduction for int / 17.
> I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Ping: [PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]

2024-03-28 Thread Xi Ruoyao

Ping.

On Wed, 2024-03-20 at 15:10 +0800, Xi Ruoyao wrote:
> We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
> arguments and there is nothing to advance, but that is not the case
> for (...) functions returning by hidden reference which have one such
> artificial argument.  This is causing gcc.dg/c23-stdarg-{6,8,9}.c to
> fail.
> 
> Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
> 
> gcc/ChangeLog:
> 
>   PR target/114175
>   * config/mips/mips.cc (mips_setup_incoming_varargs): Only skip
>   mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
>   functions if arg.type is NULL.
> ---
> 
> Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk?
> 
>  gcc/config/mips/mips.cc | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index 68e2ae8d8fa..ce764a5cb35 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum,
>   argument.  Advance a local copy of CUM past the last "real" named
>   argument, to find out how many registers are left over.  */
>    local_cum = *get_cumulative_args (cum);
> -  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)))
> +
> +  /* For a C23 variadic function w/o any named argument, and w/o an
> + artificial argument for large return value, skip advancing args.
> + There is such an artificial argument iff. arg.type is non-NULL
> + (PR 114175).  */
> +  if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))
> +  || arg.type != NULL_TREE)
>  mips_function_arg_advance (pack_cumulative_args (&local_cum), arg);
>  
>    /* Found out how many registers we need to save.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-31 Thread Xi Ruoyao

On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote:

> I tested spec2006. In the floating-point program, the test items with large
> fluctuations are removed, and the rest is basically unchanged.
> 
> The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and (10,22).

So IIUC (10,10) is better than (5,5), (10,22), and the originally
proposed (14,22)?  Then should I make a change to make all 4 costs (SF,
DF, SI, DI) 10?

I'd still want DI % 17 to be reduced to a reciprocal sequence (but
not SI % 17), since DI % (small constant) is quite important for
some workloads like competitive programming.  However, "adapting to
different moduli" is not possible w/o refactoring generic code, so it
must be deferred to at least GCC 15.
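
For readers unfamiliar with the term, a "reciprocal sequence" replaces
the division with a multiply-high and a shift.  Below is a minimal C
sketch for the unsigned 64-bit case.  The function names are
hypothetical, the magic constant is ceil(2^68 / 17), and
unsigned __int128 is a GCC/Clang extension assumed to be available; the
sequence the compiler actually emits for LoongArch may differ.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(2^68 / 17).  17 * MAGIC_17 == 2^68 + 1, so the rounding error is
   small enough for every 64-bit dividend.  */
#define MAGIC_17 UINT64_C (0xF0F0F0F0F0F0F0F1)

/* High 64 bits of the 128-bit product a * b.  */
static uint64_t
mulhi_u64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}

/* x / 17 without a division instruction.  */
uint64_t
div17_by_reciprocal (uint64_t x)
{
  return mulhi_u64 (x, MAGIC_17) >> 4;
}

/* x % 17 computed as x - 17 * (x / 17).  */
uint64_t
mod17_by_reciprocal (uint64_t x)
{
  return x - 17 * div17_by_reciprocal (x);
}

/* Compare against the plain operators on a few sample values.  */
void
mod17_self_check (void)
{
  static const uint64_t samples[]
    = { 0, 1, 16, 17, 18, 1000003, UINT64_MAX };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      assert (div17_by_reciprocal (samples[i]) == samples[i] / 17);
      assert (mod17_by_reciprocal (samples[i]) == samples[i] % 17);
    }
}
```

Whether this beats a hardware divide depends on the relative cost of the
multiply-high and the divide, which is what the cost values under
discussion encode.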

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

2024-03-31 Thread Xi Ruoyao

On Mon, 2024-04-01 at 10:22 +0800, chenglulu wrote:
> 
> On 2024/4/1 at 9:29 AM, Xi Ruoyao wrote:
> > On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote:
> > 
> > > I tested spec2006. In the floating-point program, the test items with
> > > large fluctuations are removed, and the rest is basically unchanged.
> > > 
> > > The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and
> > > (10,22).
> > So IIUC (10,10) is better than (5,5), (10,22), and the originally
> > proposed (14,22)?  Then should I make a change to make all 4 costs (SF,
> > DF, SI, DI) 10?
> 
> I think this may require the analysis of the spec's test case. I took a
> look at the test results again, where the scores of SPEC INT
> 462.libquantum fluctuated greatly, but the combination of (10,22) showed
> an overall upward trend compared to the scores of the other two
> combinations.
> 
> I don't know if this (10,22) combination happens to have the kind of
> test cases in the changelog.
> 
> So can we change it together in GCC15?

Ok.  Abandoning this patch then.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v5] LoongArch: Add support for TLS descriptors

2024-04-01 Thread Xi Ruoyao

Is this patch targeting GCC 14 or 15?  If 14 I guess we'd commit now...

Generally we don't add features in stage 4, but if we keep trad as the
default I think it'd be OK.  And RISC-V guys plan to push their TLS desc
implementation this week too.

On Tue, 2024-03-19 at 09:54 +0800, mengqinggang wrote:
> Add support for TLS descriptors on normal code model and extreme code model.
> 
> Normal code model instruction sequence:
>   -mno-explicit-relocs:
>     la.tls.desc   $r4, s
>     add.d $r12, $r4, $r2
>   -mexplicit-relocs:
>     pcalau12i $r4,%desc_pc_hi20(s)
>     addi.d$r4,$r4,%desc_pc_lo12(s)
>     ld.d  $r1,$r4,%desc_ld(s)
>     jirl  $r1,$r1,%desc_call(s)
>     add.d $r12, $r4, $r2
> 
> Extreme code model instruction sequence:
>   -mno-explicit-relocs:
>     la.tls.desc   $r4, $r12, s
>     add.d $r12, $r4, $r2
>   -mexplicit-relocs:
>     pcalau12i $r4,%desc_pc_hi20(s)
>     addi.d$r12,$r0,%desc_pc_lo12(s)
>     lu32i.d   $r12,%desc64_pc_lo20(s)
>     lu52i.d   $r12,$r12,%desc64_pc_hi12(s)
>     add.d $r4,$r4,$r12
>     ld.d  $r1,$r4,%desc_ld(s)
>     jirl  $r1,$r1,%desc_call(s)
>     add.d $r12, $r4, $r2
> 
> The default is still traditional TLS model, but can be configured with
> --with-tls={trad,desc}. The default can change to TLS descriptors once
> libc and LLVM support this.
> 
> gcc/ChangeLog:
> 
>   * config.gcc: Add --with-tls option to change TLS flavor.
>   * config/loongarch/genopts/loongarch.opt.in: Add -mtls-dialect to
>   configure TLS flavor.
>   * config/loongarch/loongarch-def.h (struct loongarch_target): Add
>   tls_dialect.
>   * config/loongarch/loongarch-driver.cc (la_driver_init): Add tls
>   flavor.
>   * config/loongarch/loongarch-opts.cc (loongarch_init_target): Add
>   tls_dialect.
>   (loongarch_config_target): Ditto.
>   (loongarch_update_gcc_opt_status): Ditto.
>   * config/loongarch/loongarch-opts.h (loongarch_init_target):Ditto.
>   (TARGET_TLS_DESC): New define.
>   * config/loongarch/loongarch.cc (loongarch_symbol_insns): Add TLS DESC
>   instructions sequence length.
>   (loongarch_legitimize_tls_address): New TLS DESC instruction sequence.
>   (loongarch_option_override_internal): Add la_opt_tls_dialect.
>   (loongarch_option_restore): Add la_target.tls_dialect.
>   * config/loongarch/loongarch.md (@got_load_tls_desc): Normal
>   code model for TLS DESC.
>   (got_load_tls_desc_off64): Extreme code model for TLS DESC.
>   * config/loongarch/loongarch.opt: Regenerated.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/cmodel-extreme-1.c: Add -mtls-dialect=trad.
>   * gcc.target/loongarch/cmodel-extreme-2.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c:
>   Ditto.
>   * gcc.target/loongarch/func-call-medium-1.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-2.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-3.c: Ditto.
>   * gcc.target/loongarch/func-call-medium-4.c: Ditto.
>   * gcc.target/loongarch/tls-extreme-macro.c: Ditto.
>   * gcc.target/loongarch/tls-gd-noplt.c: Ditto.
>   * gcc.target/loongarch/explicit-relocs-auto-extreme-tls-desc.c: New 
> test.
>   * gcc.target/loongarch/explicit-relocs-auto-tls-desc.c: New test.
>   * gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c: New test.
>   * gcc.target/loongarch/explicit-relocs-tls-desc.c: New test.
> 
> Co-authored-by: Lulu Cheng 
> Co-authored-by: Xi Ruoyao 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Set default alignment for functions jumps and loops [PR112919].

2024-04-06 Thread Xi Ruoyao

On Tue, 2024-04-02 at 15:03 +0800, Lulu Cheng wrote:
> +/* Alignment for functions loops and jumps for best performance.  For new
> +   uarchs the value should be measured via benchmarking.  See the 
> documentation
> +   for -falign-functions -falign-loops and -falign-jumps in invoke.texi for 
> the
   ^ ^

Better have two commas here.

Otherwise it should be OK.

> +   format.  */

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao

On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> This patch fixes the back-end context switching in cases where functions
> should be built with their own target contexts instead of the
> global one, such as LTO linking and functions with target attributes (TBD).
> 
>   PR target/113233

Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
save/restore"?  Should I reopen it?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao

On Sun, 2024-04-07 at 16:23 +0800, Yang Yujie wrote:
> On Sun, Apr 07, 2024 at 04:23:53PM +0800, Xi Ruoyao wrote:
> > On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
> > > This patch fixes the back-end context switching in cases where functions
> > > should be built with their own target contexts instead of the
> > > global one, such as LTO linking and functions with target attributes 
> > > (TBD).
> > > 
> > >   PR target/113233
> > 
> > Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option
> > save/restore"?  Should I reopen it?
> > 
> > -- 
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University
> 
> Yes, the issue was not fixed with that patch. This one should do.

So reopened the PR.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding

2024-04-07 Thread Xi Ruoyao

On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> The patch has been approved by Honza in Bugzilla. (I hope.  He did write
> it looked reasonable.)  Together with the patch for PR 113907, it has
> passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on
> x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux.  It also
> passed normal bootstrap on aarch64-linux but there many testcases failed
> because the compiler timed out.  The machine is old and slow and might
> have been oversubscribed so my plan is to try again on gcc185 from
> cfarm.  If that goes well, I intend to commit the patch and then start
> working on backports.

I've tried these two patches out on my own 24-core AArch64 machine. 
Bootstrapped (but no LTO or PGO) and regtested fine.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding

2024-04-07 Thread Xi Ruoyao

On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> +/* Given two types in an assignment, return true either if any one cannot be
> +   totally scalarized or if they have padding (i.e. not copied bits)  */
> +
> +bool
> +sra_total_scalarization_would_copy_same_data_p (tree t1, tree t2)
> +{
> +  sra_padding_collecting p1;
> +  if (!check_ts_and_push_padding_to_vec (t1, &p1))
> +    return true;
> +
> +  sra_padding_collecting p2;
> +  if (!check_ts_and_push_padding_to_vec (t2, &p2))
> +    return true;
> +
> +  unsigned l = p1.m_padding.length ();
> +  if (l != p2.m_padding.length ())
> +    return false;
> +  for (unsigned i = 0; i < l; i++)
> +    if (p1.m_padding[i].first != p2.m_padding[i].first
> + || p1.m_padding[i].second != p2.m_padding[i].second)
> +  return false;
> +
> +  return true;
> +}
> +

Better remove this trailing empty line from tree-sra.cc.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Enable switchable target

2024-04-07 Thread Xi Ruoyao

On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote:
>   * config/loongarch/loongarch-builtins.cc
> (loongarch_init_builtins):
>     Initialize all builtin functions at startup.

git gcc-verify complains that tab should be used instead of space for
this line.

>   (loongarch_expand_builtin): Turn assertion of builtin
> availability
>     into a test.

and this line.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: Enable switchable target

2024-04-08 Thread Xi Ruoyao

On Mon, 2024-04-08 at 16:46 +0800, Yang Yujie wrote:
> v1 -> v2:
> Remove spaces from changelog.

I've rebuilt the base system with a GCC including this patch.  LTO+PGO
bootstrapped fine, regtested fine, and no issues observed.

I do usually include the optimization flags into LDFLAGS when I do LTO,
so I don't really rely on this patch though.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Change gcc/ira-conflicts.cc build_conflict_bit_table to use size_t/%zu

2024-02-01 Thread Xi Ruoyao

On Thu, 2024-02-01 at 14:01 +0100, Jakub Jelinek wrote:
> On Thu, Feb 01, 2024 at 12:45:31PM +, Jonathan Yong wrote:
> > Attached patch OK? Copied inline for review convenience.
> 
> No, I think e.g. AIX doesn't support the z modifier.
> I don't see %zd or %zu used anywhere except in gcc/jit/ which presumably
> doesn't work on AIX.
> 
> If you really want to avoid truncation, perhaps do something like
>   if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
>     {
>   if (sizeof (void *) <= sizeof (long))
>   fprintf (ira_dump_file,
>"+++Allocating %lu bytes for conflict table "
>"(uncompressed size %lu)\n",
>(unsigned long) (sizeof (IRA_INT_TYPE) * allocated_words_num),
>(unsigned long) (sizeof (IRA_INT_TYPE) * object_set_words
>     * ira_objects_num));
>   else
>   fprintf (ira_dump_file,
>"+++Allocating %l" PRIu64 "bytes for conflict table "
>"(uncompressed size %" PRIu64 ")\n",

Should use HOST_WIDE_INT_PRINT_UNSIGNED instead of PRIu64.

>(unsigned HOST_WIDE_INT) (sizeof (IRA_INT_TYPE)
>      * allocated_words_num),
>(unsigned HOST_WIDE_INT) (sizeof (IRA_INT_TYPE)
>      * object_set_words
>      * ira_objects_num));
>     }

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Change gcc/ira-conflicts.cc build_conflict_bit_table to use size_t/%zu

2024-02-01 Thread Xi Ruoyao

On Thu, 2024-02-01 at 14:55 +0100, Jakub Jelinek wrote:
> On Thu, Feb 01, 2024 at 01:42:03PM +, Jonathan Yong wrote:
> > On 2/1/24 13:06, Xi Ruoyao wrote:
> > > On Thu, 2024-02-01 at 14:01 +0100, Jakub Jelinek wrote:
> > > > On Thu, Feb 01, 2024 at 12:45:31PM +, Jonathan Yong wrote:
> > > > > Attached patch OK? Copied inline for review convenience.
> > > > 
> > > > No, I think e.g. AIX doesn't support the z modifier.
> > > > I don't see %zd or %zu used anywhere except in gcc/jit/ which presumably
> > > > doesn't work on AIX.
> > > > 
> > > 
> > > Should use HOST_WIDE_INT_PRINT_UNSIGNED instead of PRIu64.
> > > 
> > Updated the patch with the suggestions.

I mean if you are casting it to unsigned HOST_WIDE_INT, you should use
HOST_WIDE_INT_PRINT_UNSIGNED.  If you are casting it to size_t you
cannot use it (as Jakub has explained).

When you use printf-like things you have to keep the correspondence
between the format specifier and the argument itself.
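
To illustrate the rule with plain standard C (a sketch only: it uses
uint64_t and PRIu64 from <inttypes.h> as a stand-in for GCC's own
unsigned HOST_WIDE_INT and HOST_WIDE_INT_PRINT_UNSIGNED, and the
function name is hypothetical), the cast and the format macro are
chosen as a pair so that they always describe the same type:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static_assert (sizeof (size_t) <= sizeof (uint64_t),
               "size_t must fit in uint64_t for the cast below");

/* Format a byte count into BUF.  The argument is cast to uint64_t and
   printed with PRIu64, so the specifier matches the argument type no
   matter what size_t is on the host.  */
int
format_alloc_size (char *buf, size_t buflen, size_t nbytes)
{
  return snprintf (buf, buflen, "%" PRIu64 " bytes", (uint64_t) nbytes);
}
```

Pairing PRIu64 with a size_t argument, or %lu with a uint64_t argument,
would only work by accident on hosts where the two types happen to
coincide.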

> No, that is wrong.  That will break bootstrap on lots of hosts, any time
> size_t is not unsigned long (if unsigned long is 64-bit) or unsigned long
> long (if unsigned long is not 64-bit).
> That includes e.g. all targets where size_t is unsigned int, and some others
> too.
> 
>   Jakub
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Fix an ODR violation

2024-02-01 Thread Xi Ruoyao

When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR
violation is detected:

../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
../../gcc/config/loongarch/loongarch-def.cc:186: note:
'abi_minimal_isa' was previously declared here
186 |   abi_minimal_isa = array,
../../gcc/config/loongarch/loongarch-def.cc:186: note:
code may be misoptimized unless '-fno-strict-aliasing' is used

Fix it by adding a proper declaration of abi_minimal_isa into
loongarch-def.h and remove the ODR-violating local declaration in
loongarch-opts.cc.

gcc/ChangeLog:

* config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
* config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
the ODR-violating local declaration.
---

Bootstrapped on loongarch64-linux-gnu.  Not fully regtested but it
should be an obvious fix.  Ok for trunk?

 gcc/config/loongarch/loongarch-def.h   | 3 +++
 gcc/config/loongarch/loongarch-opts.cc | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index a1237ecf1fd..2dbf006d013 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -203,5 +203,8 @@ extern loongarch_def_array
   loongarch_cpu_align;
 extern loongarch_def_array
   loongarch_cpu_rtx_cost_data;
+extern loongarch_def_array<
+  loongarch_def_array,
+  N_ABI_BASE_TYPES> abi_minimal_isa;
 
 #endif /* LOONGARCH_DEF_H */
diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index b87299513c9..7eeac43ed2f 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -53,8 +53,6 @@ static const int tm_multilib_list[] = { TM_MULTILIB_LIST };
 static int enabled_abi_types[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES] = { 0 };
 
 #define isa_required(ABI) (abi_minimal_isa[(ABI).base][(ABI).ext])
-extern "C" const struct loongarch_isa
-abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
 
 static inline int
 is_multilib_enabled (struct loongarch_abi abi)
-- 
2.43.0

[PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-02 Thread Xi Ruoyao

We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
But in loongarch_symbol_insns:

if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
  return 0;

And LSX_SUPPORTED_MODE_P is defined as:

#define LSX_SUPPORTED_MODE_P(MODE) \
  (ISA_HAS_LSX \
   && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...

GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:

ALWAYS_INLINE poly_uint16
mode_to_bytes (machine_mode mode)
{
#if GCC_VERSION >= 4001
  return (__builtin_constant_p (mode)
  ? mode_size_inline (mode) : mode_size[mode]);
#else
  return mode_size[mode];
#endif
}

There is an assertion in mode_size_inline:

gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);

Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE.  OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bound array access (the length of the
mode_size array is NUM_MACHINE_MODES).

So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns.  This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
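
The shape of the problem and of the fix can be sketched outside GCC with
a toy mode table.  Everything below (the toy_* names, the three modes,
the sizes) is hypothetical and only mirrors the pattern described above:
a sentinel value equal to the table length must be filtered out before
any per-mode lookup.

```c
#include <assert.h>

/* A toy mode enumeration.  As in GCC, the sentinel has the same value as
   the number of real modes, so it is not a valid table index.  */
enum toy_mode
{
  TOY_SI,
  TOY_DI,
  TOY_V4SI,
  TOY_NUM_MODES,
  TOY_MAX_MODE = TOY_NUM_MODES
};

static const unsigned char toy_mode_size[TOY_NUM_MODES] = { 4, 8, 16 };

/* Analogue of a *_SUPPORTED_MODE_P check: only valid for real modes.  */
static int
toy_vector_mode_p (enum toy_mode mode)
{
  assert ((int) mode >= 0 && (int) mode < TOY_NUM_MODES);
  return toy_mode_size[mode] == 16;
}

/* Analogue of the fixed function: test for the sentinel first, so the
   table is never indexed with TOY_MAX_MODE.  */
int
toy_symbol_insns (enum toy_mode mode)
{
  if (mode != TOY_MAX_MODE && toy_vector_mode_p (mode))
    return 0;
  return 1;
}
```

Without the `mode != TOY_MAX_MODE` test, a call with the sentinel would
either trip the assertion or read one element past the end of the table.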

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 963e86d61af..6badef45d62 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2007,7 +2007,8 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, 
machine_mode mode)
 {
   /* LSX LD.* and ST.* cannot support loading symbols via an immediate
  operand.  */
-  if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
+  if (mode != MAX_MACHINE_MODE
+  && (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)))
 return 0;
 
   switch (type)
-- 
2.43.0

[PATCH] LoongArch: Fix wrong LSX FP vector negation

2024-02-03 Thread Xi Ruoyao

We expanded (neg x) to (minus const0 x) for LSX FP vectors.  This is
wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
fail when Python is built with LSX enabled.

Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead.  We are already doing this for LASX and now we can unify them
into simd.md.
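
The difference is easy to reproduce in scalar C.  The sketch below
assumes IEEE 754 binary64 doubles, the default rounding mode and no
-ffast-math; the function names are hypothetical and the code only
mirrors what the two expansions compute per element, not the vector
instructions themselves.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (double) == sizeof (uint64_t),
               "this sketch expects a 64-bit double");

/* The old expansion: 0 - x.  For x == +0.0 the result is +0.0, not
   -0.0, so the sign of zero is lost.  */
double
neg_by_sub (double x)
{
  return 0.0 - x;
}

/* The new approach: flip bit 63, the sign bit, and leave the rest of
   the representation untouched.  */
double
neg_by_bitflip (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits ^= UINT64_C (1) << 63;
  memcpy (&x, &bits, sizeof bits);
  return x;
}
```

For nonzero finite inputs the two functions agree; they differ only in
the sign of the result when the input is +0.0.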

gcc/ChangeLog:

* config/loongarch/lsx.md (neg2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/lasx.md | 16 
 gcc/config/loongarch/lsx.md  | 11 ---
 gcc/config/loongarch/simd.md | 18 ++
 3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index e2115ffb884..ac84db7f0ce 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -3028,22 +3028,6 @@ (define_insn "absv8sf2"
   [(set_attr "type" "simd_logic")
(set_attr "mode" "V8SF")])
 
-(define_insn "negv4df2"
-  [(set (match_operand:V4DF 0 "register_operand" "=f")
-   (neg:V4DF (match_operand:V4DF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.d\t%u0,%u1,63"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V4DF")])
-
-(define_insn "negv8sf2"
-  [(set (match_operand:V8SF 0 "register_operand" "=f")
-   (neg:V8SF (match_operand:V8SF 1 "register_operand" "f")))]
-  "ISA_HAS_LASX"
-  "xvbitrevi.w\t%u0,%u1,31"
-  [(set_attr "type" "simd_logic")
-   (set_attr "mode" "V8SF")])
-
 (define_insn "xvfmadd4"
   [(set (match_operand:FLASX 0 "register_operand" "=f")
(fma:FLASX (match_operand:FLASX 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 7002edae4d4..b9b94b9079c 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -728,17 +728,6 @@ (define_expand "neg2"
   DONE;
 })
 
-(define_expand "neg2"
-  [(set (match_operand:FLSX 0 "register_operand")
-   (neg:FLSX (match_operand:FLSX 1 "register_operand")))]
-  "ISA_HAS_LSX"
-{
-  rtx reg = gen_reg_rtx (mode);
-  emit_move_insn (reg, CONST0_RTX (mode));
-  emit_insn (gen_sub3 (operands[0], reg, operands[1]));
-  DONE;
-})
-
 (define_expand "lsx_vrepli"
   [(match_operand:ILSX 0 "register_operand")
(match_operand 1 "const_imm10_operand")]
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index cb0a19447a1..00ff2823a4e 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -85,12 +85,21 @@ (define_mode_attr simdfmt [(V2DF "d") (V4DF "d")
 (define_mode_attr simdifmt_for_f [(V2DF "l") (V4DF "l")
  (V4SF "w") (V8SF "w")])
 
+;; Suffix for integer mode in LSX or LASX instructions to operating FP
+;; vectors using integer vector operations.
+(define_mode_attr simdfmt_as_i [(V2DF "d") (V4DF "d")
+   (V4SF "w") (V8SF "w")])
+
 ;; Size of vector elements in bits.
 (define_mode_attr elmbits [(V2DI "64") (V4DI "64")
   (V4SI "32") (V8SI "32")
   (V8HI "16") (V16HI "16")
   (V16QI "8") (V32QI "8")])
 
+;; The index of sign bit in FP vector elements.
+(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63")
+(V4SF "31") (V8SF "31")])
+
 ;; This attribute is used to form an immediate operand constraint using
 ;; "const__operand".
 (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3")
@@ -457,6 +466,15 @@ (define_expand "reduc__scal_"
   DONE;
 })
 
+;; FP negation.
+(define_insn "neg2"
+  [(set (match_operand:FVEC 0 "register_operand" "=f")
+   (neg:FVEC (match_operand:FVEC 1 "register_operand" "f")))]
+  ""
+  "vbitrevi.\t%0,%1,"
+  [(set_attr "type" "simd_logic")
+   (set_attr "mode" "")])
+
 ; The LoongArch SX Instructions.
 (include "lsx.md")
 
-- 
2.43.0

Pushed: [PATCH] LoongArch: Fix an ODR violation

2024-02-03 Thread Xi Ruoyao

On Fri, 2024-02-02 at 10:42 +0800, chenglulu wrote:
> LGTM!
> 
> Thanks!

Pushed r14-8773.

> On 2024/2/2 at 5:54 AM, Xi Ruoyao wrote:
> > When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR
> > violation is detected:
> > 
> >  ../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
> >  'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
> >  57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
> >  ../../gcc/config/loongarch/loongarch-def.cc:186: note:
> >  'abi_minimal_isa' was previously declared here
> >  186 |   abi_minimal_isa = array,
> >  ../../gcc/config/loongarch/loongarch-def.cc:186: note:
> >  code may be misoptimized unless '-fno-strict-aliasing' is used
> > 
> > Fix it by adding a proper declaration of abi_minimal_isa into
> > loongarch-def.h and remove the ODR-violating local declaration in
> > loongarch-opts.cc.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
> > * config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
> > the ODR-violating local declaration.
> > ---
> > 
> > Bootstrapped on loongarch64-linux-gnu.  Not fully regtested but it
> > should be an obvious fix.  Ok for trunk?
> > 
> >   gcc/config/loongarch/loongarch-def.h   | 3 +++
> >   gcc/config/loongarch/loongarch-opts.cc | 2 --
> >   2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch-def.h 
> > b/gcc/config/loongarch/loongarch-def.h
> > index a1237ecf1fd..2dbf006d013 100644
> > --- a/gcc/config/loongarch/loongarch-def.h
> > +++ b/gcc/config/loongarch/loongarch-def.h
> > @@ -203,5 +203,8 @@ extern loongarch_def_array > N_TUNE_TYPES>
> >     loongarch_cpu_align;
> >   extern loongarch_def_array
> >     loongarch_cpu_rtx_cost_data;
> > +extern loongarch_def_array<
> > +  loongarch_def_array,
> > +  N_ABI_BASE_TYPES> abi_minimal_isa;
> >   
> >   #endif /* LOONGARCH_DEF_H */
> > diff --git a/gcc/config/loongarch/loongarch-opts.cc 
> > b/gcc/config/loongarch/loongarch-opts.cc
> > index b87299513c9..7eeac43ed2f 100644
> > --- a/gcc/config/loongarch/loongarch-opts.cc
> > +++ b/gcc/config/loongarch/loongarch-opts.cc
> > @@ -53,8 +53,6 @@ static const int tm_multilib_list[] = { TM_MULTILIB_LIST 
> > };
> >   static int enabled_abi_types[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES] = { 0 };
> >   
> >   #define isa_required(ABI) (abi_minimal_isa[(ABI).base][(ABI).ext])
> > -extern "C" const struct loongarch_isa
> > -abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
> >   
> >   static inline int
> >   is_multilib_enabled (struct loongarch_abi abi)
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH] LoongArch: Fix wrong LSX FP vector negation

2024-02-04 Thread Xi Ruoyao

On Sun, 2024-02-04 at 11:20 +0800, chenglulu wrote:
> 
> On 2024/2/3 at 4:58 PM, Xi Ruoyao wrote:
> > We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is
> > wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
> > fail when Python is built with LSX enabled.
> > 
> > Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
> > instead.  We are already doing this for LASX and now we can unify them
> > into simd.md.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/lsx.md (neg2): Remove the
> > incorrect expand.
> > * config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
> > (elmsgnbit): Likewise.
> > (neg2): New define_insn.
> > * config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
> > are now instantiated in simd.md.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> LGTM!
> 
> Thanks!

Pushed r14-8785.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns

2024-02-04 Thread Xi Ruoyao

On Sun, 2024-02-04 at 11:19 +0800, chenglulu wrote:
> 
> On 2024/2/2 at 5:55 PM, Xi Ruoyao wrote:
> > We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
> > But in loongarch_symbol_insns:
> > 
> >  if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
> >    return 0;
> > 
> > And LSX_SUPPORTED_MODE_P is defined as:
> > 
> >  #define LSX_SUPPORTED_MODE_P(MODE) \
> >    (ISA_HAS_LSX \
> >     && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...
> > 
> > GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:
> > 
> >  ALWAYS_INLINE poly_uint16
> >  mode_to_bytes (machine_mode mode)
> >  {
> >  #if GCC_VERSION >= 4001
> >    return (__builtin_constant_p (mode)
> >   ? mode_size_inline (mode) : mode_size[mode]);
> >  #else
> >    return mode_size[mode];
> >  #endif
> >  }
> > 
> > There is an assertion in mode_size_inline:
> > 
> >  gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);
> > 
> > Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
> > thus if __builtin_constant_p (mode) is evaluated true (it happens when
> > GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
> > cause an ICE.  OTOH if __builtin_constant_p (mode) is evaluated false,
> > mode_size[mode] is still an out-of-bound array access (the length of the
> > mode_size array is NUM_MACHINE_MODES).
> > 
> > So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
> > MAX_MACHINE_MODE in loongarch_symbol_insns.  This is very similar to a
> > MIPS bug PR98491 fixed by me about 3 years ago.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
> > use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
> > MAX_MACHINE_MODE.
> > ---
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
> LGTM!

Pushed r14-8785.

> I have a question. I see that you often add compilation options in 
> BOOT_CFLAGS.
> 
> I also want to test it. Do you have a recommended set of compilation 
> options?

When I build a compiler for my system I use
{BOOT_{C,CXX,LD}FLAGS,{C,CXX,LD}FLAGS_FOR_TARGET}="-O3 -march=la664
-mtune=la664 -pipe -fgraphite-identity -floop-nest-optimize -fipa-pta
-fdevirtualize-at-ltrans -fno-semantic-interposition -Wl,-O1
-Wl,--as-needed"

and enable PGO (make profiledbootstrap) and LTO
(--with-build-config=bootstrap-lto).

All of them but GRAPHITE (-fgraphite-identity -floop-nest-optimize)
seem "pretty safe" on the architectures I have hardware for.  GRAPHITE
is causing a bootstrap failure on AArch64 with GCC 13 (PR109929) if
combined with PGO, and the real cause has not been found yet.

But when I do a test build I normally only enable the flags which may
help to catch some issues, for example when a change only affects LTO I
add --with-build-config=bootstrap-lto, when changing something related
to LASX I use -O3 -mlasx (or -O3 -march=la664) as BOOT_CFLAGS.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] MIPS: Fix wrong MSA FP vector negation

2024-02-04 Thread Xi Ruoyao

We expanded (neg x) to (minus const0 x) for MSA FP vectors.  This is
wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
fail when Python is built with MSA enabled.

Use the bnegi.df instructions to simply reverse the sign bit instead.

gcc/ChangeLog:

* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
---

Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk
and/or release branches?

 gcc/config/mips/mips-msa.md | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
index 83d9a08e360..920161ed1d8 100644
--- a/gcc/config/mips/mips-msa.md
+++ b/gcc/config/mips/mips-msa.md
@@ -231,6 +231,10 @@ (define_mode_attr bitimm
(V4SI  "uimm5")
(V2DI  "uimm6")])
 
+;; The index of sign bit in FP vector elements.
+(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63")
+(V4SF "31") (V8SF "31")])
+
 (define_expand "vec_init"
   [(match_operand:MSA 0 "register_operand")
(match_operand:MSA 1 "")]
@@ -597,9 +601,9 @@ (define_expand "abs2"
 })
 
 (define_expand "neg2"
-  [(set (match_operand:MSA 0 "register_operand")
-   (minus:MSA (match_dup 2)
-  (match_operand:MSA 1 "register_operand")))]
+  [(set (match_operand:IMSA 0 "register_operand")
+   (minus:IMSA (match_dup 2)
+  (match_operand:IMSA 1 "register_operand")))]
   "ISA_HAS_MSA"
 {
   rtx reg = gen_reg_rtx (mode);
@@ -607,6 +611,14 @@ (define_expand "neg2"
   operands[2] = reg;
 })
 
+(define_insn "neg2"
+  [(set (match_operand:FMSA 0 "register_operand" "=f")
+   (neg (match_operand:FMSA 1 "register_operand" "f")))]
+  "ISA_HAS_MSA"
+  "bnegi.\t%w0,%w1,"
+  [(set_attr "type" "simd_bit")
+   (set_attr "mode" "")])
+
 (define_expand "msa_ldi"
   [(match_operand:IMSA 0 "register_operand")
(match_operand 1 "const_imm10_operand")]
-- 
2.43.0

Pushed: [PATCH] MIPS: Fix wrong MSA FP vector negation

2024-02-05 Thread Xi Ruoyao

On Mon, 2024-02-05 at 09:56 +0800, YunQiang Su wrote:
> On Mon, Feb 5, 2024 at 02:01, Xi Ruoyao wrote:
> > 
> > We expanded (neg x) to (minus const0 x) for MSA FP vectors, this is
> > wrong because -0.0 is not 0 - 0.0.  This causes some Python tests to
> > fail when Python is built with MSA enabled.
> > 
> > Use the bnegi.df instructions to simply reverse the sign bit instead.
> > 
> > gcc/ChangeLog:
> > 
> >  * config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
> >  (neg2): Change the mode iterator from MSA to IMSA because
> >  in FP arithmetic we cannot use (0 - x) for -x.
> >  (neg2): New define_insn to implement FP vector negation,
> >  using a bnegi instruction to negate the sign bit.
> > ---
> > 
> > Bootstrapped and regtested on mips64el-linux-gnuabi64.  Ok for trunk
> > and/or release branches?
> > 
> >   gcc/config/mips/mips-msa.md | 18 +++---
> >   1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> 
> LGTM, while I guess that we also need a test case.

Pushed to trunk and release branches, with the following obvious fix:

diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
index 920161ed1d8..779157f2a0c 100644
--- a/gcc/config/mips/mips-msa.md
+++ b/gcc/config/mips/mips-msa.md
@@ -613,7 +613,7 @@ (define_expand "neg2"
 
 (define_insn "neg<mode>2"
   [(set (match_operand:FMSA 0 "register_operand" "=f")
-   (neg (match_operand:FMSA 1 "register_operand" "f")))]
+   (neg:FMSA (match_operand:FMSA 1 "register_operand" "f")))]
   "ISA_HAS_MSA"
   "bnegi.<msafmt>\t%w0,%w1,<elmsgnbit>"
   [(set_attr "type" "simd_bit")

I'll write a test case for gcc.dg/vect later (now I have to do
$SOME_REAL_LIFE_THING...)
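
For readers who want to see the underlying issue in scalar form, here is a
minimal C sketch (not part of the patch; the function names are made up for
illustration).  It assumes IEEE 754 doubles, the default round-to-nearest
mode, and no -ffast-math.  It contrasts negation via (0 - x), which is what
the old expander effectively did, with flipping only the sign bit, which is
what bnegi does per vector element:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Negation as (0 - x).  For x == +0.0 this yields +0.0, not -0.0,
   so the sign of zero is lost.  */
double
neg_by_sub (double x)
{
  return 0.0 - x;
}

/* Negation by toggling only the sign bit (bit 63 of an IEEE double).
   This is the scalar analogue of a per-element sign-bit flip.  */
double
neg_by_signflip (double x)
{
  uint64_t bits;

  memcpy (&bits, &x, sizeof bits);
  bits ^= UINT64_C (1) << 63;
  memcpy (&x, &bits, sizeof bits);
  return x;
}
```

The two functions agree for every nonzero finite input; they differ only in
the sign of the result when x is +0.0.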

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-06 Thread Xi Ruoyao

Recently I've fixed two wrong FP vector negate implementations which
caused wrong sign bits in zeros on some targets (r14-8786 and r14-8801).  To
prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

Ok for trunk?

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 39 +++
 1 file changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 00000000000..adb032f5c6a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.43.0

LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-06 Thread Xi Ruoyao

Hi Lulu,

I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
reasons:

1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
a correctness issue.  For example, a developer may use
-falign-functions=16 and then use the low 4 bits of a function pointer to
encode some metainfo.  With ASM_OUTPUT_ALIGN_WITH_NOP the functions are
not really aligned to a 16-byte boundary, causing some breakage.

2. With Binutils-2.42,  ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
opcodes.  For example:

.globl _start
_start:
.balign 32
nop
nop
nop
addi.d $a0, $r0, 1
.balign 16,54525952,4
addi.d $a0, $a0, 1

is assembled and linked to:

0000000000000220 <_start>:
 220:   03400000    nop
 224:   03400000    nop
 228:   03400000    nop
 22c:   02c00404    li.d    $a0, 1
 230:   00000000    .word   0x00000000   # <== OOPS!
 234:   02c00484    addi.d  $a0, $a0, 1

Arguably this is a bug in GAS (it should at least error out for the
unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
prefer it to support the 3-operand .align directive even with -mrelax for
reasons I've given in [1]).  But we can at least work around it by
removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
2.42.

3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
".align 5" which works as expected since Binutils-2.38.

4. GCC < 14 does not have a default setting of -falign-*, so changing
this won't affect anyone who does not specify -falign-* explicitly.

[1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603

Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
then?
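
To make reason 1 concrete, here is a small C sketch of the kind of pointer
tagging I have in mind.  It is only an illustration: the names are
hypothetical, and it works on plain uintptr_t values rather than real
function pointers.  The scheme is only sound if every address really is
aligned to 16 bytes:

```c
#include <stdint.h>

/* Number of low bits assumed to be zero in a 16-byte aligned address.  */
#define TAG_BITS 4u
#define TAG_MASK ((uintptr_t) ((1u << TAG_BITS) - 1))

/* Store a small tag in the low bits of an address.  */
uintptr_t
tag_address (uintptr_t addr, unsigned tag)
{
  return addr | (tag & TAG_MASK);
}

/* Recover the address by clearing the tag bits.  This only gives back
   the original value if the address was 16-byte aligned.  */
uintptr_t
untag_address (uintptr_t tagged)
{
  return tagged & ~TAG_MASK;
}

/* Extract the tag from the low bits.  */
unsigned
address_tag (uintptr_t tagged)
{
  return (unsigned) (tagged & TAG_MASK);
}
```

With an address like 0x1230 the round trip is lossless, but with 0x1234
(only 4-byte aligned) untag_address returns 0x1230, i.e. a different
address.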

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-06 Thread Xi Ruoyao

On Tue, 2024-02-06 at 17:55 +0800, Xi Ruoyao wrote:
> Recently I've fixed two wrong FP vector negate implementation which
> caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
> prevent a similar issue from happening again, add a test case.
> 
> Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
> (with MSA), LoongArch (with LSX and LASX).
> 
> gcc/testsuite:
> 
>   * gcc.dg/vect/vect-neg-zero.c: New test.
> ---
> 
> Ok for trunk?
> 
>  gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 39 +++
>  1 file changed, 39 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> new file mode 100644
> index 000..adb032f5c6a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */

This patch fails on Linaro CI for ARM.  I guess I need to remove this
{ dg-do run } line and let the test framework decide whether to run or
compile.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-09 Thread Xi Ruoyao

On Fri, 2024-02-09 at 00:02 +0800, chenglulu wrote:
> 
> On 2024/2/7 at 12:23 AM, Xi Ruoyao wrote:
> > Hi Lulu,
> > 
> > I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
> > ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
> > reasons:
> > 
> > 1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
> > a correctness issue.  For example, a developer may use -falign-
> > functions=16 and then use the low 4 bits of a function pointer to encode
> > some metainfo.  Then ASM_OUTPUT_ALIGN_WITH_NOP causes the functions not
> > really aligned to a 16 bytes boundary, causing some breakage.
> > 
> > 2. With Binutils-2.42,  ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
> > opcodes.  For example:
> > 
> > .globl _start
> > _start:
> > .balign 32
> > nop
> > nop
> > nop
> > addi.d $a0, $r0, 1
> > .balign 16,54525952,4
> > addi.d $a0, $a0, 1
> > 
> > is assembled and linked to:
> > 
> > 0220 <_start>:
> >   220:  0340    nop
> >   224:  0340    nop
> >   228:  0340    nop
> >   22c:  02c00404    li.d$a0, 1
> >   230:      .word   0x   # <== OOPS!
> >   234:  02c00484    addi.d  $a0, $a0, 1
> > 
> > Arguably this is a bug in GAS (it should at least error out for the
> > unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
> > prefer it to support the 3-operand .align directive even -mrelax for
> > reasons I've given in [1]).  But we can at least work it around by
> > removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
> > 2.42.
> > 
> > 3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
> > ".align 5" which works as expected since Binutils-2.38.
> > 
> > 4. GCC < 14 does not have a default setting of -falign-*, so changing
> > this won't affect anyone who do not specify -falign-* explicitly.
> > 
> > [1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603
> > 
> > Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
> > then?
> > 
> Ok, I agree with you.
> 
> Thanks!

Oops, with Binutils-2.41 GAS will fail to assemble some conditional
branches if we do this :(.

Not sure what to do (maybe backporting both this and a simplified
version of PR112330 fix?)  Let's reconsider after the holiday...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao

On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:

> So I think that without worrying about performance and ensuring that 
> there is no problem
> 
> with binutils, I think we can make the following modifications:
> 
>    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
>    -   used for padding.  */
>    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
>    +   default.  */
>     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
>    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
>    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> 
> What do you think of it?

Unfortunately it will cause warnings with GAS 2.41 or earlier like

t1.s:1: Warning: expected fill pattern missing
t1.s:5: Warning: expected fill pattern missing

And AFAIK these things may cause many test failures due to "excessive
errors" if running the GCC test suite with these earlier GAS versions. 
Maybe we'll have to add some autoconf-based probing for the linker
anyway?
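
As a side note on where the magic fill value 54525952 comes from: it is
simply the encoding of andi $r0,$r0,0.  The sketch below is a hypothetical
helper, not GCC or Binutils code.  It assumes the usual layout for this
instruction format (10-bit opcode, 12-bit immediate, rj, rd), with the
opcode values read back from the encodings shown earlier in this thread:

```c
#include <stdint.h>

/* Opcode values inferred from the objdump output in this thread
   (0x03400000 for nop, 0x02c00404 for addi.d $a0,$r0,1).  */
#define OP_ANDI   0x00du
#define OP_ADDI_D 0x00bu

/* Assemble opcode[31:22] | imm12[21:10] | rj[9:5] | rd[4:0].  */
uint32_t
encode_2ri12 (uint32_t opcode10, uint32_t imm12, uint32_t rj, uint32_t rd)
{
  return (opcode10 << 22)
	 | ((imm12 & 0xfffu) << 10)
	 | ((rj & 0x1fu) << 5)
	 | (rd & 0x1fu);
}
```

encode_2ri12 (OP_ANDI, 0, 0, 0) gives 0x03400000, which is 54525952 in
decimal, the value hard coded in the old macro.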

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao

On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> 
> > So I think that without worrying about performance and ensuring that
> > there is no problem
> > 
> > with binutils, I think we can make the following modifications:
> > 
> >    -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> >    -   used for padding.  */
> >    +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> >    +   default.  */
> >     #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> >    -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> >    +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > 
> > What do you think of it?
> 
> Unfortunately it will cause warnings with GAS 2.41 or earlier like
> 
> t1.s:1: Warning: expected fill pattern missing
> t1.s:5: Warning: expected fill pattern missing
> 
> And AFAIK these things may cause many test failures due to "excessive
> errors" if running the GCC test suite with these earlier GAS versions.
> Maybe we'll have to add some autoconf-based probing for the linker
> anyway?

Or just silence the warning by passing "--no-warn" to the assembler, but I'm
highly unsure if this is really a good idea :(.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-20 Thread Xi Ruoyao

On Tue, 2024-02-20 at 19:50 +0800, chenglulu wrote:
> 
> On 2024/2/20 at 7:31 PM, Xi Ruoyao wrote:
> > On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> > > On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> > > 
> > > > So I think that without worrying about performance and ensuring that
> > > > there is no problem
> > > > 
> > > > with binutils, I think we can make the following modifications:
> > > > 
> > > >     -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> > > >     -   used for padding.  */
> > > >     +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding 
> > > > by
> > > >     +   default.  */
> > > >  #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> > > >     -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> > > >     +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > > > 
> > > > What do you think of it?
> > > Unfortunately it will cause warnings with GAS 2.41 or earlier like
> > > 
> > > t1.s:1: Warning: expected fill pattern missing
> > > t1.s:5: Warning: expected fill pattern missing
> > > 
> > > And AFAIK these things may cause many test failures due to "excessive
> > > errors" if running the GCC test suite with these earlier GAS versions.
> > > Maybe we'll have to add some autoconf-based probing for the linker
> > > anyway?
> > Or just silence the warning passing "--no-warn" to the assembler but I'm
> > highly unsure if this is really a good idea :(.
> > 
> I am not opposed to adding detection code, but I looked at this problem 
> today
> 
> and I think this change is the smallest change. I asked Meng Qinggang and he
> 
> said that the warning of GAS 2.41 can be removed.

Yes, but we cannot change a released binutils-2.41 tarball and Binutils
folks don't make point releases like GCC.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-22 Thread Xi Ruoyao

The gold linker has never been ported to LoongArch (and it seems
unlikely to be ported in the future as the new architectures are
focusing on lld and/or mold for fast linkers).

ChangeLog:

* configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
list.
* configure: Regenerate.
---

Ok for GCC trunk (to get synced into Binutils later)?

 configure| 2 +-
 configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 874966fb9f0..02b435c1163 100755
--- a/configure
+++ b/configure
@@ -3092,7 +3092,7 @@ case "${ENABLE_GOLD}" in
   # Check for target supported by gold.
   case "${target}" in
 i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
diff --git a/configure.ac b/configure.ac
index 4f34004a072..1a19c07a27b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -364,7 +364,7 @@ case "${ENABLE_GOLD}" in
   # Check for target supported by gold.
   case "${target}" in
 i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
-- 
2.43.2

[GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax

2024-02-22 Thread Xi Ruoyao

To improve Binutils compatibility we've had to backport relaxation
support.  But if a user just updates to GCC 13.3 and sticks with
Binutils 2.41, there is no reason to use -mno-explicit-relocs as the
default because we are turning off relaxation for Binutils 2.41 (it
lacks conditional branch relaxation support) anyway.

So like GCC 14, make the default of -m[no-]explicit-relocs depend on
-m[no-]relax instead of HAVE_AS_MRELAX_OPTION.  Also update the doc to
reflect the behavior change.

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in
(TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Set the default of
TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS
&& !loongarch_mrelax.
* doc/invoke.texi (-m[no-]explicit-relocs): Update for
LoongArch.
---

Ok for releases/gcc-13?

 gcc/config/loongarch/genopts/loongarch.opt.in |  2 +-
 gcc/config/loongarch/loongarch.cc |  4 
 gcc/config/loongarch/loongarch.opt|  2 +-
 gcc/doc/invoke.texi   | 11 +--
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index da6fedd153e..76acd35d39c 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -155,7 +155,7 @@ Target Joined RejectNegative UInteger 
Var(loongarch_max_inline_memcpy_size) Init
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default 
is 1024.
 
 mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & 
!HAVE_AS_MRELAX_OPTION)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN)
 Use %reloc() assembly operators.
 
 ; The code model option names for -mcmodel.
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 768e2427285..e78b81cd8fc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6222,6 +6222,10 @@ loongarch_option_override_internal (struct gcc_options 
*opts)
gcc_unreachable ();
 }
 
+  if (TARGET_EXPLICIT_RELOCS == M_OPTION_NOT_SEEN)
+TARGET_EXPLICIT_RELOCS = (HAVE_AS_EXPLICIT_RELOCS
+ && !loongarch_mrelax);
+
   /* Validate the guard size.  */
   int guard_size = param_stack_clash_protection_guard_size;
 
diff --git a/gcc/config/loongarch/loongarch.opt 
b/gcc/config/loongarch/loongarch.opt
index 59b1e06d3f2..e61fbaed2c1 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -162,7 +162,7 @@ Target Joined RejectNegative UInteger 
Var(loongarch_max_inline_memcpy_size) Init
 -mmax-inline-memcpy-size=SIZE  Set the max size of memcpy to inline, default 
is 1024.
 
 mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & 
!HAVE_AS_MRELAX_OPTION)
+Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN)
 Use %reloc() assembly operators.
 
 ; The code model option names for -mcmodel.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99657fb44d8..792ce283bb9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25830,12 +25830,11 @@ The default code model is @code{normal}.
 @itemx -mno-explicit-relocs
 Use or do not use assembler relocation operators when dealing with symbolic
 addresses.  The alternative is to use assembler macros instead, which may
-limit optimization.  The default value for the option is determined during
-GCC build-time by detecting corresponding assembler support:
-@code{-mexplicit-relocs} if said support is present,
-@code{-mno-explicit-relocs} otherwise.  This option is mostly useful for
-debugging, or interoperation with assemblers different from the build-time
-one.
+limit instruction scheduling but allow linker relaxation.  The default
+value for the option is determined with the assembler capability detected
+during GCC build-time and the setting of @code{-mrelax}:
+@code{-mexplicit-relocs} if the assembler supports relocation operators
+but @code{-mrelax} is not enabled, @code{-mno-explicit-relocs} otherwise.
 
 @opindex mdirect-extern-access
 @item -mdirect-extern-access
-- 
2.43.2
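
The new default can be modelled in plain C as follows.  This is only a
sketch of the logic in the loongarch_option_override_internal hunk, not the
GCC code itself; the value -1 for M_OPTION_NOT_SEEN and the function name
are assumptions made for the illustration:

```c
#include <stdbool.h>

/* Sentinel meaning "the user passed neither -mexplicit-relocs nor
   -mno-explicit-relocs".  The concrete value is an assumption.  */
#define M_OPTION_NOT_SEEN (-1)

/* Resolve the effective setting of -m[no-]explicit-relocs.
   option_value is 0, 1 or M_OPTION_NOT_SEEN; have_as_explicit_relocs
   models the build-time assembler check; mrelax models -m[no-]relax.  */
int
resolve_explicit_relocs (int option_value, bool have_as_explicit_relocs,
			 bool mrelax)
{
  if (option_value == M_OPTION_NOT_SEEN)
    return have_as_explicit_relocs && !mrelax;

  /* An explicit choice on the command line always wins.  */
  return option_value;
}
```

So with a capable assembler, -mno-relax alone now selects explicit relocs,
while -mrelax alone selects assembler macros, and an explicit
-m[no-]explicit-relocs overrides both.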

Re: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-22 Thread Xi Ruoyao

On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:
> 
> On 2024/2/22 at 5:17 PM, Xi Ruoyao wrote:
> > The gold linker has never been ported to LoongArch (and it seems
> > unlikely to be ported in the future as the new architectures are
> > focusing on lld and/or mold for fast linkers).
> > 
> > ChangeLog:
> > 
> >     * configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
> >     list.
> >     * configure: Regenerate.
> > ---
> > 
> > Ok for GCC trunk (to get synced into Binutils later)?
> 
> I have no problem. But I have a question. Is this modification simply 
> because we don’t
> 
> support it or is there an error somewhere?

If a user specifies --enable-gold when building Binutils, with loongarch in
this list the build system will attempt to build gold and fail.  With
loongarch removed from the list, the build system will ignore
--enable-gold.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure

2024-02-23 Thread Xi Ruoyao

On Fri, 2024-02-23 at 11:37 +0800, chenglulu wrote:
> 
> On 2024/2/23 at 11:27 AM, Xi Ruoyao wrote:
> > On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:
> > > On 2024/2/22 at 5:17 PM, Xi Ruoyao wrote:
> > > > The gold linker has never been ported to LoongArch (and it seems
> > > > unlikely to be ported in the future as the new architectures are
> > > > focusing on lld and/or mold for fast linkers).
> > > > 
> > > > ChangeLog:
> > > > 
> > > >     * configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
> > > >     list.
> > > >     * configure: Regenerate.
> > > > ---
> > > > 
> > > > Ok for GCC trunk (to get synced into Binutils later)?
> > > I have no problem. But I have a question. Is this modification simply
> > > because we don’t
> > > 
> > > support it or is there an error somewhere?
> > If a user specify --enable-gold building Binutils, with loongarch in
> > this list the building system will attempt to build gold and fail.  If
> > removing loongarch from the list the building system will ignore --
> > enable-gold.
> > 
> Okay, I understand.

Pushed r14-9149 and the Binutils maintainer will pick it up before the
next Binutils release (AFAIK).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Pushed: [GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax

2024-02-23 Thread Xi Ruoyao

On Thu, 2024-02-22 at 19:09 +0800, chenglulu wrote:
> 
> On 2024/2/22 at 6:20 PM, Xi Ruoyao wrote:
> > To improve Binutils compatibility we've had to backported relaxation
> > support.  But if a user just updates to GCC 13.3 and sticks with
> > Binutils 2.41, there is no reason to use -mno-explicit-relocs as the
> > default because we are turning off relaxation for Binutils 2.41 (it
> > lacks conditional branch relaxation support) anyway.
> > 
> > So like GCC 14, make the default of -m[no-]explicit-relocs depend on
> > -m[no-]relax instead of HAVE_AS_MRELAX_OPTION.  Also update the doc
> > to
> > reflect the behavior change.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/genopts/loongarch.opt.in
> > (TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN.
> > * config/loongarch/loongarch.opt: Regenerate.
> > * config/loongarch/loongarch.cc
> > (loongarch_option_override_internal): Set the default of
> > TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS
> > && !loongarch_mrelax.
> > * doc/invoke.texi (-m[no-]explicit-relocs): Update for
> > LoongArch.
> > ---
> > 
> > Ok for releases/gcc-13?
> 
> LGTM!

Pushed r13-8357.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH 1/2] LoongArch: NFC: Deduplicate crc instruction defines

2024-02-25 Thread Xi Ruoyao

Introduce an iterator for UNSPEC_CRC and UNSPEC_CRCC to make the next
change easier.

gcc/ChangeLog:

* config/loongarch/loongarch.md (CRC): New define_int_iterator.
(crc): New define_int_attr.
(loongarch_crc_w_<size>_w, loongarch_crcc_w_<size>_w): Unify
into ...
(loongarch_<crc>_w_<size>_w): ... here.
---
 gcc/config/loongarch/loongarch.md | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 2ce7a151880..4ded1b3a117 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4251,24 +4251,16 @@ (define_peephole2
 
 
 (define_mode_iterator QHSD [QI HI SI DI])
+(define_int_iterator CRC [UNSPEC_CRC UNSPEC_CRCC])
+(define_int_attr crc [(UNSPEC_CRC "crc") (UNSPEC_CRCC "crcc")])
 
-(define_insn "loongarch_crc_w_<size>_w"
+(define_insn "loongarch_<crc>_w_<size>_w"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRC))]
+CRC))]
   ""
-  "crc.w.<size>.w\t%0,%1,%2"
-  [(set_attr "type" "unknown")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "loongarch_crcc_w_<size>_w"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
-  (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRCC))]
-  ""
-  "crcc.w.<size>.w\t%0,%1,%2"
+  "<crc>.w.<size>.w\t%0,%1,%2"
   [(set_attr "type" "unknown")
(set_attr "mode" "<MODE>")])
 
-- 
2.44.0

[PATCH 2/2] LoongArch: Remove unneeded sign extension after crc/crcc instructions

2024-02-25 Thread Xi Ruoyao

The specification of crc/crcc instructions is clear that the output is
sign-extended to GRLEN.  Add a define_insn to tell the compiler this
fact and allow it to remove the unneeded sign extension on crc/crcc
output.  As crc/crcc instructions are usually used in a tight loop,
this should produce a significant performance gain.

gcc/ChangeLog:

* config/loongarch/loongarch.md
(loongarch_<crc>_w_<size>_w_extended): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/crc-sext.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 11 +++
 gcc/testsuite/gcc.target/loongarch/crc-sext.c | 13 +
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/crc-sext.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 4ded1b3a117..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4264,6 +4264,17 @@ (define_insn "loongarch_<crc>_w_<size>_w"
   [(set_attr "type" "unknown")
(set_attr "mode" "<MODE>")])
 
+(define_insn "loongarch_<crc>_w_<size>_w_extended"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
+ (match_operand:SI 2 "register_operand" "r")]
+CRC)))]
+  "TARGET_64BIT"
+  "<crc>.w.<size>.w\t%0,%1,%2"
+  [(set_attr "type" "unknown")
+   (set_attr "mode" "<MODE>")])
+
 ;; With normal or medium code models, if the only use of a pc-relative
 ;; address is for loading or storing a value, then relying on linker
 ;; relaxation is not better than emitting the machine instruction directly.
diff --git a/gcc/testsuite/gcc.target/loongarch/crc-sext.c 
b/gcc/testsuite/gcc.target/loongarch/crc-sext.c
new file mode 100644
index 00000000000..9ade5a8e4ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/crc-sext.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+**my_crc:
+** crc.w.d.w   \$r4,\$r4,\$r5
+** jr  \$r1
+*/
+int my_crc(long long dword, int crc)
+{
+   return __builtin_loongarch_crc_w_d_w(dword, crc);
+}
-- 
2.44.0
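
To illustrate why the extra extension is redundant, here is a small C model
of the semantics (a sketch with hypothetical helper names, not the GCC or
hardware implementation).  A 32-bit result that the instruction has already
sign-extended to 64 bits is unchanged by a second sign extension:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend a 32-bit word to 64 bits without relying on
   implementation-defined narrowing conversions.  */
int64_t
sext32 (uint32_t w)
{
  return (int64_t) (w ^ UINT32_C (0x80000000)) - INT64_C (0x80000000);
}

/* True if v already has the form "32-bit value sign-extended to 64
   bits", i.e. extending its low word again would be a no-op.  */
bool
is_sign_extended (int64_t v)
{
  return v == sext32 ((uint32_t) v);
}
```

Since the crc/crcc result always satisfies is_sign_extended on a 64-bit
target, the compiler can drop the extension once the new pattern tells it
so.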

Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-02-28 Thread Xi Ruoyao

On Thu, 2024-02-29 at 09:42 +0800, mengqinggang wrote:
> Generate la.tls.desc macro instruction for TLS descriptors model.
> 
> la.tls.desc expand to
>   pcalau12i $a0, %desc_pc_hi20(a)
>   ld.d  $a1, $a0, %desc_ld_pc_lo12(a)
>   addi.d    $a0, $a0, %desc_add_pc_lo12(a)
>   jirl  $ra, $a1, %desc_call(a)
> 
> The default is TLS descriptors, but can be configure with
> -mtls-dialect={desc,trad}.

Please keep trad as the default for now.  Glibc-2.40 will be released
after GCC 14.1, and we don't want to end up in a situation where the
default configuration of the latest GCC release creates something that
does not work with the latest Glibc release.

And there's also musl libc we need to take into account.

Or you can write an autoconf test to check whether the assembler supports
tlsdesc, and check TARGET_GLIBC_MAJOR & TARGET_GLIBC_MINOR for the Glibc
version, to decide whether to enable desc by default.  If you want this but
don't have time to implement it, you can leave trad as the default and I'll
take care of this.

/* snip */

> +(define_insn "@got_load_tls_desc"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> + (unspec:P
> +     [(match_operand:P 1 "symbolic_operand" "")]
> +     UNSPEC_TLS_DESC))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI A1_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))]

Ok, the clobber list is correct.

> +  "TARGET_TLS_DESC"
> +  "la.tls.desc\t%0,%1"

With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
of la.tls.desc.  As we don't want to add too much code, we can just hard
code the 4 instructions here instead of splitting this insn, with
something like

{ return TARGET_EXPLICIT_RELOCS_ALWAYS ? "......" : "la.tls.desc\t%0,%1"; }

> +  [(set_attr "got" "load")
> +   (set_attr "mode" "")])

We need (set_attr "length" "16") in this list as this actually expands
into 16 bytes.
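
The arithmetic behind the 16, as a quick C sketch.  The helper name is made
up for illustration, and the instruction text is copied from the expansion
quoted at the top of the patch description.  It assumes every LoongArch
instruction is 4 bytes:

```c
#include <stddef.h>

/* Every LoongArch instruction is assumed to occupy 4 bytes.  */
#define LA_INSN_BYTES 4

/* The la.tls.desc expansion, one instruction per line.  */
const char tls_desc_expansion[] =
  "pcalau12i $a0, %desc_pc_hi20(a)\n"
  "ld.d $a1, $a0, %desc_ld_pc_lo12(a)\n"
  "addi.d $a0, $a0, %desc_add_pc_lo12(a)\n"
  "jirl $ra, $a1, %desc_call(a)";

/* Count newline-separated instructions and convert to bytes.  */
size_t
expansion_length (const char *text)
{
  size_t insns = (*text != '\0');

  for (; *text != '\0'; text++)
    if (*text == '\n')
      insns++;
  return insns * LA_INSN_BYTES;
}
```

The macro itself is a single line of assembly, so without an explicit
"length" attribute the insn would be treated as having the default length
rather than the 16 bytes of the real expansion.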

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: Add support for TLS descriptors

2024-02-28 Thread Xi Ruoyao

On Thu, 2024-02-29 at 14:08 +0800, Xi Ruoyao wrote:
> > +  "TARGET_TLS_DESC"
> > +  "la.tls.desc\t%0,%1"
> 
> With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
> of la.tls.desc.  As we don't want to add too many code we can just hard
> code the 4 instructions here instead of splitting this insn, just
> something like
> 
> { return TARGET_EXPLICIT_RELOCS_ALWAS ? ".." : "la.tls.desc\t%0,%1"; }

And if -mcmodel=extreme we should use a 3-operand la.tls.desc.  Or if we
don't want to support this we can just error out for -mcmodel=extreme
-mtls-dialect=desc.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH v2] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-02-28 Thread Xi Ruoyao

The vect_int_mod target selector is evaluated with the options in
DEFAULT_VECTCFLAGS in effect, but these options are not automatically
passed to tests outside the vect directories.  So this test fails on
targets where the integer vector modulo operation is supported but requires
an option to enable it, for example LoongArch.

In this test case, the only expected optimization that has not already
happened in original is in corge, because it needs forward propagation.  So
we can scan the forwprop2 dump (where the vector operation is not expanded
to scalars yet) instead of optimized, and then we don't need to consider
whether vect_int_mod holds or not.

gcc/testsuite/ChangeLog:

PR testsuite/113418
* gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
instead of -fdump-tree-optimized.
(dg-final): Scan forwprop2 dump instead of optimized, and remove
the use of vect_int_mod.
* lib/target-supports.exp (check_effective_target_vect_int_mod):
Remove because it's not used anymore.
---

v1->v2: Remove check_effective_target_vect_int_mod as it's now unused.

This fixes the test failure on loongarch64-linux-gnu.  Also tested on
x86_64-linux-gnu.  Ok for trunk?

 gcc/testsuite/gcc.dg/pr104992.c   |  5 ++---
 gcc/testsuite/lib/target-supports.exp | 13 -
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c
index 82f8c75559c..6fd513d34b2 100644
--- a/gcc/testsuite/gcc.dg/pr104992.c
+++ b/gcc/testsuite/gcc.dg/pr104992.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/104992 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
+/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */
 
 #define vector __attribute__((vector_size(4*sizeof(int))))
 
@@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x, unsigned 
y, unsigned z) {
 return x / y * z == x;
 }
 
-/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { ! 
vect_int_mod } } } } */
-/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target vect_int_mod 
} } } */
+/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4138cc9a662..ae33c4f1e3a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9064,19 +9064,6 @@ proc check_effective_target_vect_long_mult { } {
 return $answer
 }
 
-# Return 1 if the target supports vector int modulus, 0 otherwise.
-
-proc check_effective_target_vect_int_mod { } {
-return [check_cached_effective_target_indexed vect_int_mod {
-  expr { ([istarget powerpc*-*-*]
- && [check_effective_target_has_arch_pwr10])
- || [istarget amdgcn-*-*]
- || ([istarget loongarch*-*-*]
-&& [check_effective_target_loongarch_sx])
- || ([istarget riscv*-*-*]
-&& [check_effective_target_riscv_v]) }}]
-}
-
 # Return 1 if the target supports vector even/odd elements extraction, 0 otherwise.
 
 proc check_effective_target_vect_extract_even_odd { } {
-- 
2.44.0

[PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-28 Thread Xi Ruoyao

Recently I've fixed two wrong FP vector negate implementations which
caused wrong sign bits in zeros on some targets (r14-8786 and r14-8801).
To prevent a similar issue from happening again, add a test case.

Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
(with MSA), LoongArch (with LSX and LASX).
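
To illustrate the kind of mistake the test is meant to catch, here is a
minimal sketch.  It assumes, purely for illustration, a negation that is
implemented as a subtraction from +0.0; that is not necessarily what the
two commits above fixed.  The function names are hypothetical and do not
appear in the patch.

```c
#include <math.h>

/* Hypothetical "wrong" negation: computing 0.0 - x.  Under the default
   rounding mode, 0.0 - 0.0 is +0.0, so the sign bit of a positive zero
   is NOT flipped.  The volatile keeps the subtraction explicit.  */
double
negate_by_subtraction (double x)
{
  volatile double zero = 0.0;
  return zero - x;
}

/* Correct negation: flips the sign bit, including for zeros.  */
double
negate_by_sign_flip (double x)
{
  return -x;
}

/* Return 1 if the sign bits of IN and OUT differ, which is what the
   test case requires of every negated element, and 0 otherwise.  */
int
sign_flipped (double in, double out)
{
  return !signbit (in) != !signbit (out);
}
```

With these helpers, sign_flipped (0.0, negate_by_subtraction (0.0)) is 0
while sign_flipped (0.0, negate_by_sign_flip (0.0)) is 1.  Only the
zero inputs expose the difference, which is why the test arrays contain
nothing but zeros.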

gcc/testsuite:

* gcc.dg/vect/vect-neg-zero.c: New test.
---

v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on
Linaro ARM CI.

Ok for trunk?

 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
new file mode 100644
index 000..6af4a02c517
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c
@@ -0,0 +1,38 @@
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-fsigned-zeros" } */
+
+double x[4] = {-0.0, 0.0, -0.0, 0.0};
+float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0};
+
+static __attribute__ ((always_inline)) inline void
+test (int factor)
+{
+  double a[4];
+  float b[8];
+
+  asm ("" ::: "memory");
+
+  for (int i = 0; i < 2 * factor; i++)
+a[i] = -x[i];
+
+  for (int i = 0; i < 4 * factor; i++)
+b[i] = -y[i];
+
+#pragma GCC novector
+  for (int i = 0; i < 2 * factor; i++)
+if (__builtin_signbit (a[i]) == __builtin_signbit (x[i]))
+  __builtin_abort ();
+
+#pragma GCC novector
+  for (int i = 0; i < 4 * factor; i++)
+if (__builtin_signbit (b[i]) == __builtin_signbit (y[i]))
+  __builtin_abort ();
+}
+
+int
+main (void)
+{
+  test (1);
+  test (2);
+  return 0;
+}
-- 
2.44.0

[PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation

2024-02-28 Thread Xi Ruoyao

Binutils only allows the IE to LE relaxation when there is an
R_LARCH_RELAX after R_LARCH_TLS_IE_PC_{HI20,LO12}, so that an invalid
"partial" relaxation won't happen with the extreme code model.  So if we
are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an
R_LARCH_RELAX to allow the relaxation.  The IE to LE relaxation does not
require the pcalau12i and the ld instruction to be adjacent, so we don't
need to limit ourselves to using the macro.

For the distro maintainers backporting changes: this change depends on
r14-8721.  Without r14-8721, R_LARCH_RELAX can be mistakenly emitted in
the extreme code model.
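
As a rough sketch of the affected code, consider a thread-local variable
accessed with the initial-exec model.  The variable and function names
below are hypothetical and are not taken from the new tests.  The
assembly in the comment is only what the new 'Q' operand modifier is
expected to print before each instruction; the register names are
illustrative and the real output may differ.

```c
/* Hypothetical TLS variable, forced to the initial-exec (IE) model.  */
__thread int tls_counter __attribute__ ((tls_model ("initial-exec"))) = 42;

/* With a non-extreme code model, explicit relocs and -mrelax, the
   address computation is expected to look roughly like:

	.reloc	.,R_LARCH_RELAX
	pcalau12i	$r12,%ie_pc_hi20(tls_counter)
	.reloc	.,R_LARCH_RELAX
	ld.d	$r12,$r12,%ie_pc_lo12(tls_counter)

   With -mno-relax or -mcmodel=extreme no R_LARCH_RELAX should appear,
   which is what the norelax and extreme tests scan for.  */
int
read_tls_counter (void)
{
  return tls_counter;
}
```

Compiling such a file with -S and searching the output for R_LARCH_RELAX
is essentially what the scan-assembler directives in the new tests do.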

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Support 'Q' for R_LARCH_RELAX for TLS IE.
(loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS
IE.
* config/loongarch/loongarch.md (ld_from_got): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/tls-ie-relax.c: New test.
* gcc.target/loongarch/tls-ie-norelax.c: New test.
* gcc.target/loongarch/tls-ie-extreme.c: New test.
---

Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.cc | 15 ++-
 gcc/config/loongarch/loongarch.md |  2 +-
 .../gcc.target/loongarch/tls-ie-extreme.c |  5 +
 .../gcc.target/loongarch/tls-ie-norelax.c |  5 +
 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c | 11 +++
 5 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 0428b6e65d5..70e31bb831c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4981,7 +4981,7 @@ loongarch_output_move (rtx dest, rtx src)
  if (type == SYMBOL_TLS_LE)
return "lu12i.w\t%0,%h1";
  else
-   return "pcalau12i\t%0,%h1";
+   return "%Q1pcalau12i\t%0,%h1";
}
 
   if (src_code == CONST_INT)
@@ -6145,6 +6145,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part,
'L'  Print the low-part relocation associated with OP.
'm' Print one less than CONST_INT OP in decimal.
'N' Print the inverse of the integer branch condition for comparison OP.
+   'Q'  Print R_LARCH_RELAX for TLS IE.
'r'  Print address 12-31bit relocation associated with OP.
'R'  Print address 32-51bit relocation associated with OP.
'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...),
@@ -6282,6 +6283,18 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
letter);
   break;
 
+case 'Q':
+  if (!TARGET_LINKER_RELAXATION)
+   break;
+
+  if (code == HIGH)
+   op = XEXP (op, 0);
+
+  if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE)
+   fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t");
+
+  break;
+
 case 'r':
   loongarch_print_operand_reloc (file, op, false /* hi64_part */,
 true /* lo_reloc */);
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index f3b5c641fce..525e1e82183 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2620,7 +2620,7 @@ (define_insn "@ld_from_got"
(match_operand:P 2 "symbolic_operand")))]
UNSPEC_LOAD_FROM_GOT))]
   ""
-  "ld.\t%0,%1,%L2"
+  "%Q2ld.\t%0,%1,%L2"
   [(set_attr "type" "move")]
 )
 
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
new file mode 100644
index 000..00c545a3e8c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mcmodel=extreme -mexplicit-relocs=auto -mrelax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
new file mode 100644
index 000..dd6bf3634a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */
+/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */
+
+#include "tls-ie-relax.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c
new file mode 100644
index 000..e9f7569b1da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c
@@ -0,0

[PATCH] LoongArch: Allow s9 as a register alias

2024-02-28 Thread Xi Ruoyao

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
   { "t8",  20 + GP_REG_FIRST },\
   { "x",   21 + GP_REG_FIRST },\
   { "fp",  22 + GP_REG_FIRST },\
+  { "s9",  22 + GP_REG_FIRST },\
   { "s0",  23 + GP_REG_FIRST },\
   { "s1",  24 + GP_REG_FIRST },\
   { "s2",  25 + GP_REG_FIRST },\
-- 
2.44.0

Re: [PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-29 Thread Xi Ruoyao

On Thu, 2024-02-29 at 15:09 +0800, Xi Ruoyao wrote:
> Recently I've fixed two wrong FP vector negate implementation which
> caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
> prevent a similar issue from happening again, add a test case.
> 
> Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
> (with MSA), LoongArch (with LSX and LASX).
> 
> gcc/testsuite:
> 
>   * gcc.dg/vect/vect-neg-zero.c: New test.
> ---
> 
> v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on
> Linaro ARM CI.

Oops, still failing ARM CI.  Not sure why...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-12 Thread Xi Ruoyao

On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:

> > I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
> > we need a target hook to tell the generic code
> > UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or we'll
> > see millions lines of messages like
> > 
> > ../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
> > UNSPEC_LA_PCREL_64_PART1 (42) found in variable location
> 
> I build GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't reproduced 
> the problem you mentioned.
> 
>     $ ../configure --host=loongarch64-linux-gnu --target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
>     --with-arch=loongarch64 --with-abi=lp64d --enable-tls --enable-languages=c,c++,fortran,lto --enable-plugin \
>     --disable-multilib --disable-host-shared --enable-bootstrap --enable-checking=release
>     $ make BOOT_FLAGS="-mcmodel=extreme"
> 
> What did I do wrong?:-(

BOOT_CFLAGS, not BOOT_FLAGS :).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-13 Thread Xi Ruoyao

On Sat, 2024-01-13 at 15:01 +0800, chenglulu wrote:
> 
> On 2024/1/12 at 7:42 PM, Xi Ruoyao wrote:
> > On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:
> > 
> > > > I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
> > > > we need a target hook to tell the generic code
> > > > UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or we'll
> > > > see millions lines of messages like
> > > > 
> > > > ../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
> > > > UNSPEC_LA_PCREL_64_PART1 (42) found in variable location
> > > I build GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't 
> > > reproduced the problem you mentioned.
> > > 
> > >  $ ../configure --host=loongarch64-linux-gnu --target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
> > >  --with-arch=loongarch64 --with-abi=lp64d --enable-tls --enable-languages=c,c++,fortran,lto --enable-plugin \
> > >  --disable-multilib --disable-host-shared --enable-bootstrap --enable-checking=release
> > >  $ make BOOT_FLAGS="-mcmodel=extreme"
> > > 
> > > What did I do wrong?:-(
> > BOOT_CFLAGS, not BOOT_FLAGS :).
> > 
> This is so strange. My compilation here stopped due to syntax problems,
> 
> and I still haven't reproduced the information you mentioned about 
> UNSPEC_LA_PCREL_64_PART1.

I used:

../gcc/configure --with-system-zlib --disable-fixincludes \
 --enable-default-ssp --enable-default-pie \
 --disable-werror --disable-multilib \
 --prefix=/home/xry111/gcc-dev

and then

make STAGE1_{C,CXX}FLAGS="-O2 -g" -j8 \
 BOOT_{C,CXX}FLAGS="-O2 -g -mcmodel=extreme" |& tee gcc-build.log

I guess "-g" is also needed to reproduce the issue, as the messages
are produced during DWARF generation.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-13 Thread Xi Ruoyao

On Sat, 2024-01-13 at 15:28 +0800, chenxiaolong wrote:
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr104992.c: Added additional "-mlsx" compilation options.
>   * gcc.dg/signbit-2.c: Ditto.
>   * gcc.dg/tree-ssa/scev-16.c: Ditto.
>   * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
>   * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.

I don't think the changes to pr104992.c and scev-16.c are right,
because no other architecture adds special options there.  Why are we
so special?

> ---
>  gcc/testsuite/gcc.dg/pr104992.c    | 1 +
>  gcc/testsuite/gcc.dg/signbit-2.c   | 1 +
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c    | 1 +
>  gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90    | 1 +
>  gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 +
>  5 files changed, 5 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c
> index 82f8c75559c..a77992fa491 100644
> --- a/gcc/testsuite/gcc.dg/pr104992.c
> +++ b/gcc/testsuite/gcc.dg/pr104992.c
> @@ -1,6 +1,7 @@
>  /* PR tree-optimization/104992 */
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
> +/* { dg-additional-options "-mlsx" { target loongarch_sx } } */
>  
>  #define vector __attribute__((vector_size(4*sizeof(int))))
>  
> diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
> index 62bb4047d74..5511bb78149 100644
> --- a/gcc/testsuite/gcc.dg/signbit-2.c
> +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> @@ -5,6 +5,7 @@
>  /* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */
>  /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
>  /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
> +/* { dg-additional-options "-mlsx" { target loongarch_sx } } */
>  /* { dg-skip-if "no fallback for MVE" { arm_mve } } */
>  
>  #include 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> index 120f40c0b6c..06cfbbcfae5 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target vect_int } */
>  /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } */
>  
>  int A[1024 * 2];
>  
> diff --git a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
> index a42290948c4..6f2ad1166a4 100644
> --- a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
> +++ b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
> @@ -1,6 +1,7 @@
>  ! { dg-do compile }
>  ! { dg-require-effective-target vect_double }
>  ! { dg-additional-options "-msse2" { target { { i?86-*-* x86_64-*-* } && ilp32 } } }
> +! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
>  
>  module mqc_m
>  integer, parameter, private :: longreal = selected_real_kind(15,90)
> diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
> index 08965cc5e20..97b88821731 100644
> --- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
> +++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
> @@ -2,6 +2,7 @@
>  ! { dg-require-effective-target vect_double }
>  ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details -std=legacy" }
>  ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } }
> +! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
>  ! { dg-additional-options "-mzarch" { target { s390*-*-* } } }
>  
>  *** RESID COMPUTES THE RESIDUAL:  R = V - AU

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
