[PATCH] LoongArch: Replace UNSPEC_FCOPYSIGN with copysign RTL
When I added copysign support for LoongArch (r13-3702), we did not have
a copysign RTL insn, so I had to use UNSPEC to represent the copysign
instruction.  Now the copysign RTX code has been added in r14-1586, so
this patch removes those UNSPECs, and it uses the native RTL copysign
insn.

Inspired by rs6000 patch "Cleanup: Replace UNSPEC_COPYSIGN with copysign
RTL" [1] from Michael Meissner.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631701.html

gcc/ChangeLog:

    * config/loongarch/loongarch.md (UNSPEC_FCOPYSIGN): Delete.
    (copysign<mode>3): Use copysign RTL instead of UNSPEC.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 2b09209945b..9916c741641 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -37,7 +37,6 @@ (define_c_enum "unspec" [
   UNSPEC_FCLASS
   UNSPEC_FMAX
   UNSPEC_FMIN
-  UNSPEC_FCOPYSIGN
   UNSPEC_FTINT
   UNSPEC_FTINTRM
   UNSPEC_FTINTRP
@@ -1130,9 +1129,8 @@ (define_insn "abs<mode>2"
 
 (define_insn "copysign<mode>3"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
-        (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
-                      (match_operand:ANYF 2 "register_operand" "f")]
-                     UNSPEC_FCOPYSIGN))]
+        (copysign:ANYF (match_operand:ANYF 1 "register_operand" "f")
+                       (match_operand:ANYF 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT"
   "fcopysign.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fcopysign")
-- 
2.42.0
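For readers less familiar with the operation the copysign RTX code stands for, here is a minimal sketch of its semantics using the standard library. The helper name `with_sign_of` is made up for illustration and is not part of GCC or this patch.

```cpp
#include <cmath>

// Hypothetical helper: returns the magnitude of x combined with the sign
// of y, which is the operation a copysign pattern describes.
double with_sign_of (double x, double y)
{
  return std::copysign (x, y);
}
```

Because the sign of zero is also copied, `with_sign_of (0.0, -2.0)` should yield negative zero.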
Re: [PATCH] Support g++ 4.8 as a host compiler.
On Wed, 2023-10-04 at 23:19 +0100, Roger Sayle wrote:
> 
> The recent patch to remove poly_int_pod triggers a bug in g++ 4.8.5's
> C++ 11 support which mistakenly believes poly_uint16 has a non-trivial
> constructor.  This in turn prohibits it from being used as a member in
> a union (rtxunion) that is constructed statically, resulting in a (fatal)
> error during stage 1.  A workaround is to add an explicit constructor
> to the problematic union, which allows mainline to be bootstrapped with
> the system compiler on older RedHat 7 systems.
> 
> This patch has been tested on x86_64-pc-linux-gnu where it allows a
> bootstrap to complete when using g++ 4.8.5 as the host compiler.
> Ok for mainline?
> 
> 
> 2023-10-04  Roger Sayle
> 
> gcc/ChangeLog
>     * rtl.h (rtx_def::u): Add explicit constructor to workaround
>     issue using g++ 4.8 as a host compiler.

AFAIK G++ 5.1 also has a bug (https://gcc.gnu.org/PR65801) that breaks
building recent GCC.  I don't think it's really "maintainable" to ensure
that current GCC can be built with a buggy host compiler.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
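To make the quoted workaround concrete, here is a small sketch of the general C++11 rule involved: a union with a member whose default constructor is non-trivial loses its implicit default constructor, and a user-provided one restores the ability to construct it statically. The types `coeffs`, `payload` and `node` are hypothetical stand-ins and are not the declarations in rtl.h.

```cpp
// Hypothetical stand-in for a type whose default constructor is (or is
// believed by an old compiler to be) non-trivial.
struct coeffs
{
  unsigned short v[2];
  coeffs () : v{0, 0} {}   // user-provided, hence non-trivial
};

union payload
{
  long as_int;
  coeffs as_coeffs;
  // Without this constructor the union's implicit default constructor is
  // deleted, and the static object below would fail to compile.
  payload () : as_int (0) {}
};

struct node
{
  int code;
  payload u;
};

// Statically constructed object, analogous to the situation in the report.
static node static_node;

int static_node_value ()
{
  return static_node.code + static_cast<int> (static_node.u.as_int);
}
```

The sketch only illustrates the language rule; whether carrying such workarounds is worthwhile for a given host compiler is the question raised in the reply above.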
Re: [PATCH] LoongArch: Reimplement multilib build option handling.
On Sat, 2023-10-07 at 11:41 +0800, Yang Yujie wrote:
> Thanks for the testing!
> 
> This error seems to be difficult to reproduce since it is a makefile
> dependency problem.  I think appending loongarch-multilib.h to $(GTM_H)
> instead of $(TM_H) could help.

FWIW such issues are easier to reproduce with a high -j number.  I can
easily reproduce it with -j32 on a 3C5000-based server.

> > And when this is fixed, it might be a nice idea to have a
> > --with-multilib-list config in ./contrib/config-list.mk .
> 
> Thanks, will add this later too.
> 
> P.S. Currently support for "f32" is not active, and it should probably be
> avoided if you want to build a working rootfs.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc
During the review of a LLVM change [1], on LA464 we found that zeroing
a fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

    * config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
    zeroing a fcc.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 68897799505..743e75907a6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2151,7 +2151,7 @@ (define_insn "movfcc"
   [(set (match_operand:FCC 0 "register_operand" "=z")
         (const_int 0))]
   ""
-  "movgr2cf\t%0,$r0")
+  "fcmp.caf.s\t%0,$f0,$f0")
 
 ;; Conditional move instructions.
 
-- 
2.42.0
Pushed: [PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc
On Wed, 2023-10-18 at 09:34 +0800, chenglulu wrote:
> 
> On 2023/10/17 at 10:24 PM, WANG Xuerui wrote:
> > 
> > On 10/17/23 22:06, Xi Ruoyao wrote:
> > > During the review of a LLVM change [1], on LA464 we found that zeroing
> > "an" LLVM change (because the word LLVM is pronounced letter-by-letter)
> > > a fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.
> > Similarly, "an" fcc
> > > 
> > > [1]: https://github.com/llvm/llvm-project/pull/69300
> > > 
> > > gcc/ChangeLog:
> > > 
> > >     * config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
> > >     zeroing a fcc.
> > > ---
> > > 
> > > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> > Ok!

Pushed r14-4712 with the commit message modified following Xuerui's
suggestion.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[PATCH 1/5] LoongArch: Add enum-style -mexplicit-relocs= option
To strike a better balance between scheduling and relaxation when -flto
is enabled, add three-way -mexplicit-relocs={auto,none,always} options.
The old -mexplicit-relocs and -mno-explicit-relocs options are still
supported; they are mapped to -mexplicit-relocs=always and
-mexplicit-relocs=none.

The default choice is determined by probing assembler capabilities at
build time.  If the assembler does not support explicit relocs at all,
the default will be none; if it supports explicit relocs but not
relaxation, the default will be always; if both explicit relocs and
relaxation are supported, the default will be auto.

Currently auto is the same as none.  We will make auto more clever in
following changes.

gcc/ChangeLog:

    * config/loongarch/genopts/loongarch-strings: Add strings for
    -mexplicit-relocs={auto,none,always}.
    * config/loongarch/genopts/loongarch.opt.in: Add options for
    -mexplicit-relocs={auto,none,always}.
    * config/loongarch/loongarch-str.h: Regenerate.
    * config/loongarch/loongarch.opt: Regenerate.
    * config/loongarch/loongarch-def.h
    (EXPLICIT_RELOCS_AUTO): Define.
    (EXPLICIT_RELOCS_NONE): Define.
    (EXPLICIT_RELOCS_ALWAYS): Define.
    (N_EXPLICIT_RELOCS_TYPES): Define.
    * config/loongarch/loongarch.cc
    (loongarch_option_override_internal): Error out if the old-style
    -m[no-]explicit-relocs option is used with
    -mexplicit-relocs={auto,none,always} together.  Map
    -mno-explicit-relocs to -mexplicit-relocs=none and
    -mexplicit-relocs to -mexplicit-relocs=always for backward
    compatibility.  Set a proper default for -mexplicit-relocs=
    based on configure-time probed linker capability.  Update a
    diagnostic message to mention -mexplicit-relocs=always instead
    of the old-style -mexplicit-relocs.
    (loongarch_handle_model_attribute): Update a diagnostic message
    to mention -mexplicit-relocs=always instead of the old-style
    -mexplicit-relocs.
    * config/loongarch/loongarch.h (TARGET_EXPLICIT_RELOCS): Define.
---
 .../loongarch/genopts/loongarch-strings       |  6 +
 gcc/config/loongarch/genopts/loongarch.opt.in | 21 ++--
 gcc/config/loongarch/loongarch-def.h          |  6 +
 gcc/config/loongarch/loongarch-str.h          |  5
 gcc/config/loongarch/loongarch.cc             | 24 +--
 gcc/config/loongarch/loongarch.h              |  3 +++
 gcc/config/loongarch/loongarch.opt            | 21 ++--
 7 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings b/gcc/config/loongarch/genopts/loongarch-strings
index adecaec3eda..8e412f7536e 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -63,3 +63,9 @@ STR_CMODEL_TS       tiny-static
 STR_CMODEL_MEDIUM     medium
 STR_CMODEL_LARGE      large
 STR_CMODEL_EXTREME    extreme
+
+# -mexplicit-relocs
+OPTSTR_EXPLICIT_RELOCS      explicit-relocs
+STR_EXPLICIT_RELOCS_AUTO    auto
+STR_EXPLICIT_RELOCS_NONE    none
+STR_EXPLICIT_RELOCS_ALWAYS  always
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 4a2d7438f1b..e1fe0c7086e 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -170,10 +170,27 @@ mmax-inline-memcpy-size=
 Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init(1024)
 -mmax-inline-memcpy-size=SIZE   Set the max size of memcpy to inline, default is 1024.
 
-mexplicit-relocs
-Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION)
+Enum
+Name(explicit_relocs) Type(int)
+The code model option names for -mexplicit-relocs:
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_AUTO@@) Value(EXPLICIT_RELOCS_AUTO)
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_NONE@@) Value(EXPLICIT_RELOCS_NONE)
+
+EnumValue
+Enum(explicit_relocs) String(@@STR_EXPLICIT_RELOCS_ALWAYS@@) Value(EXPLICIT_RELOCS_ALWAYS)
+
+mexplicit-relocs=
+Target RejectNegative Joined Enum(explicit_relocs) Var(la_opt_explicit_relocs) Init(M_OPT_UNSET)
 Use %reloc() assembly operators.
 
+mexplicit-relocs
+Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET)
+Use %reloc() assembly operators (for backward compatibility).
+
 ; The code model option names for -mcmodel.
 Enum
 Name(cmodel) Type(int)
diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h
index 769efcb70fb..6e2a6987910 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -99,6 +99,12 @@ extern const char* loongarch_cmodel_strings[];
 #define CMODEL_EXTREME        5
 #define N_CMODEL_TYPES        6
 
+/* enum explicit_relocs */
+#define EXPLICIT_RELOCS_AUTO    0
+#define EXPLICIT_RELOCS_NONE    1
+#de
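The option resolution described in the commit message of this patch can be sketched as a small standalone function. This is only an illustration of the stated rules (reject mixing old and new styles, map the old flags, otherwise pick a default from the probed assembler capabilities); all names here are hypothetical and it is not the code in loongarch_option_override_internal.

```cpp
// Hypothetical names throughout; not GCC's option-handling code.
enum class reloc_style { unset, automatic, none, always };
enum class legacy_flag { unset, off, on };  // -mno-explicit-relocs / -mexplicit-relocs

struct resolution
{
  bool ok;            // false when old- and new-style options are mixed
  reloc_style style;  // meaningful only when ok is true
};

resolution
resolve_explicit_relocs (reloc_style requested, legacy_flag legacy,
                         bool as_has_explicit_relocs, bool as_has_relax)
{
  // Mixing -m[no-]explicit-relocs with -mexplicit-relocs= is an error.
  if (requested != reloc_style::unset && legacy != legacy_flag::unset)
    return { false, reloc_style::unset };

  // Backward-compatible mapping of the old-style flags.
  if (legacy == legacy_flag::on)
    return { true, reloc_style::always };
  if (legacy == legacy_flag::off)
    return { true, reloc_style::none };

  if (requested != reloc_style::unset)
    return { true, requested };

  // Default derived from assembler capabilities probed at build time.
  if (!as_has_explicit_relocs)
    return { true, reloc_style::none };
  if (!as_has_relax)
    return { true, reloc_style::always };
  return { true, reloc_style::automatic };
}
```

Keeping the legacy flag in a separate variable, as the patch does with la_opt_explicit_relocs_backward, is what makes the "both given" case detectable.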
[PATCH 3/5] LoongArch: Use explicit relocs for TLS access with -mexplicit-relocs=auto
The linker does not know how to relax TLS access for LoongArch, so let's
emit machine instructions with explicit relocs for TLS.

gcc/ChangeLog:

    * config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
    Return true for TLS symbol types if -mexplicit-relocs=auto.
    (loongarch_call_tls_get_addr): Replace TARGET_EXPLICIT_RELOCS
    with la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE.
    (loongarch_legitimize_tls_address): Likewise.
    * config/loongarch/loongarch.md (@tls_low<mode>): Remove
    TARGET_EXPLICIT_RELOCS from insn condition.

gcc/testsuite/ChangeLog:

    * gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: New
    test.
    * gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c: New
    test.
---
 gcc/config/loongarch/loongarch.cc             | 37 ---
 gcc/config/loongarch/loongarch.md             |  2 +-
 .../explicit-relocs-auto-tls-ld-gd.c          |  9 +
 .../explicit-relocs-auto-tls-le-ie.c          |  6 +++
 4 files changed, 40 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index c12d77ea144..c782f571abc 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1936,16 +1936,27 @@ loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
   if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO)
     return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS;
 
-  /* If we are performing LTO for a final link, and we have the linker
-     plugin so we know the resolution of the symbols, then all GOT
-     references are binding to external symbols or preemptable symbols.
-     So the linker cannot relax them.  */
-  return (in_lto_p
-          && !flag_incremental_link
-          && HAVE_LTO_PLUGIN == 2
-          && (!global_options_set.x_flag_use_linker_plugin
-              || global_options.x_flag_use_linker_plugin)
-          && type == SYMBOL_GOT_DISP);
+  switch (type)
+    {
+      case SYMBOL_TLS_IE:
+      case SYMBOL_TLS_LE:
+      case SYMBOL_TLSGD:
+      case SYMBOL_TLSLDM:
+        /* The linker don't know how to relax TLS accesses.  */
+        return true;
+      case SYMBOL_GOT_DISP:
+        /* If we are performing LTO for a final link, and we have the
+           linker plugin so we know the resolution of the symbols, then
+           all GOT references are binding to external symbols or
+           preemptable symbols.  So the linker cannot relax them.  */
+        return (in_lto_p
+                && !flag_incremental_link
+                && HAVE_LTO_PLUGIN == 2
+                && (!global_options_set.x_flag_use_linker_plugin
+                    || global_options.x_flag_use_linker_plugin));
+      default:
+        return false;
+    }
 }
 
 /* Returns the number of instructions necessary to reference a symbol.  */
@@ -2753,7 +2764,7 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0)
 
   start_sequence ();
 
-  if (TARGET_EXPLICIT_RELOCS)
+  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
     {
       /* Split tls symbol to high and low.  */
       rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
@@ -2918,7 +2929,7 @@ loongarch_legitimize_tls_address (rtx loc)
       tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM);
       tmp1 = gen_reg_rtx (Pmode);
       dest = gen_reg_rtx (Pmode);
-      if (TARGET_EXPLICIT_RELOCS)
+      if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
         {
           tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_IE);
           tmp3 = gen_reg_rtx (Pmode);
@@ -2955,7 +2966,7 @@ loongarch_legitimize_tls_address (rtx loc)
       tmp1 = gen_reg_rtx (Pmode);
       dest = gen_reg_rtx (Pmode);
 
-      if (TARGET_EXPLICIT_RELOCS)
+      if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
         {
           tmp2 = loongarch_unspec_address (loc, SYMBOL_TLS_LE);
           tmp3 = gen_reg_rtx (Pmode);
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index bec73f1bc91..695c8eb9a6f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2257,7 +2257,7 @@ (define_insn "@tls_low<mode>"
         (unspec:P [(mem:P (lo_sum:P (match_operand:P 1 "register_operand" "r")
                                     (match_operand:P 2 "symbolic_operand" "")))]
         UNSPEC_TLS_LOW))]
-  "TARGET_EXPLICIT_RELOCS"
+  ""
   "addi.<d>\t%0,%1,%L2"
   [(set_attr "type" "arith")
    (set_attr "mode" "<MODE>")])
diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
new file mode 100644
index 000..957ff98df62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd
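The decision made by loongarch_explicit_relocs_p after this patch can be modelled outside the compiler roughly as follows. This is a simplified sketch with hypothetical types; in particular the single `linker_plugin` flag stands in for the combined HAVE_LTO_PLUGIN and -fuse-linker-plugin checks in the real function.

```cpp
// Hypothetical model types; not the declarations used in loongarch.cc.
enum class reloc_style { automatic, none, always };
enum class symbol_kind { pcrel, got_disp, tls_ie, tls_le, tls_gd, tls_ldm };

struct link_context
{
  bool in_lto;            // performing LTO
  bool incremental_link;  // LTO incremental link, not a final link
  bool linker_plugin;     // simplification of the linker plugin checks
};

bool
use_explicit_relocs (reloc_style style, symbol_kind kind,
                     const link_context &ctx)
{
  // "none" and "always" decide for every symbol kind.
  if (style != reloc_style::automatic)
    return style == reloc_style::always;

  switch (kind)
    {
    case symbol_kind::tls_ie:
    case symbol_kind::tls_le:
    case symbol_kind::tls_gd:
    case symbol_kind::tls_ldm:
      // TLS accesses are not relaxed by the linker, so nothing is lost.
      return true;
    case symbol_kind::got_disp:
      // Only in a final LTO link with the plugin is it known that the
      // GOT access cannot be relaxed.
      return ctx.in_lto && !ctx.incremental_link && ctx.linker_plugin;
    default:
      return false;
    }
}
```

The model makes the asymmetry visible: TLS kinds are unconditional under auto, while GOT accesses depend on how much the compiler knows about the final link.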
[PATCH 5/5] LoongArch: Document -mexplicit-relocs={auto,none,always}
gcc/ChangeLog:

    * doc/invoke.texi (-mexplicit-relocs=style): Document.
    (-mexplicit-relocs): Document as an alias of
    -mexplicit-relocs=always.
    (-mno-explicit-relocs): Document as an alias of
    -mexplicit-relocs=none.
    (-mcmodel=extreme): Mention -mexplicit-relocs=always instead of
    -mexplicit-relocs.
---
 gcc/doc/invoke.texi | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16c45843123..f4633715e2b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1038,7 +1038,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcond-move-float  -mno-cond-move-float
 -memcpy  -mno-memcpy -mstrict-align -mno-strict-align
 -mmax-inline-memcpy-size=@var{n}
--mexplicit-relocs -mno-explicit-relocs
+-mexplicit-relocs=@var{style} -mexplicit-relocs -mno-explicit-relocs
 -mdirect-extern-access -mno-direct-extern-access
 -mcmodel=@var{code-model}}
 
@@ -26194,26 +26194,39 @@ The text segment and data segment must be within 2GB addressing space.
 
 @item extreme
 This mode does not limit the size of the code segment and data segment.
-The @option{-mcmodel=extreme} option is incompatible with @option{-fplt} and
-@option{-mno-explicit-relocs}.
+The @option{-mcmodel=extreme} option is incompatible with @option{-fplt},
+and it requires @option{-mexplicit-relocs=always}.
 @end table
 The default code model is @code{normal}.
 
-@opindex mexplicit-relocs
-@opindex mno-explicit-relocs
-@item -mexplicit-relocs
-@itemx -mno-explicit-relocs
-Use or do not use assembler relocation operators when dealing with symbolic
+@item -mexplicit-relocs=@var{style}
+Set when to use assembler relocation operators when dealing with symbolic
 addresses.  The alternative is to use assembler macros instead, which may
-limit instruction scheduling but allow linker relaxation.  The default
+limit instruction scheduling but allow linker relaxation.
+with @option{-mexplicit-relocs=none} the assembler macros are always used,
+with @option{-mexplicit-relocs=always} the assembler relocation operators
+are always used, with @option{-mexplicit-relocs=auto} the compiler will
+use the relocation operators where the linker relaxation is impossible to
+improve the code quality, and macros elsewhere.  The default
 value for the option is determined during GCC build-time by detecting
 corresponding assembler support:
-@code{-mno-explicit-relocs} if the assembler supports relaxation or it
-does not support relocation operators at all,
-@code{-mexplicit-relocs} otherwise.  This option is mostly useful for
+@option{-mexplicit-relocs=none} if the assembler does not support
+relocation operators at all,
+@option{-mexplicit-relocs=always} if the assembler supports relocation
+operators but does not support relaxation,
+@option{-mexplicit-relocs=auto} if the assembler supports both relocation
+operators and relaxation.  This option is mostly useful for
 debugging, or interoperation with assemblers different from the build-time
 one.
 
+@opindex mexplicit-relocs
+@item -mexplicit-relocs
+An alias of @option{-mexplicit-relocs=always} for backward compatibility.
+
+@opindex mno-explicit-relocs
+@item -mno-explicit-relocs
+An alias of @option{-mexplicit-relocs=none} for backward compatibility.
+
 @opindex mdirect-extern-access
 @item -mdirect-extern-access
 @itemx -mno-direct-extern-access
-- 
2.42.0
[PATCH 0/5] LoongArch: Better balance between relaxation and scheduling
For relaxation we are now generating assembler macros for symbolic
addresses everywhere, but this is limiting scheduling and there are
known situations where the relaxation cannot improve the code.

1. When we are performing LTO during a final link and the linker plugin
is used, la.global won't be relaxed because it references an external
or preemptable symbol.
2. The linker currently does not relax la.tls.*.
3. For la.local + ld/st pairs, if the address is only used once,
emitting pcalau12i + ld/st is never worse than relying on linker
relaxation.

Add -mexplicit-relocs=auto to allow the compiler to use explicit relocs
for these cases, but assembler macros for other cases.  Use it as the
default if the assembler supports both explicit relocs and relaxation.

LTO-bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (5):
  LoongArch: Add enum-style -mexplicit-relocs= option
  LoongArch: Use explicit relocs for GOT access when
    -mexplicit-relocs=auto and LTO during a final link with linker
    plugin
  LoongArch: Use explicit relocs for TLS access with
    -mexplicit-relocs=auto
  LoongArch: Use explicit relocs for addresses only used for one load or
    store with -mexplicit-relocs=auto and -mcmodel={normal,medium}
  LoongArch: Document -mexplicit-relocs={auto,none,always}

 .../loongarch/genopts/loongarch-strings       |   6 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  21 ++-
 gcc/config/loongarch/loongarch-def.h          |   6 +
 gcc/config/loongarch/loongarch-protos.h       |   1 +
 gcc/config/loongarch/loongarch-str.h          |   5 +
 gcc/config/loongarch/loongarch.cc             |  75 --
 gcc/config/loongarch/loongarch.h              |   3 +
 gcc/config/loongarch/loongarch.md             | 128 +-
 gcc/config/loongarch/loongarch.opt            |  21 ++-
 gcc/config/loongarch/predicates.md            |  15 +-
 gcc/doc/invoke.texi                           |  37 +++--
 .../loongarch/explicit-relocs-auto-lto.c      |  26
 ...-relocs-auto-single-load-store-no-anchor.c |   6 +
 .../explicit-relocs-auto-single-load-store.c  |  14 ++
 .../explicit-relocs-auto-tls-ld-gd.c          |   9 ++
 .../explicit-relocs-auto-tls-le-ie.c          |   6 +
 16 files changed, 343 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c

-- 
2.42.0
[PATCH 2/5] LoongArch: Use explicit relocs for GOT access when -mexplicit-relocs=auto and LTO during a final link with linker plugin
If we are performing LTO for a final link and linker plugin is enabled,
then we are sure any GOT access may resolve to a symbol out of the link
unit (otherwise the linker plugin will tell us the symbol should be
resolved locally and we'll use PC-relative access instead).

Produce machine instructions with explicit relocs instead of la.global
for better scheduling.

gcc/ChangeLog:

    * config/loongarch/loongarch-protos.h
    (loongarch_explicit_relocs_p): Declare new function.
    * config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
    Implement.
    (loongarch_symbol_insns): Call loongarch_explicit_relocs_p for
    SYMBOL_GOT_DISP, instead of using TARGET_EXPLICIT_RELOCS.
    (loongarch_split_symbol): Call loongarch_explicit_relocs_p for
    deciding if return early, instead of using
    TARGET_EXPLICIT_RELOCS.
    (loongarch_output_move): Call loongarch_explicit_relocs_p
    instead of using TARGET_EXPLICIT_RELOCS.
    * config/loongarch/loongarch.md (*low): Remove
    TARGET_EXPLICIT_RELOCS from insn condition.
    (@ld_from_got): Likewise.
    * config/loongarch/predicates.md (move_operand): Call
    loongarch_explicit_relocs_p instead of using
    TARGET_EXPLICIT_RELOCS.

gcc/testsuite/ChangeLog:

    * gcc.target/loongarch/explicit-relocs-auto-lto.c: New test.
---
 gcc/config/loongarch/loongarch-protos.h       |  1 +
 gcc/config/loongarch/loongarch.cc             | 34 +++
 gcc/config/loongarch/loongarch.md             |  4 +--
 gcc/config/loongarch/predicates.md            |  8 ++---
 .../loongarch/explicit-relocs-auto-lto.c      | 26 ++
 5 files changed, 59 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index 72ae9918b09..cb8fc36b086 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -220,4 +220,5 @@ extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int);
 extern tree loongarch_build_builtin_va_list (void);
 
 extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool);
+extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type);
 #endif /* ! GCC_LOONGARCH_PROTOS_H */
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 5df8b12ed92..c12d77ea144 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1925,6 +1925,29 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type)
   gcc_unreachable ();
 }
 
+/* If -mexplicit-relocs=auto, we use machine operations with reloc hints
+   for cases where the linker is unable to relax so we can schedule the
+   machine operations, otherwise use an assembler pseudo-op so the
+   assembler will generate R_LARCH_RELAX.  */
+
+bool
+loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
+{
+  if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO)
+    return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS;
+
+  /* If we are performing LTO for a final link, and we have the linker
+     plugin so we know the resolution of the symbols, then all GOT
+     references are binding to external symbols or preemptable symbols.
+     So the linker cannot relax them.  */
+  return (in_lto_p
+          && !flag_incremental_link
+          && HAVE_LTO_PLUGIN == 2
+          && (!global_options_set.x_flag_use_linker_plugin
+              || global_options.x_flag_use_linker_plugin)
+          && type == SYMBOL_GOT_DISP);
+}
+
 /* Returns the number of instructions necessary to reference a symbol.  */
 
 static int
@@ -1940,7 +1963,7 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, machine_mode mode)
     case SYMBOL_GOT_DISP:
       /* The constant will have to be loaded from the GOT before it
          is used in an address.  */
-      if (!TARGET_EXPLICIT_RELOCS && mode != MAX_MACHINE_MODE)
+      if (!loongarch_explicit_relocs_p (type) && mode != MAX_MACHINE_MODE)
         return 0;
 
       return 3;
@@ -3038,7 +3061,7 @@ loongarch_symbol_extreme_p (enum loongarch_symbol_type type)
    If so, and if LOW_OUT is nonnull, emit the high part and store the
    low part in *LOW_OUT.  Leave *LOW_OUT unchanged otherwise.
 
-   Return false if build with '-mno-explicit-relocs'.
+   Return false if build with '-mexplicit-relocs=none'.
 
    TEMP is as for loongarch_force_temporary and is used to
    load the high part into a register.
@@ -3052,12 +3075,9 @@ loongarch_split_symbol (rtx temp, rtx addr, machine_mode mode, rtx *low_out)
 {
   enum loongarch_symbol_type symbol_type;
 
-  /* If build with '-mno-explicit-relocs', don't split symbol.  */
-  if (!TARGET_EXPLICIT_RELOCS)
-    return false;
-
   if ((GET_CODE (addr) == HIGH && mode == MAX_MACHINE_MODE)
       || !loongarch_symbolic_constant_p (addr, &symbol_type)
+      ||
[PATCH 4/5] LoongArch: Use explicit relocs for addresses only used for one load or store with -mexplicit-relocs=auto and -mcmodel={normal, medium}
In these cases, if we use explicit relocs, we end up with 2
instructions:

    pcalau12i  t0, %pc_hi20(x)
    ld.d       t0, t0, %pc_lo12(x)

If we use la.local pseudo-op, in the best scenario (x is in +/- 2MiB
range) we still have 2 instructions:

    pcaddi     t0, %pcrel_20(x)
    ld.d       t0, t0, 0

If x is out of the range we'll have 3 instructions.  So for these cases
just emit machine instructions with explicit relocs.

gcc/ChangeLog:

    * config/loongarch/predicates.md (symbolic_pcrel_operand): New
    predicate.
    * config/loongarch/loongarch.md (define_peephole2): Optimize
    la.local + ld/st to pcalau12i + ld/st if the address is only used
    once if -mexplicit-relocs=auto and -mcmodel=normal or medium.

gcc/testsuite/ChangeLog:

    * gcc.target/loongarch/explicit-relocs-auto-single-load-store.c:
    New test.
    * gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c:
    New test.
---
 gcc/config/loongarch/loongarch.md             | 122 ++
 gcc/config/loongarch/predicates.md            |   7 +
 ...-relocs-auto-single-load-store-no-anchor.c |   6 +
 .../explicit-relocs-auto-single-load-store.c  |  14 ++
 4 files changed, 149 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 695c8eb9a6f..13473472171 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -65,6 +65,7 @@ (define_c_enum "unspec" [
 
   UNSPEC_LOAD_FROM_GOT
   UNSPEC_PCALAU12I
+  UNSPEC_PCALAU12I_GR
   UNSPEC_ORI_L_LO12
   UNSPEC_LUI_L_HI20
   UNSPEC_LUI_H_LO20
@@ -2297,6 +2298,16 @@ (define_insn "@pcalau12i"
   "pcalau12i\t%0,%%pc_hi20(%1)"
   [(set_attr "type" "move")])
 
+;; @pcalau12i may be used for sibcall so it has a strict constraint.  This
+;; allows any general register as the operand.
+(define_insn "@pcalau12i_gr"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:P 1 "symbolic_operand" "")]
+        UNSPEC_PCALAU12I_GR))]
+  ""
+  "pcalau12i\t%0,%%pc_hi20(%1)"
+  [(set_attr "type" "move")])
+
 (define_insn "@ori_l_lo12"
   [(set (match_operand:P 0 "register_operand" "=r")
         (unspec:P [(match_operand:P 1 "register_operand" "r")
@@ -3748,6 +3759,117 @@ (define_insn "loongarch_crcc_w__w"
   [(set_attr "type" "unknown")
    (set_attr "mode" "")])
 
+;; With normal or medium code models, if the only use of a pc-relative
+;; address is for loading or storing a value, then relying on linker
+;; relaxation is not better than emitting the machine instruction directly.
+;; Even if the la.local pseudo op can be relaxed, we get:
+;;
+;;     pcaddi     $t0, %pcrel_20(x)
+;;     ld.d       $t0, $t0, 0
+;;
+;; There are still two instructions, same as using the machine instructions
+;; and explicit relocs:
+;;
+;;     pcalau12i  $t0, %pc_hi20(x)
+;;     ld.d       $t0, $t0, %pc_lo12(x)
+;;
+;; And if the pseudo op cannot be relaxed, we'll get a worse result (with
+;; 3 instructions).
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+        (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+        (mem:GPR (match_dup 0)))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+       || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1))))]
+  {
+    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+  })
+
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+        (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+        (mem:GPR (plus (match_dup 0)
+                       (match_operand 3 "const_int_operand"))))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+       || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1))))]
+  {
+    operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3]));
+    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+  })
+
+(define_peephole2
+  [(set (match_operand:P 0 "register_operand")
+        (match_operand:P 1 "symbolic_pcrel_operand"))
+   (set (match_operand:GPR 2 "register_operand")
+        (any_extend:GPR (mem:SUBDI (match_dup 0))))]
+  "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \
+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \
+   && (peep2_reg_dead_p (2, operands[0]) \
+       || REGNO (operands[0]) == REGNO (operands[2]))"
+  [(set (match_dup 2)
+        (any_extend:GPR (mem:SUBDI (lo_sum:P
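The instruction-count argument in this patch rests on two address computations, which can be checked with plain integer arithmetic. The sketch below assumes the usual LoongArch semantics (pcaddi adds a sign-extended 20-bit immediate shifted left by 2 to the PC; pcalau12i produces a 4 KiB page address and the following load adds a sign-extended 12-bit immediate). The function and type names are hypothetical and this is not code from GCC or the linker.

```cpp
#include <cstdint>

// True if a PC-relative offset is reachable by a single pcaddi, i.e. it
// lies within +/- 2 MiB and is 4-byte aligned (assumed semantics).
bool fits_pcaddi (std::int64_t offset)
{
  const std::int64_t limit = std::int64_t (1) << 21;  // 2 MiB
  return offset >= -limit && offset < limit && offset % 4 == 0;
}

// The two immediates of a pcalau12i + ld/st pair.
struct pcala_parts
{
  std::int64_t hi20;  // page delta applied by pcalau12i
  std::int64_t lo12;  // signed low part applied by the load or store
};

pcala_parts split_pcala (std::int64_t pc, std::int64_t target)
{
  // The low 12 bits are sign-extended by the memory instruction, so
  // values of 0x800 and above are treated as negative and the page part
  // is bumped up to compensate.
  std::int64_t lo12 = target & 0xfff;
  if (lo12 >= 0x800)
    lo12 -= 0x1000;
  const std::int64_t pc_page = pc & ~std::int64_t (0xfff);
  return { (target - lo12 - pc_page) / 4096, lo12 };
}

// Recombines the parts the way the two instructions would.
std::int64_t rebuild_address (std::int64_t pc, pcala_parts parts)
{
  return (pc & ~std::int64_t (0xfff)) + parts.hi20 * 4096 + parts.lo12;
}
```

Under these assumptions the pcalau12i + ld/st pair always takes two instructions within its reach, while the la.local form only matches that count when `fits_pcaddi` holds for the offset.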
Re: [PATCH 2/5] LoongArch: Use explicit relocs for GOT access when -mexplicit-relocs=auto and LTO during a final link with linker plugin
On Sat, 2023-10-21 at 15:32 +0800, chenglulu wrote:
> > +  /* If we are performing LTO for a final link, and we have the linker
> > +     plugin so we know the resolution of the symbols, then all GOT
> > +     references are binding to external symbols or preemptable symbols.
> > +     So the linker cannot relax them.  */
> > +  return (in_lto_p
> > +          && !flag_incremental_link
> 
> I don’t quite understand this condition "!flag_incremental_link".  Can
> you explain it?  Others LGTM.
> 
> Thanks.

If we have two (or several) .o files containing LTO bytecode, GCC
supports "LTO incremental linking" with:

gcc a.o b.o -o ab.o -O2 -flto -flinker-output=nolto-rel

The resulting ab.o will include data and code in a.o and b.o, but it
contains machine code instead of LTO bytecode.  Now if ab.o refers to
an external symbol c, the linker may relax "la.global c" to "la.local c"
(if ab.o is linked together with another file c.o which contains the
definition of c) or not.

As we cannot exclude the possibility of a relaxation on la.global for
incremental linking, just emit la.global and let the linker do the
correct thing.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Pushed: [PATCH 0/5] LoongArch: Better balance between relaxation and scheduling
Pushed r14-{4848..4852}. On Thu, 2023-10-19 at 22:02 +0800, Xi Ruoyao wrote: > For relaxation we are now generating assembler macros for symbolic > addresses everywhere, but this is limiting scheduling and there are > known situations where the relaxation cannot improve the code. > > 1. When we are performing LTO during a final link and the linker plugin > is used, la.global won't be relaxed because they reference to an > external or preemptable symbol. > 2. The linker currently do not relax la.tls.*. > 3. For la.local + ld/st pairs, if the address is only used once, > emitting pcalau12i + ld/st is always not worse than relying on linker > relaxation. > > Add -mexplicit-relocs=auto to allow the compiler to use explicit relocs > for these cases, but assembler macros for other cases. Use it as the > default if the assembler supports both explicit relocs and relaxation. > > LTO-bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? > > Xi Ruoyao (5): > LoongArch: Add enum-style -mexplicit-relocs= option > LoongArch: Use explicit relocs for GOT access when > -mexplicit-relocs=auto and LTO during a final link with linker > plugin > LoongArch: Use explicit relocs for TLS access with > -mexplicit-relocs=auto > LoongArch: Use explicit relocs for addresses only used for one load or > store with -mexplicit-relocs=auto and -mcmodel={normal,medium} > LoongArch: Document -mexplicit-relocs={auto,none,always} > > .../loongarch/genopts/loongarch-strings | 6 + > gcc/config/loongarch/genopts/loongarch.opt.in | 21 ++- > gcc/config/loongarch/loongarch-def.h | 6 + > gcc/config/loongarch/loongarch-protos.h | 1 + > gcc/config/loongarch/loongarch-str.h | 5 + > gcc/config/loongarch/loongarch.cc | 75 -- > gcc/config/loongarch/loongarch.h | 3 + > gcc/config/loongarch/loongarch.md | 128 +- > gcc/config/loongarch/loongarch.opt | 21 ++- > gcc/config/loongarch/predicates.md | 15 +- > gcc/doc/invoke.texi | 37 +++-- > .../loongarch/explicit-relocs-auto-lto.c | 26 > 
...-relocs-auto-single-load-store-no-anchor.c | 6 + > .../explicit-relocs-auto-single-load-store.c | 14 ++ > .../explicit-relocs-auto-tls-ld-gd.c | 9 ++ > .../explicit-relocs-auto-tls-le-ie.c | 6 + > 16 files changed, 343 insertions(+), 36 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-lto.c > create mode 100644 > gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-no-anchor.c > create mode 100644 > gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store.c > create mode 100644 > gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c > create mode 100644 > gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-tls-le-ie.c > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
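The three situations listed in the cover letter can be sketched as a small decision function. This only illustrates the policy described above and is not the code in loongarch.cc; the enum, the function name, and all of its parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical classification of a symbolic address; not GCC's own types.  */
enum symbol_kind
{
  SYM_LOCAL,	/* would otherwise use la.local */
  SYM_GOT,	/* would otherwise use la.global */
  SYM_TLS	/* would otherwise use la.tls.* */
};

/* Return true if, under -mexplicit-relocs=auto as described in the cover
   letter, explicit relocs would be preferred over an assembler macro.  */
bool
auto_prefers_explicit_relocs (enum symbol_kind kind,
			      bool lto_final_link_with_plugin,
			      bool used_by_single_load_store,
			      bool cmodel_normal_or_medium)
{
  switch (kind)
    {
    case SYM_GOT:
      /* Case 1: la.global won't be relaxed in this situation.  */
      return lto_final_link_with_plugin;
    case SYM_TLS:
      /* Case 2: the linker does not relax la.tls.* yet.  */
      return true;
    case SYM_LOCAL:
      /* Case 3: pcalau12i + ld/st is never worse for a single use.  */
      return used_by_single_load_store && cmodel_normal_or_medium;
    }
  return false;
}
```

In every other situation the sketch falls back to assembler macros, which leaves room for linker relaxation.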
[PATCH] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined
Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
building a cross compiler if the cross assembler is not installed yet.

gcc/ChangeLog:

	* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
	if not defined yet.
---

Ok for trunk?

 gcc/config/loongarch/loongarch-opts.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h
index 2756939b05d..f204828015e 100644
--- a/gcc/config/loongarch/loongarch-opts.h
+++ b/gcc/config/loongarch/loongarch-opts.h
@@ -101,4 +101,8 @@ loongarch_update_gcc_opt_status (struct loongarch_target *target,
 #define HAVE_AS_MRELAX_OPTION 0
 #endif
 
+#ifndef HAVE_AS_TLS
+#define HAVE_AS_TLS 0
+#endif
+
 #endif /* LOONGARCH_OPTS_H */
-- 
2.42.0
Re: [PATCH] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined
On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:
> On 2023/10/30 at 7:42 PM, Xi Ruoyao wrote:
> > Now loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
> > building a cross compiler if the cross assembler is not installed yet.
> >
> > gcc/ChangeLog:
> >
> > 	* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
> > 	if not defined yet.
> > ---
> >
> > Ok for trunk?
> I have no problem with this submission, but I don't understand the
> circumstances surrounding the error.

When developers hack on GCC, they sometimes build a cross compiler with
no cross assembler; in that case HAVE_AS_TLS is simply left undefined.
In the future we may also have an assembler without TLS support (for
example a tiny assembler for a bare-metal target), and then HAVE_AS_TLS
will be undefined too.

The error message is:

g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include \
    -o build/gencondmd.o build/gencondmd.cc
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was not declared in this scope
 3655 | "HAVE_AS_TLS"
      | ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was not declared in this scope
 3655 | "HAVE_AS_TLS"
      | ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was not declared in this scope
 3655 | "HAVE_AS_TLS"
      | ^~~
../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS' was not declared in this scope
 3655 | "HAVE_AS_TLS"
      | ^~~
make[1]: *** [Makefile:2962: build/gencondmd.o] Error 1

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
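The failure mode can be reproduced in miniature. In the sketch below the macro is deliberately left undefined, mimicking a build where the assembler probe never defined it, and the fallback from the patch makes it usable as an ordinary expression. The function tls_insn_enabled is a hypothetical stand-in for the generated code that evaluates the insn condition "HAVE_AS_TLS"; it is not a GCC function.

```c
#include <stdbool.h>

/* HAVE_AS_TLS is intentionally NOT defined above this point, as in a
   cross build without a cross assembler.  Without the fallback below,
   the expression in tls_insn_enabled would fail to compile with an
   "undeclared identifier" style error.  */

#ifndef HAVE_AS_TLS
#define HAVE_AS_TLS 0
#endif

/* Hypothetical stand-in for a condition string from the machine
   description that has been pasted into C/C++ source as an expression.  */
bool
tls_insn_enabled (void)
{
  return HAVE_AS_TLS != 0;
}
```

Defining the macro to 0 rather than testing it with `#ifdef` keeps the condition a plain expression, which is the form the error output above shows being compiled.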
Pushed: [PATCH v2] LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]
Pushed r14-5030.  The subject and ChangeLog are updated to include the
PR number.  The code change is the same as v1.

On Mon, 2023-10-30 at 20:44 +0800, chenglulu wrote:
> 
> On 2023/10/30 at 8:26 PM, Xi Ruoyao wrote:
> > On Mon, 2023-10-30 at 19:50 +0800, chenglulu wrote:
> > > On 2023/10/30 at 7:42 PM, Xi Ruoyao wrote:
> > > > Now loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
> > > > building a cross compiler if the cross assembler is not installed yet.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > 	* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
> > > > 	if not defined yet.
> > > > ---
> > > > 
> > > > Ok for trunk?
> > > I have no problem with this submission, but I don't understand the
> > > circumstances surrounding the error.
> > When the developers hack GCC they sometimes build a cross compiler with
> > no cross assembler, then HAVE_AS_TLS will just be undefined.  And in the
> > future we may have an assembler w/o TLS support (for example a tiny
> > assembler for bare-metal target), then HAVE_AS_TLS will be undefined
> > too.
> 
> Ok!
> 
> Thanks!
> 
> > 
> > The error message is:
> > 
> > g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions
> > -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing
> > -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
> > -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long
> > -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H
> > -DGENERATOR_FILE -I. -Ibuild -I../../gcc/gcc -I../../gcc/gcc/build
> > -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include \
> > -o build/gencondmd.o build/gencondmd.cc
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS'
> > was not declared in this scope
> > 3655 | "HAVE_AS_TLS"
> > | ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS'
> > was not declared in this scope
> > 3655 | "HAVE_AS_TLS"
> > | ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS'
> > was not declared in this scope
> > 3655 | "HAVE_AS_TLS"
> > | ^~~
> > ../../gcc/gcc/config/loongarch/loongarch.md:3655:2: error: 'HAVE_AS_TLS'
> > was not declared in this scope
> > 3655 | "HAVE_AS_TLS"
> > | ^~~
> > make[1]: *** [Makefile:2962: build/gencondmd.o] Error 1
> > 
> 

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: Disable relaxation if the assembler doesn't support conditional branch relaxation [PR112330]
As the commit message of r14-4674 has indicated, if the assembler does
not support conditional branch relaxation, a relocation overflow may
happen on conditional branches when relaxation is enabled because the
number of NOP instructions inserted by the assembler will be more than
the number estimated by GCC.

To work around this issue, disable relaxation by default if the
assembler is detected at GCC build time to be incapable of performing
conditional branch relaxation.  We also need to pass -mno-relax to the
assembler to really disable relaxation.  But if the assembler does not
support the -mrelax option at all, we should not pass -mno-relax to the
assembler or it will immediately error out.  Also handle this with the
build-time assembler capability probing, and add a pair of options
-m[no-]pass-mrelax-to-as to allow using a different assembler from the
build-time one.

With this change, if GCC is built with GAS 2.41, relaxation will be
disabled by default.  So the default value of -mexplicit-relocs= is also
changed to 'always' if -mno-relax is specified or implied by the
build-time default, because using assembler macros for symbol addresses
produces no benefit when relaxation is disabled.

gcc/ChangeLog:

	PR target/112330
	* config/loongarch/genopts/loongarch.opt.in: Add
	-m[no-]pass-mrelax-to-as.  Change the default of -m[no-]relax to
	account for conditional branch relaxation support status.
	* config/loongarch/loongarch.opt: Regenerate.
	* configure.ac (gcc_cv_as_loongarch_cond_branch_relax): Check if
	the assembler supports conditional branch relaxation.
	* configure: Regenerate.
	* config.in: Regenerate.
	* config/loongarch/loongarch-opts.h
	(HAVE_AS_COND_BRANCH_RELAXATION): Define to 0 if not defined.
	* config/loongarch/loongarch-driver.h (ASM_MRELAX_DEFAULT):
	Define.
	(ASM_MRELAX_SPEC): Define.
	(ASM_SPEC): Use ASM_MRELAX_SPEC instead of "%{mno-relax}".
	* config/loongarch/loongarch.cc: Take the setting of
	-m[no-]relax into account when determining the default of
	-mexplicit-relocs=.
	* doc/invoke.texi: Document -m[no-]relax and
	-m[no-]pass-mrelax-to-as for LoongArch.  Update the default
	value of -mexplicit-relocs=.
---

Bootstrapped and regtested on loongarch64-linux-gnu twice: once with
Binutils 2.41, another with Binutils 2.41.50.20231105.

With Binutils 2.41.50.20231105 there is a regression: the compilation of
c-c++-common/asan/pr59063-2.c times out.  My diagnosis shows that the
timeout was caused by the linker (it seemed to run indefinitely), so
it's more likely a Binutils regression than a GCC regression, and I'll
leave this for Qinggang.

Ok for trunk?

 gcc/config.in | 6 +++
 gcc/config/loongarch/genopts/loongarch.opt.in | 6 ++-
 gcc/config/loongarch/loongarch-driver.h | 16 +++-
 gcc/config/loongarch/loongarch-opts.h | 4 ++
 gcc/config/loongarch/loongarch.cc | 2 +-
 gcc/config/loongarch/loongarch.opt | 6 ++-
 gcc/configure | 39 ++-
 gcc/configure.ac | 10 +
 gcc/doc/invoke.texi | 36 +
 9 files changed, 111 insertions(+), 14 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 03faee1c6ac..7728e53ca1f 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -386,6 +386,12 @@
 #endif
 
 
+/* Define if your assembler supports conditional branch relaxation. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_COND_BRANCH_RELAXATION
+#endif
+
+
 /* Define if your assembler supports the --debug-prefix-map option. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_DEBUG_PREFIX_MAP
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index e1fe0c7086e..158701d327a 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -223,10 +223,14 @@ Target Var(TARGET_DIRECT_EXTERN_ACCESS) Init(0)
 Avoid using the GOT to access external symbols.
 
 mrelax
-Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
+Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION && HAVE_AS_COND_BRANCH_RELAXATION)
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses.
 
+mpass-mrelax-to-as
+Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
+Pass -mrelax or -mno-relax option to the assembler.
+
 -param=loongarch-vect-unroll-limit=
 Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) IntegerRange(1, 64) Param
 Used to limit unroll factor which indicates how much the autovectorizer may
diff --git a/gcc/config/loongarch/loongarch-driver.h b/gcc/config/loongarch/loongarch-driver.h
index d859afcc9fe..20d233cc938 100644
--- a/gcc/config/loongarch/loongarch-driver.h
+++ b/gcc/config/loongarch/loo
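How the two build-time probes combine into the option defaults can be sketched in plain C. This is a simplified model of the Init() expressions in the hunk above, not GCC code: the function names and the enum are hypothetical, and the last function assumes an assembler that supports explicit relocs (the case where the choice is between 'auto' and 'always').

```c
#include <stdbool.h>

/* Hypothetical model of the -mexplicit-relocs= default; not GCC's enum.  */
enum demo_explicit_relocs { DEMO_RELOCS_AUTO, DEMO_RELOCS_ALWAYS };

/* Models Init(HAVE_AS_MRELAX_OPTION && HAVE_AS_COND_BRANCH_RELAXATION).  */
bool
default_mrelax (bool has_mrelax_option, bool has_cond_branch_relaxation)
{
  return has_mrelax_option && has_cond_branch_relaxation;
}

/* Models Init(HAVE_AS_MRELAX_OPTION) for -mpass-mrelax-to-as: only pass
   -mrelax/-mno-relax to an assembler that understands the option.  */
bool
default_pass_mrelax_to_as (bool has_mrelax_option)
{
  return has_mrelax_option;
}

/* With relaxation off, assembler macros bring no benefit, so the default
   becomes 'always' (assuming explicit relocs are supported at all).  */
enum demo_explicit_relocs
default_explicit_relocs (bool relax_enabled)
{
  return relax_enabled ? DEMO_RELOCS_AUTO : DEMO_RELOCS_ALWAYS;
}
```

For an assembler that accepts -mrelax but lacks conditional branch relaxation (assumed here to be the GAS 2.41 situation the message describes), the model gives relaxation off, -mno-relax still passed to the assembler, and 'always' as the explicit-relocs default.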
[PATCH] LoongArch: Optimize single-used address with -mexplicit-relocs=auto for fld/fst
fld and fst have the same addressing mode as ld.w and st.w, so the same
optimization as r14-4851 should be applied to them too.

gcc/ChangeLog:

	* config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode
	iterator.
	(ST_ANY): New mode iterator.
	(define_peephole2): Use LD_AT_LEAST_32_BIT instead of GPR and
	ST_ANY instead of QHWD for applicable patterns.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 46 +--
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 4dd716e1941..9c247242215 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -400,6 +400,22 @@ (define_mode_iterator SPLITF
 (DI "!TARGET_64BIT && TARGET_DOUBLE_FLOAT")
 (TF "TARGET_64BIT && TARGET_DOUBLE_FLOAT")])
 
+;; A mode for anything with 32 bits or more, and able to be loaded with
+;; the same addressing mode as ld.w.
+(define_mode_iterator LD_AT_LEAST_32_BIT
+ [SI
+ (DI "TARGET_64BIT")
+ (SF "TARGET_SINGLE_FLOAT || TARGET_DOUBLE_FLOAT")
+ (DF "TARGET_DOUBLE_FLOAT")])
+
+;; A mode for anything able to be stored with the same addressing mode as
+;; st.w.
+(define_mode_iterator ST_ANY
+ [QI HI SI
+ (DI "TARGET_64BIT")
+ (SF "TARGET_SINGLE_FLOAT || TARGET_DOUBLE_FLOAT")
+ (DF "TARGET_DOUBLE_FLOAT")])
+
 ;; In GPR templates, a string like "mul." will expand to "mul.w" in the
 ;; 32-bit version and "mul.d" in the 64-bit version.
(define_mode_attr d [(SI "w") (DI "d")]) @@ -3785,13 +3801,14 @@ (define_insn "loongarch_crcc_w__w" (define_peephole2 [(set (match_operand:P 0 "register_operand") (match_operand:P 1 "symbolic_pcrel_operand")) - (set (match_operand:GPR 2 "register_operand") - (mem:GPR (match_dup 0)))] + (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand") + (mem:LD_AT_LEAST_32_BIT (match_dup 0)))] "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \ && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \ && (peep2_reg_dead_p (2, operands[0]) \ || REGNO (operands[0]) == REGNO (operands[2]))" - [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1] + [(set (match_dup 2) + (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1] { emit_insn (gen_pcalau12i_gr (operands[0], operands[1])); }) @@ -3799,14 +3816,15 @@ (define_peephole2 (define_peephole2 [(set (match_operand:P 0 "register_operand") (match_operand:P 1 "symbolic_pcrel_operand")) - (set (match_operand:GPR 2 "register_operand") - (mem:GPR (plus (match_dup 0) - (match_operand 3 "const_int_operand"] + (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand") + (mem:LD_AT_LEAST_32_BIT (plus (match_dup 0) + (match_operand 3 "const_int_operand"] "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \ && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \ && (peep2_reg_dead_p (2, operands[0]) \ || REGNO (operands[0]) == REGNO (operands[2]))" - [(set (match_dup 2) (mem:GPR (lo_sum:P (match_dup 0) (match_dup 1] + [(set (match_dup 2) + (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1] { operands[1] = plus_constant (Pmode, operands[1], INTVAL (operands[3])); emit_insn (gen_pcalau12i_gr (operands[0], operands[1])); @@ -3850,13 +3868,13 @@ (define_peephole2 (define_peephole2 [(set (match_operand:P 0 "register_operand") (match_operand:P 1 "symbolic_pcrel_operand")) - (set (mem:QHWD (match_dup 0)) - (match_operand:QHWD 2 "register_operand"))] + (set (mem:ST_ANY (match_dup 0)) + (match_operand:ST_ANY 2 
"register_operand"))] "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \ && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \ && (peep2_reg_dead_p (2, operands[0])) \ && REGNO (operands[0]) != REGNO (operands[2])" - [(set (mem:QHWD (lo_sum:P (match_dup 0) (match_dup 1))) (match_dup 2))] + [(set (mem:ST_ANY (lo_sum:P (match_dup 0) (match_dup 1))) (match_dup 2))] { emit_insn (gen_pcalau12i_gr (operands[0], operands[1])); }) @@ -3864,14 +3882,14 @@ (define_peephole2 (define_peephole2 [(set (match_operand:P 0 "register_operand") (match_operand:P 1 "symbolic_pcrel_operand")) - (set (mem:QHWD (plus (match_dup 0) - (match_operand 3 "const_int_operand"))) - (match_operand:QHWD 2 "register_operand"))] + (set (mem:ST_ANY (plus (match_dup 0) + (match_operand 3 "const_int_operand"))) + (match_operand:ST_ANY 2 "register_operand"))] "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \ && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \ && (peep2_reg_dead_p (2, operands[0])) \ && REGNO (operands[0]) != REGNO (operands[2])" - [(set (mem:QHWD (lo_sum:P (match_dup 0) (match_dup 1))) (ma
[PATCH] LoongArch: Remove redundant barrier instructions before LL-SC loops
This is isomorphic to the LLVM changes [1-2].

On LoongArch, the LL and SC instructions have memory barrier semantics:

- LL: +
- SC: +

But the compare and swap operation is allowed to fail, and if it fails
the SC instruction is not executed, thus the guarantee of acquire
semantics cannot be ensured.  Therefore, an acquire barrier needs to be
generated when failure_memorder includes an acquire operation.

On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an
acquire barrier; on CPUs implementing LoongArch v1.00, it is a full
barrier.  So it's always enough for acquire semantics.  OTOH if acquire
semantics are not needed, we still need the "dbar 0x700" as the
load-load barrier like all LL-SC loops.

[1]:https://github.com/llvm/llvm-project/pull/67391
[2]:https://github.com/llvm/llvm-project/pull/69339

gcc/ChangeLog:

	* config/loongarch/loongarch.cc
	(loongarch_memmodel_needs_release_fence): Remove.
	(loongarch_cas_failure_memorder_needs_acquire): New static
	function.
	(loongarch_print_operand): Redefine 'G' for the barrier on CAS
	failure.
	* config/loongarch/sync.md (atomic_cas_value_strong): Remove the
	redundant barrier before the LL instruction, and emit an acquire
	barrier on failure if needed by failure_memorder.
	(atomic_cas_value_cmp_and_7_): Likewise.
	(atomic_cas_value_add_7_): Remove the unnecessary barrier before
	the LL instruction.
	(atomic_cas_value_sub_7_): Likewise.
	(atomic_cas_value_and_7_): Likewise.
	(atomic_cas_value_xor_7_): Likewise.
	(atomic_cas_value_or_7_): Likewise.
	(atomic_cas_value_nand_7_): Likewise.
	(atomic_cas_value_exchange_7_): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/cas-acquire.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
and/or GCC 12/13 (for fixing the acquire semantics in failure_memorder)?
gcc/config/loongarch/loongarch.cc | 27 +++--- gcc/config/loongarch/sync.md | 49 +-- .../gcc.target/loongarch/cas-acquire.c| 84 +++ 3 files changed, 118 insertions(+), 42 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/cas-acquire.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 9b63f0dc322..d9b7a1076a2 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -5833,25 +5833,22 @@ loongarch_memmodel_needs_rel_acq_fence (enum memmodel model) } } -/* Return true if a FENCE should be emitted to before a memory access to - implement the release portion of memory model MODEL. */ +/* Return true if a FENCE should be emitted after a failed CAS to + implement the acquire semantic of failure_memorder. */ static bool -loongarch_memmodel_needs_release_fence (enum memmodel model) +loongarch_cas_failure_memorder_needs_acquire (enum memmodel model) { - switch (model) + switch (memmodel_base (model)) { +case MEMMODEL_ACQUIRE: case MEMMODEL_ACQ_REL: +case MEMMODEL_CONSUME: case MEMMODEL_SEQ_CST: -case MEMMODEL_SYNC_SEQ_CST: -case MEMMODEL_RELEASE: -case MEMMODEL_SYNC_RELEASE: return true; -case MEMMODEL_ACQUIRE: -case MEMMODEL_CONSUME: -case MEMMODEL_SYNC_ACQUIRE: case MEMMODEL_RELAXED: +case MEMMODEL_RELEASE: return false; default: @@ -5966,7 +5963,8 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part, 'd' Print CONST_INT OP in decimal. 'E' Print CONST_INT OP element 0 of a replicated CONST_VECTOR in decimal. 'F' Print the FPU branch condition for comparison OP. - 'G' Print a DBAR insn if the memory model requires a release. + 'G' Print a DBAR insn for CAS failure (with an acquire semantic if + needed, otherwise a simple load-load barrier). 'H' Print address 52-61bit relocation associated with OP. 'h' Print the high-part relocation associated with OP. 'i' Print i if the operand is not a register. 
@@ -6057,8 +6055,11 @@ loongarch_print_operand (FILE *file, rtx op, int letter) break; case 'G': - if (loongarch_memmodel_needs_release_fence ((enum memmodel) INTVAL (op))) - fputs ("dbar\t0", file); + if (loongarch_cas_failure_memorder_needs_acquire ( + memmodel_from_int (INTVAL (op + fputs ("dbar\t0b10100", file); + else + fputs ("dbar\t0x700", file); break; case 'h': diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index 9924d522bcd..db3a21690b8 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -129,19 +129,18 @@ (define_insn "atomic_cas_value_strong" (clobber (match_scratch:GPR 6 "=&r"))] "" { - return "%G5\\n\\t" -"1:\\n\\t" + return "1:\\n\\t" "ll.\\t%0,%1\\n\\t" "bne\\t%0,%z2,2f\\n\\t" "or%i3\\t
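The barrier selection for a failed CAS can be modelled outside the compiler. The sketch below mirrors the switch in the patch using the C11 memory_order values instead of GCC's internal memmodel enum; the function names are hypothetical and the returned strings are the two barriers named in the commit message. try_claim shows, from the user's side, a CAS whose failure order is acquire, which is the case the new barrier is for.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Return true if a failed CAS with this failure order still needs
   acquire semantics (same grouping as the switch in the patch).  */
bool
cas_failure_needs_acquire (memory_order failure_order)
{
  switch (failure_order)
    {
    case memory_order_acquire:
    case memory_order_acq_rel:
    case memory_order_consume:
    case memory_order_seq_cst:
      return true;
    default:	/* relaxed, release */
      return false;
    }
}

/* Pick the barrier text the 'G' operand modifier would print in this
   model: an acquire barrier, or just the load-load barrier.  */
const char *
cas_failure_barrier (memory_order failure_order)
{
  return cas_failure_needs_acquire (failure_order)
	 ? "dbar\t0b10100" : "dbar\t0x700";
}

/* Example user code: claim a flag; on failure the caller still gets
   acquire ordering for whatever the winner published.  */
bool
try_claim (atomic_int *flag)
{
  int expected = 0;
  return atomic_compare_exchange_strong_explicit (flag, &expected, 1,
						  memory_order_acq_rel,
						  memory_order_acquire);
}
```

The standard does not allow release or acq_rel as a failure order, so in practice only the relaxed, consume, acquire, and seq_cst rows matter; the model keeps the other cases only to match the switch shown in the diff.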
Re: [PATCH] MIPS: Fix PR target/98491 (ChangeLog)
Well, it just dislike my mail server :(. Switch to the mail server of my university. On 2021-02-12 22:54 +0800, Xi Ruoyao wrote: > Resend the mail. I had to fill in a form to send mail to Robert. > > On 2021-02-12 22:17 +0800, Xi Ruoyao wrote: > > On 2021-01-11 01:01 +0800, Xi Ruoyao wrote: > > > Hi Jeff and Jakub, > > > > > > On 2021-01-04 14:19 -0700, Jeff Law wrote: > > > > On 1/4/21 2:00 PM, Jakub Jelinek wrote: > > > > > On Mon, Jan 04, 2021 at 01:51:59PM -0700, Jeff Law via Gcc-patches > > > > > wrote: > > > > > > > Sorry, I forgot to include the ChangeLog: > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > 2020-12-31 Xi Ruoyao > > > > > > > > > > > > > > PR target/98491 > > > > > > > * config/mips/mips.c (mips_symbol_insns): Do not use > > > > > > > MSA_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE. > > > > > > So I absolutely agree the current code is wrong as it does an out of > > > > > > bounds array access. > > > > > > > > > > > > > > > > > > Would it be better to instead to change MSA_SUPPORTED_MODE_P to > > > > > > evaluate > > > > > > to zero if MODE is MAX_MACHINE_MODE? That would protect all the > > > > > > uses > > > > > > of > > > > > > MSA_SUPPORTED_MODE_P. Something like this perhaps? > > > > > But MAX_MACHINE_MODE is the one past last valid mode, I'm not aware of > > > > > any target that would protect all macros that deal with modes that > > > > > way. > > > > > > > > > > So, perhaps best would be stop using the MAX_MACHINE_MODE as magic > > > > > value > > > > > for that function and instead use say VOIDmode that shouldn't normally > > > > > appear either? > > > > I think we have to allow VOIDmode because constants don't necessarily > > > > have modes. And I certainly agree that using MAX_MACHINE_MODE like > > > > this is ugly and error prone (as we can see from the BZ). 
> > > > > > > > I also couldn't convince myself that the code and comments were actually > > > > consistent, particularly for MSA targets which the comment claims can > > > > never handle constants for ld/st (and thus should be returning 0 for > > > > MAX_MACHINE_MODE). Though maybe mips_symbol_insns_1 ultimately handles > > > > that correctly. > > > > > > > > > > > > > > > > > > But I don't really see anything wrong on the mips_symbol_insns above > > > > > change either. > > > > Me neither. I'm just questioning if bullet-proofing in the > > > > MSA_SUPPORTED_MODE_P would be a better option. While I've worked in the > > > > MIPS port in the past, I don't really have any significannt experience > > > > with the MSA support. > > > > > > I can't understand the comment either. To me it looks like it's possible > > > to > > > remove this "if (MSA_SUPPORTED_P (mode)) return 0;" > > > > > > CC Robert to get some help. > > > > Happy new lunar year folks. > > > > I found a newer email address of Robert. Hope it is still being used. > > > > Could someone update MAINTAINERS file by the way? >
[PATCH] Fix symver attribute with LTO
Hi,

with Jan's patch committed in r278878 we can use the symver attribute
for functions and variables.  The symver attribute is designed to
replace toplevel asm statements containing ".symver", which may be
removed by LTO.  Unfortunately, a quick test showed that GCC still
generates a buggy .so file with LTO and the new symver attribute.

Two issues:

1. The symver node in symtab is marked as PREVAILING_DEF_IRONLY (no EXP)
   and then removed by LTO.
2. The actual function body implementing the symver-ed function is also
   marked as PREVAILING_DEF_IRONLY and then removed or marked as local.
   So no ".globl" directive is emitted for it.

Both issues cause symbols with symver to be missing from the DSO (with
LTO enabled).

I modified the fuse-3.9.0 code to use the new symver attribute and tried
to build it with GCC trunk and LTO.  The result is a buggy DSO.  With
this patch applied, fuse-3.9.0 can be built with LTO enabled without
problems.

I'll test the symver patch and this patch with more packages.

Bootstrapped/regtested x86_64-linux.  I'm not a maintainer.

gcc/ChangeLog:

2019-12-17  Xi Ruoyao

	* cgraph.h (symtab_node::used_from_object_file_p): Symver nodes
	are part of DSO ABI so always used by non-LTO object files.
	* ipa-visibility.c (cgraph_externally_visible_p): Functions with
	symver attributes should always be visible.
Index: gcc/cgraph.h === --- gcc/cgraph.h(revision 279452) +++ gcc/cgraph.h(working copy) @@ -2682,7 +2682,7 @@ symtab_node::used_from_object_file_p (vo { if (!TREE_PUBLIC (decl) || DECL_EXTERNAL (decl)) return false; - if (resolution_used_from_other_file_p (resolution)) + if (symver || resolution_used_from_other_file_p (resolution)) return true; return false; } Index: gcc/ipa-visibility.c === --- gcc/ipa-visibility.c(revision 279452) +++ gcc/ipa-visibility.c(working copy) @@ -216,6 +216,8 @@ cgraph_externally_visible_p (struct cgra return true; if (lookup_attribute ("noipa", DECL_ATTRIBUTES (node->decl))) return true; + if (lookup_attribute ("symver", DECL_ATTRIBUTES (node->decl))) +return true; if (TARGET_DLLIMPORT_DECL_ATTRIBUTES && lookup_attribute ("dllexport", DECL_ATTRIBUTES (node->decl))) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Fix symver attribute with LTO
On 2019-12-17 09:32 +0100, Jan Hubicka wrote: > > Hi, > > with Jan's patch commited in r278878 we can use symver attribute for > > functions > > and variables. The symver attribute is designed for replacing toplevel asm > > statements containing ".symver" which may be removed by LTO. Unfortunately, > > a quick test shown GCC still generates buggy so file with LTO and new symver > > attribute. > > Thanks for looking into this. It was on my TODo list to actually > convert some packages, so it is great you did that. > > Two issues: > > > > 1. The symver node in symtab is marked as PREVAILING_DEF_IRONLY (no EXP) and > >then removed by LTO. > > This is however wrong - linker should not mark it as > PREVAILING_DEF_IRONLY if it is used externally. What linker do you use? > On my testcases this was working with > GNU ld (GNU Binutils) 2.31.51.20181222 > I could easily imagine that some linkers get it wrong which should be > reported to bintuils bugzilla but it is also easy to work around as done > in your patch. Hi Jan, I'm using GNU ld 2.33.1. I'll attach a testcase simplified from fuse-3.9 code. "local: *;" in the versioning script triggers the issue. Without it there would be no problem. > > 2. The actual function body implementing the symver-ed function is also > > marked > >as PREVAILING_DEF_IRONLY and then removed or marked as local. So no > > ".globl" > >directive is outputed for it. > > Here is the symver-ed function exported from the DSO (or is it set > to have hidden attribute)? > Again this was working for me, so it would be good to understand this > issue. It's also triggered by "local: *;". Untar the attachment and use "make" to build it, then "make show-dynamic-syms" to dump the dynamic symbol table. I believe (with 99% chance) you'll see only foo (VERS_1) and foo_v1 (because foo_v1 is marked as global in the version script). And foo (VERS_2) would be missing. With this patch foo (VERS_2) would show up. 
We can't mark "foo_v2" to be "global" because it should not be a part of
DSO ABI.

The other 1% chance would be a regression in Binutils.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University

Attachment: pr48200.tar.gz (application/compressed-tar)
Re: [PATCH] Fix symver attribute with LTO
On 2019-12-17 18:47 +0100, Jan Hubicka wrote: > > Would it be equivalent to: > > 1) output foo_v2 local > > 2) producing static alias with local name (.L1) > > 3) do .symver .L1,foo@@@VERS_2 > > That is somewhat more systematic and would not lead to false > > visibilities. > > I spent some time playing with this. An in order to > 1) be able to handle foo_v2 according to the resolution info >(so it behaves like a regular symbol and can be called dirrectly, > localized and optimized) > 2) get intended objdump -T relocations > 3) do not polute global symbol tables > > I ended up with the following codegen: > > .type foo_v2, @function > foo_v2: > .LFB1: > .cfi_startproc > movl$2, %eax > ret > .cfi_endproc > .LFE1: > .size foo_v2, .-foo_v2 > .globl .LSYMVER0 > .set.LSYMVER0,foo_v2 > .symver .LSYMVER0, foo@@@VERS_2 > > This uses @@@ symver version of gas which seems to have odd semantics of > requiring to be passed global symbol name which it then tkes away and > produces foo@@VERS_2. > > So the nm outoutp of the ltrans unit is: > T foo_v1 > 0010 t foo_v2 > T foo@VERS_1 > 0010 T foo@@VERS_2 > > So the difference to your patch is that foo_v2 is static which enables > normal optimizations. > > Since additional symbol alias is produced this would also make it > possible to attach multiple symver attributes with @@ string. > > Does somehting like this make sense to you? Modulo the obvious buffer > overflow issue? > Honza Unfortunately, I got an ICE with my testcase with the patch applied to trunk. 
lto1: internal compiler error: tree check: expected tree that contains ‘decl minimal’ structure, have ‘identifier_node’ in do_assemble_symver, at varasm.c:5986 0x6fa648 tree_contains_struct_check_failed(tree_node const*, tree_node_structure_enum, char const*, int, char const*) ../../gcc/gcc/tree.c:9859 0x71466e contains_struct_check(tree_node*, tree_node_structure_enum, char const*, int, char const*) ../../gcc/gcc/tree.h:3387 0x71466e do_assemble_symver(tree_node*, tree_node*) ../../gcc/gcc/varasm.c:5986 0x89e409 cgraph_node::assemble_thunks_and_aliases() ../../gcc/gcc/cgraphunit.c:2225 0x89e698 cgraph_node::expand() ../../gcc/gcc/cgraphunit.c:2351 0x89f62f expand_all_functions ../../gcc/gcc/cgraphunit.c:2456 0x89f62f symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2806 0x7fb589 lto_main() ../../gcc/gcc/lto/lto.c:658 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. lto-wrapper: fatal error: /home/xry111/gcc-test/bin/gcc returned 1 exit status compilation terminated. /usr/bin/ld: error: lto-wrapper failed collect2: error: ld returned 1 exit status make: *** [Makefile:4: obj/test.so] Error 1 The change to lto/lto-common.c makes sense. I tried it instead of my change to cgraph.h and everything is OK. I'll investigate the change to varasm.c a little. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Fix symver attribute with LTO
IDENTIFIER_POINTER > + (DECL_ASSEMBLER_NAME (tmpdecl)), > +buf); > +} > #else >error ("symver is only supported on ELF platforms"); > #endif > Index: lto/lto-common.c > === > --- lto/lto-common.c (revision 279467) > +++ lto/lto-common.c (working copy) > @@ -2818,6 +2818,10 @@ read_cgraph_and_symbols (unsigned nfiles > IDENTIFIER_POINTER >(DECL_ASSEMBLER_NAME (snode->decl))); > } > + /* Symbol versions are always used externally, but linker does not > +report that correctly. */ > + else if (snode->symver && *res == LDPR_PREVAILING_DEF_IRONLY) > + snode->resolution = LDPR_PREVAILING_DEF_IRONLY_EXP; This is absolutely correct. > else > snode->resolution = *res; >} I still believe we should consider symver targets to be externally visible in cgraph_externally_visible_p. There is a comment saying "if linker counts on us, we must preserve the function". That's true in our case. And, I think .globl .LSYMVER0 .set.LSYMVER0, foo_v2 .symver .LSYMVER0, foo@@VERS_2 is exactly same as .globl foo_v2 .symver foo_v2, foo@@VERS_2 except there is an unnecessary ".LSYMVER0". Adding ".globl foo_v2" or ".globl foo_v1" won't cause them to be "global" in the final DSO because the linker will hide them according to the version script. So if it's safe we can force a ".globl foo_v2" before ".symver foo_v2, foo@@VERS_2". But I can't prove it's safe so I think it's better to consider this case in cgraph_externally_visible_p. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
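The effect of the quoted lto-common.c hunk can be shown with a tiny model. The sketch below only reproduces the decision in that hunk; the enum is a reduced, hypothetical stand-in for the linker plugin's resolution codes and the function is not part of GCC.

```c
#include <stdbool.h>

/* Reduced, hypothetical stand-in for the linker plugin resolutions.  */
enum demo_resolution
{
  DEMO_PREVAILING_DEF,
  DEMO_PREVAILING_DEF_IRONLY,	  /* only referenced from IR, removable */
  DEMO_PREVAILING_DEF_IRONLY_EXP  /* IR only, but exported: must be kept */
};

/* A symbol version is always used externally, so upgrade an IRONLY
   resolution reported by the linker to IRONLY_EXP for symver nodes.
   Every other resolution is taken as reported.  */
enum demo_resolution
effective_resolution (bool is_symver, enum demo_resolution reported)
{
  if (is_symver && reported == DEMO_PREVAILING_DEF_IRONLY)
    return DEMO_PREVAILING_DEF_IRONLY_EXP;
  return reported;
}
```

This covers only the first of the two issues from the original report (the symver node itself being dropped); whether the implementing function must also be treated as externally visible is the point still under discussion in this message.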
Re: [PATCH] Fix symver attribute with LTO
in our case. > > > > And, I think > > > > .globl .LSYMVER0 > > .set.LSYMVER0, foo_v2 > > .symver .LSYMVER0, foo@@VERS_2 > I produce > .symver .LSYMVER0, foo@@@VERS_2 > > > is exactly same as > > > > .globl foo_v2 > > .symver foo_v2, foo@@VERS_2 > > > > except there is an unnecessary ".LSYMVER0". Adding ".globl foo_v2" or > > ".globl > > foo_v1" won't cause them to be "global" in the final DSO because the linker > > will > > hide them according to the version script. > > The difference is that in first case compiler can fully control foo_v2 > symbol (with LTO it will turn it into static symbol, it will inline > calls to it and do other things), while in the second case we need to > treat foo_v2 specially. > > So if it's safe we can force a ".globl foo_v2" before ".symver foo_v2, > > foo@@VERS_2". But I can't prove it's safe so I think it's better to > > consider > > this case in cgraph_externally_visible_p. > > It sort of makes things work, but for example it will prevent gcc from > inlining calls to foo_v2. I think we will still need to do something > about -fvisibility=hidden. > > It is sad that we do not have way to produce symbol version without a > corresponding symbol of global visibility. If we had we could support > multiple symver aliases from one symbol and avoid the need to explicitly > hide the unnecessary symbols in the map files... Explicitly hiding the unnecessary symbols in map files is how we handle this now [with __asm__(".symver foo_v2, foo@@VERS_2")]. We can continue doing it this way and leave the rest as a future enhancement. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Fix symver attribute with LTO
On 2019-12-18 14:19 +0100, Jan Hubicka wrote: > The problem here is that we lie to the compiler (by pretending that > foo_v2 is exported from DSO while it is not) and force user to do the > same. > > We support two ways to hide symbol - either at compile time via > attribute((visibility("hidden"))) or at link-time via map file. The > first produces better code because compiler can do more optimizations > knowing that the symbol can not be interposed. I just got your point: if the library calls foo_v2, the call won't be interposed. If it wants a call to be interposable, it should call foo() [foo@@VER_2] instead of foo_v2(). But it seems there is no way we can do this [even with the traditional __asm__(".symver foo, foo@@VER_2")]. For this purpose we should either: 1. Change GAS (introducing some new syntax like '' or '.symver_export') or 2. Add some mangled symbol name in GCC (like ".LSYMVERx" or "foo_v2.symver_export"). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Fix symver attribute with LTO
On 2019-12-19 11:06 +0100, Jan Hubicka wrote: > This is variant of your patch I comitted. It also adds verification so > we get ICE rather then wrong code. In addition I moved the checks away > rom used_from_object_file. This function is about non-LTO objects > linked into the DSO and thus does not really fit for the check. > Lastly we can not rely on symver attribute to still be present here. > > Regtested x86_64-linux and comitted. > Honza > * cgraph.c (cgraph_node_cannot_be_local_p_1): Prevent targets of > symver attributes to be localized. > * ipa-visibility.c (cgraph_externally_visible_p, > varpool_node::externally_visible_p): Likewise. > * symtab.c (symtab_node::verify_base): Check visibility of symbol > versions. > > * lto-common.c (read_cgraph_and_symbols): Work around binutils > PR25424 /* snip */ > Index: ipa-visibility.c > === > --- ipa-visibility.c (revision 279523) > +++ ipa-visibility.c (working copy) > @@ -220,6 +220,14 @@ cgraph_externally_visible_p (struct cgra >&& lookup_attribute ("dllexport", > DECL_ATTRIBUTES (node->decl))) > return true; > + > + /* Limitation of gas requires us to output targets of symver aliases as > + global symbols. This is binutils PR 25295. */ > + ipa_ref *ref; > + FOR_EACH_ALIAS (node, ref) > +if (ref->referring->symver) > + return true; > + >if (node->resolution == LDPR_PREVAILING_DEF_IRONLY) > return false; >/* When doing LTO or whole program, we can bring COMDAT functoins static. > @@ -284,14 +292,13 @@ varpool_node::externally_visible_p (void > DECL_ATTRIBUTES (decl))) > return true; > > - /* See if we have linker information about symbol not being used or > - if we need to make guess based on the declaration. > + /* Limitation of gas requires us to output targets of symver aliases as > + global symbols. This is binutils PR 25295. 
*/ > + ipa_ref *ref; > + FOR_EACH_ALIAS (this, ref) > +if (ref->referring->symver) > + return true; > > - Even if the linker clams the symbol is unused, never bring internal > - symbols that are declared by user as used or externally visible. > - This is needed for i.e. references from asm statements. */ > - if (used_from_object_file_p ()) > -return true; Are these two lines removed intentionally? >if (resolution == LDPR_PREVAILING_DEF_IRONLY) > return false; > > Index: lto/lto-common.c > === -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Fix symver attribute with LTO
On 2019-12-19 19:12 +0800, Xi Ruoyao wrote: > On 2019-12-19 11:06 +0100, Jan Hubicka wrote: > > - /* See if we have linker information about symbol not being used or > > - if we need to make guess based on the declaration. > > + /* Limitation of gas requires us to output targets of symver aliases as > > + global symbols. This is binutils PR 25295. */ > > + ipa_ref *ref; > > + FOR_EACH_ALIAS (this, ref) > > +if (ref->referring->symver) > > + return true; > > > > - Even if the linker clams the symbol is unused, never bring internal > > - symbols that are declared by user as used or externally visible. > > - This is needed for i.e. references from asm statements. */ > > - if (used_from_object_file_p ()) > > -return true; > > Are these two lines removed intentionally? Oh I see, it was a duplicated branch. Sorry for the noise. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Fix inconsistent description in *sge_
On Mon, 2024-03-04 at 11:03 +0800, Guo Jie wrote: > The constraint of op[1] is inconsistent with the output template. > > gcc/ChangeLog: > > * config/loongarch/loongarch.md > (define_insn "*sge_"): Fix inconsistency > error. > > --- > gcc/config/loongarch/loongarch.md | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/loongarch/loongarch.md > b/gcc/config/loongarch/loongarch.md > index f3b5c641fce..2d25374bdc9 100644 > --- a/gcc/config/loongarch/loongarch.md > +++ b/gcc/config/loongarch/loongarch.md > @@ -3357,10 +3357,10 @@ (define_insn "*sgt_" > > (define_insn "*sge_" > [(set (match_operand:GPR 0 "register_operand" "=r") > - (any_ge:GPR (match_operand:X 1 "register_operand" "r") > + (any_ge:GPR (match_operand:X 1 "arith_operand" "rI") > (const_int 1)))] No, arith_operand is just register_operand or const_imm12_operand, but comparing a const_imm12_operand with (const_int 1) should be folded into a constant (even at -O0, AFAIK). So allowing const_imm12_operand here brings no benefit. > "" > - "slti\t%0,%.,%1" > + "slt%i1\t%0,%.,%1" > [(set_attr "type" "slt") > (set_attr "mode" "")]) > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2] LoongArch: Fix inconsistent description in *sge_
On Tue, 2024-03-05 at 16:05 +0800, Guo Jie wrote: > The constraint of op[1] is inconsistent with the output template. > > gcc/ChangeLog: > > * config/loongarch/loongarch.md > (define_insn "*sge_"): Fix inconsistency > error. > > --- > Update in v2: > Remove useless support for op[1] is const_imm12_operand. > > --- > gcc/config/loongarch/loongarch.md | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/loongarch/loongarch.md > b/gcc/config/loongarch/loongarch.md > index f3b5c641fce..e35a001e0ed 100644 > --- a/gcc/config/loongarch/loongarch.md > +++ b/gcc/config/loongarch/loongarch.md > @@ -3360,7 +3360,7 @@ (define_insn "*sge_" > (any_ge:GPR (match_operand:X 1 "register_operand" "r") > (const_int 1)))] > "" > - "slti\t%0,%.,%1" > + "slt\t%0,%.,%1" > [(set_attr "type" "slt") > (set_attr "mode" "")]) Hmm, this define_insn seems to be never really used; otherwise it would generate something like "sltui $r4,$r0,$r4" and trigger an assembler failure. The generic path already seems to convert "x >= 1" to "x > 0". So it seems we should just remove this define_insn? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
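The equivalence relied on above ("x >= 1" is the same test as "x > 0", i.e. "0 < x") can be checked with a few lines of plain C. This is only a sketch for illustration, not GCC code: the function names are made up here, and the mapping of "0 < x" to a single slt with the zero register as first source is taken from the "slt\t%0,%.,%1" template quoted in the patch.

```c
#include <assert.h>
#include <limits.h>

/* Signed case: the form the "*sge" pattern matches.  */
int sge_one (long x) { return x >= 1; }

/* Signed case: the canonical form, "0 < x".  */
int sgt_zero (long x) { return 0 < x; }

/* Unsigned case: "x >= 1" just means "x is nonzero".  */
int sgeu_one (unsigned long x) { return x >= 1; }

/* Unsigned case: the canonical form, "0 < x".  */
int sgtu_zero (unsigned long x) { return 0 < x; }

/* Compare both forms on a handful of boundary values.  */
void
check_ge_one_is_gt_zero (void)
{
  const long s[] = { LONG_MIN, -2, -1, 0, 1, 2, LONG_MAX };
  const unsigned long u[] = { 0, 1, 2, ULONG_MAX };

  for (unsigned i = 0; i < sizeof s / sizeof s[0]; i++)
    assert (sge_one (s[i]) == sgt_zero (s[i]));

  for (unsigned i = 0; i < sizeof u / sizeof u[0]; i++)
    assert (sgeu_one (u[i]) == sgtu_zero (u[i]));
}
```

Calling check_ge_one_is_gt_zero () from main is enough to exercise it. Since the two forms agree for every input, a separate pattern for the ">= 1" form would add nothing once the canonical "> 0" form is what reaches the backend.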
[PATCH v3] testsuite: Add a test case for negating FP vectors containing zeros
Recently I've fixed two wrong FP vector negate implementations which caused wrong sign bits in zeros on the affected targets (r14-8786 and r14-8801). To prevent a similar issue from happening again, add a test case. Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS (with MSA), LoongArch (with LSX and LASX). gcc/testsuite: * gcc.dg/vect/vect-neg-zero.c: New test. --- - v1 -> v2: Remove { dg-do run } which may cause SIGILL. - v2 -> v3: Add -fno-associative-math to fix an excessive warning on arm. Ok for trunk? gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++ 1 file changed, 38 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c new file mode 100644 index 000..21fa00cfa15 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c @@ -0,0 +1,38 @@ +/* { dg-add-options ieee } */ +/* { dg-additional-options "-fno-associative-math -fsigned-zeros" } */ + +double x[4] = {-0.0, 0.0, -0.0, 0.0}; +float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0}; + +static __attribute__ ((always_inline)) inline void +test (int factor) +{ + double a[4]; + float b[8]; + + asm ("" ::: "memory"); + + for (int i = 0; i < 2 * factor; i++) +a[i] = -x[i]; + + for (int i = 0; i < 4 * factor; i++) +b[i] = -y[i]; + +#pragma GCC novector + for (int i = 0; i < 2 * factor; i++) +if (__builtin_signbit (a[i]) == __builtin_signbit (x[i])) + __builtin_abort (); + +#pragma GCC novector + for (int i = 0; i < 4 * factor; i++) +if (__builtin_signbit (b[i]) == __builtin_signbit (y[i])) + __builtin_abort (); +} + +int +main (void) +{ + test (1); + test (2); + return 0; +} -- 2.44.0
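For readers wondering why zeros need a dedicated test: negation must flip the sign bit, and a "negate" built from ordinary arithmetic can get that wrong only for zeros. The sketch below shows one plausible way such a bug can arise, namely computing 0.0 - x. That is an assumption for illustration (the message above does not say how the two implementations were wrong), the function names are made up, and the code assumes IEEE 754 arithmetic in the default rounding mode with signed zeros honored (so no -ffast-math).

```c
#include <assert.h>
#include <math.h>

/* Correct negation: only the sign bit changes, so -(+0.0) is -0.0.  */
double negate (double x) { return -x; }

/* Looks equivalent, but 0.0 - (+0.0) is +0.0 in round-to-nearest.  */
double zero_minus (double x) { return 0.0 - x; }

/* Returns 1 if the sign bit of X is set, 0 otherwise.  */
int sign_bit (double x) { return signbit (x) != 0; }

void
check_zero_negation (void)
{
  /* Real negation flips the sign of both zeros.  */
  assert (sign_bit (negate (0.0)) == 1);
  assert (sign_bit (negate (-0.0)) == 0);

  /* Subtraction agrees for -0.0 but not for +0.0.  */
  assert (sign_bit (zero_minus (-0.0)) == 0);
  assert (sign_bit (zero_minus (0.0)) == 0);

  /* For nonzero inputs the two are indistinguishable.  */
  assert (negate (1.5) == zero_minus (1.5));
}
```

The last assertion is the reason such bugs survive ordinary testing: the difference is invisible for any nonzero value, so the test has to inspect the sign bits of zeros, as the test case above does with __builtin_signbit.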
[PATCH v2] LoongArch: Allow s9 as a register alias
The psABI allows using s9 as an alias of r22. gcc/ChangeLog: * config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add s9 as an alias of r22. --- v1 -> v2: Add a test case. Ok for trunk? gcc/config/loongarch/loongarch.h | 1 + gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c | 3 +++ 2 files changed, 4 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index 8b453ab3140..bf2351f0968 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -931,6 +931,7 @@ typedef struct { { "t8", 20 + GP_REG_FIRST },\ { "x", 21 + GP_REG_FIRST },\ { "fp", 22 + GP_REG_FIRST },\ + { "s9", 22 + GP_REG_FIRST },\ { "s0", 23 + GP_REG_FIRST },\ { "s1", 24 + GP_REG_FIRST },\ { "s2", 25 + GP_REG_FIRST },\ diff --git a/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c new file mode 100644 index 000..d2e3b80f83c --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/regname-fp-s9.c @@ -0,0 +1,3 @@ +/* { dg-do compile } */ +register long s9 asm("s9"); /* { dg-note "conflicts with 's9'" } */ +register long fp asm("fp"); /* { dg-warning "register of 'fp' used for multiple global register variables" } */ -- 2.44.0
[PATCH] LoongArch: testsuite: Rewrite {x, }vfcmp-{d, f}.c to avoid named registers
Loops on named vector registers are not vectorized (see comment 11 of PR113622), so these test cases have been failing for a while. Rewrite them using check-function-bodies to remove the hard-coded register names. A barrier is needed to always load the first operand before the second operand. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named registers. * gcc.target/loongarch/vfcmp-d.c: Likewise. * gcc.target/loongarch/xvfcmp-f.c: Likewise. * gcc.target/loongarch/xvfcmp-d.c: Likewise. --- Tested on loongarch64-linux-gnu. Ok for trunk? gcc/testsuite/gcc.target/loongarch/vfcmp-d.c | 202 -- gcc/testsuite/gcc.target/loongarch/vfcmp-f.c | 347 ++ gcc/testsuite/gcc.target/loongarch/xvfcmp-d.c | 202 -- gcc/testsuite/gcc.target/loongarch/xvfcmp-f.c | 204 -- 4 files changed, 816 insertions(+), 139 deletions(-) diff --git a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c index 8b870ef38a0..87e4ed19e96 100644 --- a/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c +++ b/gcc/testsuite/gcc.target/loongarch/vfcmp-d.c @@ -1,28 +1,188 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -mlsx -ffixed-f0 -ffixed-f1 -ffixed-f2 -fno-vect-cost-model" } */ +/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */ +/* { dg-final { check-function-bodies "**" "" } } */ #define F double #define I long long #include "vfcmp-f.c" -/* { dg-final { scan-assembler "compare_quiet_equal:.*\tvfcmp\\.ceq\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_equal\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_not_equal:.*\tvfcmp\\.cune\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_equal\n" } } */ -/* { dg-final { scan-assembler "compare_signaling_greater:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater\n" } } */ -/* { dg-final { scan-assembler "compare_signaling_greater_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_equal\n" } } */ -/* { dg-final { scan-assembler
"compare_signaling_less:.*\tvfcmp\\.slt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less\n" } } */ -/* { dg-final { scan-assembler "compare_signaling_less_equal:.*\tvfcmp\\.sle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_equal\n" } } */ -/* { dg-final { scan-assembler "compare_signaling_not_greater:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_not_greater\n" } } */ -/* { dg-final { scan-assembler "compare_signaling_less_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_signaling_less_unordered\n" } } */ -/* { dg-final { scan-assembler "compare_signaling_not_less:.*\tvfcmp\\.sule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_not_less\n" } } */ -/* { dg-final { scan-assembler "compare_signaling_greater_unordered:.*\tvfcmp\\.sult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_signaling_greater_unordered\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_less:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_less_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_equal\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_greater:.*\tvfcmp\\.clt\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_greater_equal:.*\tvfcmp\\.cle\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_equal\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_not_less:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_not_less\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_greater_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr1,\\\$vr0.*-compare_quiet_greater_unordered\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_not_greater:.*\tvfcmp\\.cule\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_not_greater\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_less_unordered:.*\tvfcmp\\.cult\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_less_unordered\n" } } */ 
-/* { dg-final { scan-assembler "compare_quiet_unordered:.*\tvfcmp\\.cun\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_unordered\n" } } */ -/* { dg-final { scan-assembler "compare_quiet_ordered:.*\tvfcmp\\.cor\\.d\t\\\$vr2,\\\$vr0,\\\$vr1.*-compare_quiet_ordered\n" } } */ +/* +** compare_quiet_equal: +** vld (\$vr[0-9]+),\$r4,0 +** vld (\$vr[0-9]+),\$r5,0 +** vfcmp.ceq.d (\$vr[0-9]+),(\1,\2|\2,\1) +** vst \3,\$r6,0 +** jr \$r1 +*/ + +/* +** compare_quiet_not_equal: +** vld (\$vr[0-9]+),\$r4,0 +** vld (\$vr[0-9]+),\$r5,0 +** vfcmp.cune.d(\$vr[0-9]+),(\1,\2|\2,\1) +** vst \3,\$r6,0 +** jr \$r1 +*/ + +/* +** compare_signaling_greater: +** vld (\$vr[0-9]+),\$r4,0 +** vld (\$vr[0-9]+),\$r5,0 +** vfcmp.slt.d (\$vr[0-9]+),\2,\1 +** vst \3,\$r6,0 +**
Re: [PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation
On Thu, 2024-03-07 at 10:43 +0800, mengqinggang wrote: > Hi, > > Whether to add an option to control the generation of R_LARCH_RELAX, > similar to as -mrelax/-mno-relax. There are already -mrelax and -mno-relax; they can be checked in the compiler code with TARGET_LINKER_RELAXATION. /* snip */ > > + case 'Q': > > + if (!TARGET_LINKER_RELAXATION) > > +break; So with -mno-relax we'll break early here, and no R_LARCH_RELAX will be printed. > > + if (code == HIGH) > > +op = XEXP (op, 0); > > + > > + if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE) > > +fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t"); > > + > > + break; The tls-ie-norelax.c test case also checks for -mno-relax: > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */ > > +/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } > > } */ i.e. -mno-relax is used when compiling this test case, and the compiled assembly code should not contain R_LARCH_RELAX. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.
On Thu, 2024-03-07 at 09:12 +0800, Lulu Cheng wrote: > + output_asm_insn ("1:", operands); > + output_asm_insn ("ll.\t%0,%1", operands); > + > + /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the > + return value of the val_without_const_folding will not be truncated and > + will be passed directly to the function compare_exchange_strong. > + However, the instruction 'bne' does not distinguish between 32-bit and > + 64-bit operations. so if the upper 32 bits of the register are not > + extended by the 32nd bit symbol, then the comparison may not be valid > + here. This will affect the result of the operation. */ > + > + if (TARGET_64BIT && REG_P (operands[2]) > + && GET_MODE (operands[2]) == SImode) > + { > + output_asm_insn ("addi.w\t%5,%2,0", operands); > + output_asm_insn ("bne\t%0,%5,2f", operands); It should be better to extend the expected value before the ll/sc loop (like what LLVM does), instead of repeating the extending in each iteration. Something like: diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index 8f35a5b48d2..c21781947fd 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -234,11 +234,11 @@ (define_insn "atomic_exchange_short" "amswap%A3.\t%0,%z2,%1" [(set (attr "length") (const_int 4))]) -(define_insn "atomic_cas_value_strong" +(define_insn "atomic_cas_value_strong" [(set (match_operand:GPR 0 "register_operand" "=&r") (match_operand:GPR 1 "memory_operand" "+ZC")) (set (match_dup 1) - (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ") + (unspec_volatile:GPR [(match_operand:X 2 "reg_or_0_operand" "rJ") (match_operand:GPR 3 "reg_or_0_operand" "rJ") (match_operand:SI 4 "const_int_operand")] ;; mod_s UNSPEC_COMPARE_AND_SWAP)) @@ -246,10 +246,10 @@ (define_insn "atomic_cas_value_strong" "" { return "1:\\n\\t" -"ll.\\t%0,%1\\n\\t" +"ll.\\t%0,%1\\n\\t" "bne\\t%0,%z2,2f\\n\\t" "or%i3\\t%5,$zero,%3\\n\\t" -"sc.\\t%5,%1\\n\\t" +"sc.\\t%5,%1\\n\\t" 
"beqz\\t%5,1b\\n\\t" "b\\t3f\\n\\t" "2:\\n\\t" @@ -301,9 +301,23 @@ (define_expand "atomic_compare_and_swap" operands[3], operands[4], operands[6])); else -emit_insn (gen_atomic_cas_value_strong (operands[1], operands[2], - operands[3], operands[4], - operands[6])); +{ + rtx (*cas)(rtx, rtx, rtx, rtx, rtx) = + TARGET_64BIT ? gen_atomic_cas_value_strongdi +: gen_atomic_cas_value_strongsi; + rtx expect = operands[3]; + + if (mode == SImode + && TARGET_64BIT + && operands[3] != const0_rtx) + { + expect = gen_reg_rtx (DImode); + emit_insn (gen_extendsidi2 (expect, operands[3])); + } + + emit_insn (cas (operands[1], operands[2], expect, operands[4], + operands[6])); +} rtx compare = operands[1]; if (operands[3] != const0_rtx) It produces: slli.w $r4,$r4,0 1: ll.w$r14,$r3,0 bne $r14,$r4,2f or $r15,$zero,$r12 sc.w$r15,$r3,0 beqz$r15,1b b 3f 2: dbar0b10100 3: for the test case and the compiled test case runs successfully. I've not done a full bootstrap yet though. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi.
On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote: > > On 2024/3/7 at 8:52 PM, Xi Ruoyao wrote: > > It should be better to extend the expected value before the ll/sc loop > > (like what LLVM does), instead of repeating the extending in each > > iteration. Something like: > > I wanted to do this at first, but it didn't work out. > > But then I thought about it, and there are two benefits to putting it in > the middle of ll/sc: > > 1. If there is an operation that uses the $r4 register after this atomic > operation, another > > register is required to store $r4. > > 2. ll.w requires long cycles, so putting an addi.w command after ll.w > won't make a difference. > > So based on the above, I didn't try again, but directly made a > modification like a patch. Ah, the explanation makes sense to me. Ok with the original patch then. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
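The issue behind this thread, that a 64-bit register compare (bne) only agrees with a 32-bit comparison when both registers hold sign-extended values, can be modelled in portable C. This is only a sketch under stated assumptions: the helper names are made up for illustration, ll_w models ll.w as a load that sign-extends to 64 bits, and sext_w models the effect of "addi.w rd,rj,0" or "slli.w rd,rj,0".

```c
#include <assert.h>
#include <stdint.h>

/* Model of ll.w: load 32 bits and sign-extend them to 64 bits.  */
uint64_t
ll_w (const int32_t *mem)
{
  return (uint64_t) (int64_t) *mem;
}

/* Model of "addi.w rd,rj,0": sign-extend the low 32 bits of REG.  */
uint64_t
sext_w (uint64_t reg)
{
  uint64_t low = reg & 0xffffffffULL;
  return (low & 0x80000000ULL) ? (low | 0xffffffff00000000ULL) : low;
}

/* Returns 1 if the compare in the ll/sc loop sees "equal".  The compare
   looks at all 64 bits, like bne.  EXTEND_FIRST selects whether the
   expected value is sign-extended before the compare.  */
int
cas_compare_matches (int32_t mem_value, uint64_t expected_reg,
		     int extend_first)
{
  uint64_t loaded = ll_w (&mem_value);
  uint64_t expected = extend_first ? sext_w (expected_reg) : expected_reg;
  return loaded == expected;
}

void
check_cas_compare (void)
{
  /* Memory holds -1; the register holds the same 32 bits zero-extended.  */
  assert (cas_compare_matches (-1, 0x00000000ffffffffULL, 0) == 0);
  assert (cas_compare_matches (-1, 0x00000000ffffffffULL, 1) == 1);

  /* Non-negative values are unaffected either way.  */
  assert (cas_compare_matches (5, 5, 0) == 1);
  assert (cas_compare_matches (5, 5, 1) == 1);
}
```

Whether the extension happens once before the loop or on every iteration does not change the result of the compare; as the exchange above shows, the choice between the two is about register pressure and latency.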
Re: [PATCH v4] LoongArch: Add support for TLS descriptors
On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote: > +(define_insn "@got_load_tls_desc" > + [(set (match_operand:P 0 "register_operand" "=r") > + (unspec:P > + [(match_operand:P 1 "symbolic_operand" "")] > + UNSPEC_TLS_DESC)) > + (clobber (reg:SI FCC0_REGNUM)) > + (clobber (reg:SI FCC1_REGNUM)) > + (clobber (reg:SI FCC2_REGNUM)) > + (clobber (reg:SI FCC3_REGNUM)) > + (clobber (reg:SI FCC4_REGNUM)) > + (clobber (reg:SI FCC5_REGNUM)) > + (clobber (reg:SI FCC6_REGNUM)) > + (clobber (reg:SI FCC7_REGNUM)) > + (clobber (reg:SI RETURN_ADDR_REGNUM))] > + "TARGET_TLS_DESC" > +{ > + return TARGET_EXPLICIT_RELOCS > + ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\ > + \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\ > + \tld.d\t$r1,$r4,%%desc_ld(%1)\n\ > + \tjirl\t$r1,$r1,%%desc_call(%1)" Use something like ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t" "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t" "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t" "jirl\t$r1,$r1,%%desc_call(%1)" : "la.tls.desc\t%0,%1"; to prevent additional white spaces in the output asm before tabs. > + : "la.tls.desc\t%0,%1"; > +} > + [(set_attr "got" "load") > + (set_attr "mode" "") > + (set_attr "length" "16")]) > + > +(define_insn "got_load_tls_desc_off64" > + [(set (match_operand:DI 0 "register_operand" "=r") > + (unspec:DI > + [(match_operand:DI 1 "symbolic_operand" "")] > + UNSPEC_TLS_DESC_OFF64)) > + (clobber (reg:SI FCC0_REGNUM)) > + (clobber (reg:SI FCC1_REGNUM)) > + (clobber (reg:SI FCC2_REGNUM)) > + (clobber (reg:SI FCC3_REGNUM)) > + (clobber (reg:SI FCC4_REGNUM)) > + (clobber (reg:SI FCC5_REGNUM)) > + (clobber (reg:SI FCC6_REGNUM)) > + (clobber (reg:SI FCC7_REGNUM)) > + (clobber (reg:SI RETURN_ADDR_REGNUM)) > + (clobber (match_operand:DI 2 "register_operand" "=&r"))] > + "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME" > +{ > + return TARGET_EXPLICIT_RELOCS > + ? 
"pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\ > + \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\ > + \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\ > + \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\ > + \tadd.d\t$r4,$r4,%2\n\ > + \tld.d\t$r1,$r4,%%desc_ld(%1)\n\ > + \tjirl\t$r1,$r1,%%desc_call(%1)" > + : "la.tls.desc\t%0,%2,%1"; Likewise. > +} > + [(set_attr "got" "load") > + (set_attr "length" "28")]) Otherwise OK. It's better to allow splitting these two instructions but we can do it in another patch. And IMO it's better to enable TLS desc by default if supported by both the assembler and the libc, but we'll have to defer it until Glibc 2.40 release. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v4] LoongArch: Add support for TLS descriptors
On Wed, 2024-03-13 at 06:15 +0800, Xi Ruoyao wrote: > > +(define_insn "@got_load_tls_desc" > > + [(set (match_operand:P 0 "register_operand" "=r") Hmm, and it looks like we should use (reg:P 4) instead of match_operand here, because the instruction does not work for a different register: with TARGET_EXPLICIT_RELOCS we are hard coding r4, and without TARGET_EXPLICIT_RELOCS the TLS desc function still only puts the return value in r4. > > + (unspec:P > > + [(match_operand:P 1 "symbolic_operand" "")] > > + UNSPEC_TLS_DESC)) > > + (clobber (reg:SI FCC0_REGNUM)) > > + (clobber (reg:SI FCC1_REGNUM)) > > + (clobber (reg:SI FCC2_REGNUM)) > > + (clobber (reg:SI FCC3_REGNUM)) > > + (clobber (reg:SI FCC4_REGNUM)) > > + (clobber (reg:SI FCC5_REGNUM)) > > + (clobber (reg:SI FCC6_REGNUM)) > > + (clobber (reg:SI FCC7_REGNUM)) > > + (clobber (reg:SI RETURN_ADDR_REGNUM))] > > + "TARGET_TLS_DESC" > > +{ > > + return TARGET_EXPLICIT_RELOCS > > + ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\ > > + \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\ > > + \tld.d\t$r1,$r4,%%desc_ld(%1)\n\ > > + \tjirl\t$r1,$r1,%%desc_call(%1)" > > Use something like > > ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t" > "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t" > "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t" > "jirl\t$r1,$r1,%%desc_call(%1)" > : "la.tls.desc\t%0,%1"; > > to prevent additional white spaces in the output asm before tabs. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
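The whitespace remark is a property of C string literals rather than of machine descriptions, so it can be shown in isolation. In the sketch below (hypothetical function names, and "insn_a"/"insn_b" are placeholders rather than real instructions), a backslash-newline inside a literal keeps the indentation of the following source line as part of the string, while adjacent literals are concatenated with nothing in between.

```c
#include <assert.h>
#include <string.h>

/* Backslash-newline inside the literal: the spaces that indent the next
   source line end up in the string, in front of the tab.  */
const char *
continued_template (void)
{
  return "insn_a\n\
          \tinsn_b";
}

/* Adjacent literals: the compiler joins them with nothing in between.  */
const char *
concatenated_template (void)
{
  return "insn_a\n\t"
         "insn_b";
}

/* Number of extra characters the continued form carries.  */
size_t
stray_spaces (void)
{
  return strlen (continued_template ()) - strlen (concatenated_template ());
}

void
check_templates (void)
{
  assert (strcmp (concatenated_template (), "insn_a\n\tinsn_b") == 0);
  assert (stray_spaces () > 0);
}
```

The assembler ignores the extra spaces, so this is purely about keeping the generated assembly tidy; the concatenated form also lets the source be re-indented freely without changing the output.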
Re: [PATCH v4] LoongArch: Add support for TLS descriptors
On Wed, 2024-03-13 at 06:56 +0800, Xi Ruoyao wrote: > On Wed, 2024-03-13 at 06:15 +0800, Xi Ruoyao wrote: > > > +(define_insn "@got_load_tls_desc" > > > + [(set (match_operand:P 0 "register_operand" "=r") > > Hmm, and it looks like we should use (reg:P 4) instead of match_operand > here, because the instruction does not work for a different register: > with TARGET_EXPLICIT_RELOCS we are hard coding r4, and without > TARGET_EXPLICIT_RELOCS the TLS desc function still only puts the return > value in r4. Suggested changes: diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 303666bf6d5..8f4d3f36c26 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -2954,10 +2954,10 @@ loongarch_legitimize_tls_address (rtx loc) tp = gen_rtx_REG (Pmode, THREAD_POINTER_REGNUM); if (TARGET_CMODEL_EXTREME) - emit_insn (gen_got_load_tls_desc_off64 (a0, loc, + emit_insn (gen_got_load_tls_desc_off64 (loc, gen_reg_rtx (DImode))); else - emit_insn (gen_got_load_tls_desc (Pmode, a0, loc)); + emit_insn (gen_got_load_tls_desc (Pmode, loc)); emit_insn (gen_add3_insn (dest, a0, tp)); } diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 0a1a6a24f61..8e8f1012344 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2772,9 +2772,9 @@ (define_insn "store_word" ;; Thread-Local Storage (define_insn "@got_load_tls_desc" - [(set (match_operand:P 0 "register_operand" "=r") + [(set (reg:P 4) (unspec:P - [(match_operand:P 1 "symbolic_operand" "")] + [(match_operand:P 0 "symbolic_operand" "")] UNSPEC_TLS_DESC)) (clobber (reg:SI FCC0_REGNUM)) (clobber (reg:SI FCC1_REGNUM)) @@ -2788,20 +2788,20 @@ (define_insn "@got_load_tls_desc" "TARGET_TLS_DESC" { return TARGET_EXPLICIT_RELOCS -? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\ - \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\ - \tld.d\t$r1,$r4,%%desc_ld(%1)\n\ - \tjirl\t$r1,$r1,%%desc_call(%1)" -: "la.tls.desc\t%0,%1"; +? 
"pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t" + "addi.d\t$r4,$r4,%%desc_pc_lo12(%0)\n\t" + "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t" + "jirl\t$r1,$r1,%%desc_call(%0)" +: "la.tls.desc\t$r4,%0"; } [(set_attr "got" "load") (set_attr "mode" "") (set_attr "length" "16")]) (define_insn "got_load_tls_desc_off64" - [(set (match_operand:DI 0 "register_operand" "=r") + [(set (reg:DI 4) (unspec:DI - [(match_operand:DI 1 "symbolic_operand" "")] + [(match_operand:DI 0 "symbolic_operand" "")] UNSPEC_TLS_DESC_OFF64)) (clobber (reg:SI FCC0_REGNUM)) (clobber (reg:SI FCC1_REGNUM)) @@ -2812,18 +2812,18 @@ (define_insn "got_load_tls_desc_off64" (clobber (reg:SI FCC6_REGNUM)) (clobber (reg:SI FCC7_REGNUM)) (clobber (reg:SI RETURN_ADDR_REGNUM)) -(clobber (match_operand:DI 2 "register_operand" "=&r"))] +(clobber (match_operand:DI 1 "register_operand" "=&r"))] "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME" { return TARGET_EXPLICIT_RELOCS -? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\ - \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\ - \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\ - \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\ - \tadd.d\t$r4,$r4,%2\n\ - \tld.d\t$r1,$r4,%%desc_ld(%1)\n\ - \tjirl\t$r1,$r1,%%desc_call(%1)" -: "la.tls.desc\t%0,%2,%1"; +? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t" + "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t" + "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t" + "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t" + "add.d\t$r4,$r4,%1\n\t" + "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t" + "jirl\t$r1,$r1,%%desc_call(%0)" +: "la.tls.desc\t$r4,%1,%0"; } [(set_attr "got" "load") (set_attr "length" "28")]) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v4] LoongArch: Add support for TLS descriptors
On Wed, 2024-03-13 at 11:06 +0800, mengqinggang wrote: > > On 2024/3/13 at 6:15 AM, Xi Ruoyao wrote: > > On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote: > > > +(define_insn "@got_load_tls_desc" > > > + [(set (match_operand:P 0 "register_operand" "=r") > > > + (unspec:P > > > + [(match_operand:P 1 "symbolic_operand" "")] > > > + UNSPEC_TLS_DESC)) > > > + (clobber (reg:SI FCC0_REGNUM)) > > > + (clobber (reg:SI FCC1_REGNUM)) > > > + (clobber (reg:SI FCC2_REGNUM)) > > > + (clobber (reg:SI FCC3_REGNUM)) > > > + (clobber (reg:SI FCC4_REGNUM)) > > > + (clobber (reg:SI FCC5_REGNUM)) > > > + (clobber (reg:SI FCC6_REGNUM)) > > > + (clobber (reg:SI FCC7_REGNUM)) > > > + (clobber (reg:SI RETURN_ADDR_REGNUM))] > > > + "TARGET_TLS_DESC" > > > +{ > > > + return TARGET_EXPLICIT_RELOCS > > > + ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\ > > > + \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\ > > > + \tld.d\t$r1,$r4,%%desc_ld(%1)\n\ > > > + \tjirl\t$r1,$r1,%%desc_call(%1)" > > Use something like > > > > ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t" > > "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t" > > "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t" > > "jirl\t$r1,$r1,%%desc_call(%1)" > > : "la.tls.desc\t%0,%1"; > > > > to prevent additional white spaces in the output asm before tabs. 
> > > > > + : "la.tls.desc\t%0,%1"; > > > +} > > > + [(set_attr "got" "load") > > > + (set_attr "mode" "") > > > + (set_attr "length" "16")]) > > > + > > > +(define_insn "got_load_tls_desc_off64" > > > + [(set (match_operand:DI 0 "register_operand" "=r") > > > + (unspec:DI > > > + [(match_operand:DI 1 "symbolic_operand" "")] > > > + UNSPEC_TLS_DESC_OFF64)) > > > + (clobber (reg:SI FCC0_REGNUM)) > > > + (clobber (reg:SI FCC1_REGNUM)) > > > + (clobber (reg:SI FCC2_REGNUM)) > > > + (clobber (reg:SI FCC3_REGNUM)) > > > + (clobber (reg:SI FCC4_REGNUM)) > > > + (clobber (reg:SI FCC5_REGNUM)) > > > + (clobber (reg:SI FCC6_REGNUM)) > > > + (clobber (reg:SI FCC7_REGNUM)) > > > + (clobber (reg:SI RETURN_ADDR_REGNUM)) > > > + (clobber (match_operand:DI 2 "register_operand" "=&r"))] > > > + "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME" > > > +{ > > > + return TARGET_EXPLICIT_RELOCS > > > + ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\ > > > + \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\ > > > + \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\ > > > + \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\ > > > + \tadd.d\t$r4,$r4,%2\n\ > > > + \tld.d\t$r1,$r4,%%desc_ld(%1)\n\ > > > + \tjirl\t$r1,$r1,%%desc_call(%1)" > > > + : "la.tls.desc\t%0,%2,%1"; > > Likewise. > > > > > +} > > > + [(set_attr "got" "load") > > > + (set_attr "length" "28")]) > > Otherwise OK. > > > > It's better to allow splitting these two instructions but we can do it > > in another patch. And IMO it's better to enable TLS desc by default if > > supported by both the assembler and the libc, but we'll have to defer it > > until Glibc 2.40 release. > > > Do we need to wait until LLVM also supports TLS DESC before setting it > as default? Hmm, maybe... I remember when we added R_LARCH_ALIGN lld was being broken for a while. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v4] LoongArch: Add support for TLS descriptors
On Wed, 2024-03-13 at 10:24 +0800, Xi Ruoyao wrote:
>    return TARGET_EXPLICIT_RELOCS
> -    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> -      \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> -      \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> -      \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> -      \tadd.d\t$r4,$r4,%2\n\
> -      \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> -      \tjirl\t$r1,$r1,%%desc_call(%1)"
> -    : "la.tls.desc\t%0,%2,%1";
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
> +      "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t"
> +      "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t"
> +      "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t"

Oops, the "%2" in the above line should be "%1".

> +      "add.d\t$r4,$r4,%1\n\t"
> +      "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
> +      "jirl\t$r1,$r1,%%desc_call(%0)"
> +    : "la.tls.desc\t$r4,%1,%0";

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] testsuite: Fix vfprintf-chk-1.c with -fhardened
On Tue, 2024-03-12 at 17:19 +0100, Jakub Jelinek wrote:
> On Thu, Feb 15, 2024 at 10:53:08PM +, Sam James wrote:
> > With _FORTIFY_SOURCE >= 2 (enabled by -fhardened), vfprintf-chk-1.c's
> > __vfprintf_chk ends up calling __vprintf_chk rather than vprintf.

Do we really want to support adding random CFLAGS when running the test
suite? AFAIK adding random CFLAGS will just cause test failures here or
there.

We are adjusting the test suite for -fPIE -pie and
-fstack-protector-strong, but that is because they can be implicitly
enabled with the --enable-default-* options, and we don't have
--enable-default-hardened as of now.

If we need to bootstrap a hardened GCC and test it, pass -fhardened the
way "info gccinstall" suggests:

    make BOOT_CFLAGS="-O2 -g -fhardened"

instead of

    env C{,XX}FLAGS="-O2 -g -fhardened" /path/to/gcc/configure ...

which will taint the test suite with -fhardened.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1] LoongArch: Remove masking process for operand 3 of xvpermi.q.
On Tue, 2024-03-12 at 09:56 +0800, Chenghui Pan wrote: > The behavior of non-zero unused bits in xvpermi.q instruction's > third operand is undefined on LoongArch, according to our > discussion (https://github.com/llvm/llvm-project/pull/83540), > we think that keeping original insn operand as unmodified > state is better solution. > > This patch partially reverts 7b158e036a95b1ab40793dd53bed7dbd770ffdaf. > > gcc/ChangeLog: > > * config/loongarch/lasx.md: Remove masking of operand 3. Add (lasx_xvpermi_q_) before ":". > > gcc/testsuite/ChangeLog: > > * gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c: > Reposition operand 3's value into instruction's defined accept range. ^^ Remove these two white spaces. Should be OK with these ChangeLog style issues fixed. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: Remove unused and incorrect "sge_" define_insn
If this insn is really used, we'll have something like slti $r4,$r0,$r5 in the code. The assembler will reject it because slti wants 2 register operands and 1 immediate operand. But we've not got any bug report for this, indicating this define_insn is unused at all. Note that do_store_flag (in expr.cc) is already converting x >= 1 to x > 0 unconditionally, so this define_insn is indeed unused and we can just remove it. gcc/ChangeLog: * config/loongarch/loongarch.md (any_ge): Remove. (sge_): Remove. --- Not fully tested but should be obvious. Ok for trunk? gcc/config/loongarch/loongarch.md | 10 -- 1 file changed, 10 deletions(-) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 525e1e82183..18fd9c1e7d5 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -517,7 +517,6 @@ (define_code_iterator equality_op [eq ne]) ;; These code iterators allow the signed and unsigned scc operations to use ;; the same template. (define_code_iterator any_gt [gt gtu]) -(define_code_iterator any_ge [ge geu]) (define_code_iterator any_lt [lt ltu]) (define_code_iterator any_le [le leu]) @@ -3355,15 +3354,6 @@ (define_insn "*sgt_" [(set_attr "type" "slt") (set_attr "mode" "")]) -(define_insn "*sge_" - [(set (match_operand:GPR 0 "register_operand" "=r") - (any_ge:GPR (match_operand:X 1 "register_operand" "r") -(const_int 1)))] - "" - "slti\t%0,%.,%1" - [(set_attr "type" "slt") - (set_attr "mode" "")]) - (define_insn "*slt_" [(set (match_operand:GPR 0 "register_operand" "=r") (any_lt:GPR (match_operand:X 1 "register_operand" "r") -- 2.44.0
[PATCH] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]
We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there is nothing to advance, but that is not the case for (...) functions returning by hidden reference which have one such artificial argument. This is causing gcc.dg/c23-stdarg-6.c and gcc.dg/c23-stdarg-8.c to fail. Fix the issue by checking if arg.type is NULL, as r14-9503 explains. gcc/ChangeLog: PR target/114175 * config/loongarch/loongarch.cc (loongarch_setup_incoming_varargs): Only skip loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions if arg.type is NULL. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 70e31bb831c..57de8ef7d20 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum, argument. Advance a local copy of CUM past the last "real" named argument, to find out how many registers are left over. */ local_cum = *get_cumulative_args (cum); - if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))) + if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)) + || arg.type != NULL_TREE) loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg); /* Found out how many registers we need to save. */ -- 2.44.0
Pushed: [PATCH v2] LoongArch: Fix C23 (...) functions returning large aggregates [PR114175]
On Tue, 2024-03-19 at 11:19 +0800, chenglulu wrote:
>
> On 2024/3/18 5:34 PM, Xi Ruoyao wrote:
> > We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named
> > arguments and there is nothing to advance, but that is not the case
> > for (...) functions returning by hidden reference which have one such
> > artificial argument. This is causing gcc.dg/c23-stdarg-6.c and
> > gcc.dg/c23-stdarg-8.c to fail.
> >
> > Fix the issue by checking if arg.type is NULL, as r14-9503 explains.
> >
> > gcc/ChangeLog:
> >
> >     PR target/114175
> >     * config/loongarch/loongarch.cc
> >     (loongarch_setup_incoming_varargs): Only skip
> >     loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P
> >     functions if arg.type is NULL.
> > ---
> >
> > Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
> >
> >  gcc/config/loongarch/loongarch.cc | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> > index 70e31bb831c..57de8ef7d20 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -767,7 +767,8 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum,
> >       argument. Advance a local copy of CUM past the last "real" named
> >       argument, to find out how many registers are left over. */
> >    local_cum = *get_cumulative_args (cum);
> I think it's important to add annotation information here:
> /* where there is no hidden return argument passed, arg.type is always NULL. */
>
> Others LGTM.
>
> Thanks!

Pushed v2 with a comment added as attached.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University

From c1fd4589c2bf9fd8409d51b94df219cb75107762 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao
Date: Mon, 18 Mar 2024 17:18:34 +0800
Subject: [PATCH v2] LoongArch: Fix C23 (...)
functions returning large aggregates [PR114175] We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there is nothing to advance, but that is not the case for (...) functions returning by hidden reference which have one such artificial argument. This is causing gcc.dg/c23-stdarg-6.c and gcc.dg/c23-stdarg-8.c to fail. Fix the issue by checking if arg.type is NULL, as r14-9503 explains. gcc/ChangeLog: PR target/114175 * config/loongarch/loongarch.cc (loongarch_setup_incoming_varargs): Only skip loongarch_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions if arg.type is NULL. --- gcc/config/loongarch/loongarch.cc | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 70e31bb831c..5344f2a6987 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -767,7 +767,13 @@ loongarch_setup_incoming_varargs (cumulative_args_t cum, argument. Advance a local copy of CUM past the last "real" named argument, to find out how many registers are left over. */ local_cum = *get_cumulative_args (cum); - if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))) + + /* For a C23 variadic function w/o any named argument, and w/o an + artifical argument for large return value, skip advancing args. + There is such an artifical argument iff. arg.type is non-NULL + (PR 114175). */ + if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)) + || arg.type != NULL_TREE) loongarch_function_arg_advance (pack_cumulative_args (&local_cum), arg); /* Found out how many registers we need to save. */ -- 2.44.0
[PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]
We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named arguments and there is nothing to advance, but that is not the case for (...) functions returning by hidden reference which have one such artificial argument. This is causing gcc.dg/c23-stdarg-{6,8,9}.c to fail. Fix the issue by checking if arg.type is NULL, as r14-9503 explains. gcc/ChangeLog: PR target/114175 * config/mips/mips.cc (mips_setup_incoming_varargs): Only skip mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P functions if arg.type is NULL. --- Bootstrapped and regtested on mips64el-linux-gnuabi64. Ok for trunk? gcc/config/mips/mips.cc | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc index 68e2ae8d8fa..ce764a5cb35 100644 --- a/gcc/config/mips/mips.cc +++ b/gcc/config/mips/mips.cc @@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum, argument. Advance a local copy of CUM past the last "real" named argument, to find out how many registers are left over. */ local_cum = *get_cumulative_args (cum); - if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))) + + /* For a C23 variadic function w/o any named argument, and w/o an + artifical argument for large return value, skip advancing args. + There is such an artifical argument iff. arg.type is non-NULL + (PR 114175). */ + if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)) + || arg.type != NULL_TREE) mips_function_arg_advance (pack_cumulative_args (&local_cum), arg); /* Found out how many registers we need to save. */ -- 2.44.0
Pushed: [PATCH] LoongArch: Fix a typo [PR 114407]
gcc/ChangeLog: PR target/114407 * config/loongarch/loongarch-opts.cc (loongarch_config_target): Fix typo in diagnostic message, enabing -> enabling. --- Pushed r14-9582 as obvious. gcc/config/loongarch/loongarch-opts.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index 7eeac43ed2f..627f9148adf 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -362,7 +362,7 @@ config_target_isa: gcc_assert (constrained.simd); inform (UNKNOWN_LOCATION, - "enabing %qs promotes %<%s%s%> to %<%s%s%>", + "enabling %qs promotes %<%s%s%> to %<%s%s%>", loongarch_isa_ext_strings[t.isa.simd], OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[t.isa.fpu], OPTSTR_ISA_EXT_FPU, loongarch_isa_ext_strings[ISA_EXT_FPU64]); -- 2.44.0
Re: [PATCH] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6
On Thu, 2024-03-21 at 10:14 +0800, Jie Mei wrote:
> diff --git a/gcc/testsuite/gcc.target/mips/mips-minmax.c b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> new file mode 100644
> index 000..2d234ac4b1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/mips-minmax.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mhard-float -ffinite-math-only -march=mips32r6" } */

You may want to add fmin3 and fmax3 in addition to smin3 and smax3 so it
will work without -ffinite-math-only.

  ‘fminM3’, ‘fmaxM3’
     IEEE-conformant minimum and maximum operations. If one operand is
     a quiet ‘NaN’, then the other operand is returned. If both
     operands are quiet ‘NaN’, then a quiet ‘NaN’ is returned. In the
     case when gcc supports signaling ‘NaN’ (-fsignaling-nans) an
     invalid floating point exception is raised and a quiet ‘NaN’ is
     returned.

And the MIPS 6.06 manual says:

  Numbers are preferred to NaNs: if one input is a NaN, but not both,
  the value of the numeric input is returned. If both are NaNs, the NaN
  in fs is returned.

for MAX.fmt and MIN.fmt, so they match fmin3 and fmax3.

> +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)
On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:
> > +/* Costs to use when optimizing for xiangshan nanhu. */
> > +static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_add */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_mul */
> > +  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},  /* fp_div */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* int_mul */
> > +  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
> > +  6,                                         /* issue_rate */
> > +  3,                                         /* branch_cost */
> > +  3,                                         /* memory_cost */
> > +  3,                                         /* fmv_cost */
> > +  true,                                      /* slow_unaligned_access */
> > +  false,                                     /* use_divmod_expansion */
> > +  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,       /* fusible_ops */
> > +  NULL,                                      /* vector cost */
> Is your integer division really that fast? The table above essentially
> says that your cpu can do integer division in 6 cycles.

Hmm, I've just seen that I've coded some even smaller values for
LoongArch CPUs, so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different
number of cycles for different inputs: on LoongArch LA664 I've observed
5 cycles for some inputs and 39 cycles for other inputs.

So should we use the minimum value, the maximum value, or something in
between for TARGET_RTX_COSTS and pipeline descriptions?

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: Increase division costs
The latency of LA464 and LA664 division instructions depends on the
input. When I updated the costs in r14-6642, I unintentionally set the
division costs to the best-case latency (when the first operand is 0).
Per a recent discussion [1] we should use "something sensible" instead
of it.

Use the average of the minimum and maximum latency observed instead.
This enables multiplication to reciprocal sequence reduction and speeds
up the following test case for about 30%:

    int
    main (void)
    {
      unsigned long stat = 0xdeadbeef;
      for (int i = 0; i < 1; i++)
        stat = (stat * stat + stat * 114514 + 1919810) % 17;
      asm(""::"r"(stat));
    }

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html

gcc/ChangeLog:

    * config/loongarch/loongarch-def.cc
    (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Increase
    default division cost to the average of the best case and worst
    case scenarios observed.

gcc/testsuite/ChangeLog:

    * gcc.target/loongarch/div-const-reduction.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?

gcc/config/loongarch/loongarch-def.cc| 8 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c | 9 + 2 files changed, 13 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index e8c129ce643..93e72a520d5 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -95,12 +95,12 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data () : fp_add (COSTS_N_INSNS (5)), fp_mult_sf (COSTS_N_INSNS (5)), fp_mult_df (COSTS_N_INSNS (5)), -fp_div_sf (COSTS_N_INSNS (8)), -fp_div_df (COSTS_N_INSNS (8)), +fp_div_sf (COSTS_N_INSNS (12)), +fp_div_df (COSTS_N_INSNS (15)), int_mult_si (COSTS_N_INSNS (4)), int_mult_di (COSTS_N_INSNS (4)), -int_div_si (COSTS_N_INSNS (5)), -int_div_di (COSTS_N_INSNS (5)), +int_div_si (COSTS_N_INSNS (14)), +int_div_di (COSTS_N_INSNS (22)), movcf2gr (COSTS_N_INSNS (7)), movgr2cf (COSTS_N_INSNS (15)), branch_cost (6), diff --git a/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c new file mode 100644 index 000..0ee86410dd7 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=la464" } */ +/* { dg-final { scan-assembler-not "div\.\[dw\]" } } */ + +int +test (int a) +{ + return a % 17; +} -- 2.44.0
Re: [PATCH v2] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6
On Tue, 2024-03-26 at 11:15 +0800, YunQiang Su wrote:

/* snip */

> With -ffinite-math-only -fno-signed-zeros, it does work with
>    x >= y ? x : y
> while without `-ffinite-math-only -fno-signed-zeros`, it cannot.
> @Xi Ruoyao Is it expected by IEEE?

When y is (quiet) NaN and x is not, fmax(x, y) should produce x but
x >= y ? x : y should produce y. Thus -ffinite-math-only is needed.

When x is +0.0 and y is -0.0, x >= y ? x : y should produce +0.0 but
fmax(x, y) may produce +0.0 or -0.0 (IEEE allows both and I don't see a
more strict requirement in MIPS 6.06 manual either). Thus
-fno-signed-zeros is needed.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Increase division costs
On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
>
> On 2024/3/26 5:48 PM, Xi Ruoyao wrote:
> > The latency of LA464 and LA664 division instructions depends on the
> > input. When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> >
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> >
> >     int
> >     main (void)
> >     {
> >       unsigned long stat = 0xdeadbeef;
> >       for (int i = 0; i < 1; i++)
> >         stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >       asm(""::"r"(stat));
> >     }
> >
> > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
>
> The test case div-const-reduction.c is modified to assemble the instruction
> sequence as follows:
> lu12i.w $r12,97440>>12 # 0x3b9ac000
> ori $r12,$r12,2567
> mod.w $r13,$r13,$r12
>
> This sequence of instructions takes 5 clock cycles.

Hmm indeed, it seems a waste to do this reduction for int / 17. I'll
try to make a better heuristic as Richard suggests...

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Increase division costs
On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao wrote:
> >
> > The latency of LA464 and LA664 division instructions depends on the
> > input. When I updated the costs in r14-6642, I unintentionally set the
> > division costs to the best-case latency (when the first operand is 0).
> > Per a recent discussion [1] we should use "something sensible" instead
> > of it.
> >
> > Use the average of the minimum and maximum latency observed instead.
> > This enables multiplication to reciprocal sequence reduction and speeds
> > up the following test case for about 30%:
> >
> >     int
> >     main (void)
> >     {
> >       unsigned long stat = 0xdeadbeef;
> >       for (int i = 0; i < 1; i++)
> >         stat = (stat * stat + stat * 114514 + 1919810) % 17;
> >       asm(""::"r"(stat));
> >     }
>
> I think you should be able to see a constant divisor and thus could do
> better than return the same latency for everything. For non-constant
> divisors using the best-case latency shouldn't be a problem.

Hmm, it does not seem really possible as of now. expand_divmod does
something like:

    max_cost = (unsignedp
                ? udiv_cost (speed, compute_mode)
                : sdiv_cost (speed, compute_mode));

which reads the pre-calculated costs from a table. Thus we don't really
know the denominator and cannot estimate the cost based on it :(.

CSE does invoke the cost hook with the actual (mod (a, (const_int 17))
RTX, but that is less important.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Increase division costs
On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> >
> > On 2024/3/26 5:48 PM, Xi Ruoyao wrote:
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input. When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > >
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > >
> > >     int
> > >     main (void)
> > >     {
> > >       unsigned long stat = 0xdeadbeef;
> > >       for (int i = 0; i < 1; i++)
> > >         stat = (stat * stat + stat * 114514 + 1919810) % 17;
> > >       asm(""::"r"(stat));
> > >     }
> > >
> > > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> >
> > The test case div-const-reduction.c is modified to assemble the instruction
> > sequence as follows:
> > lu12i.w $r12,97440>>12 # 0x3b9ac000
> > ori $r12,$r12,2567
> > mod.w $r13,$r13,$r12
> >
> > This sequence of instructions takes 5 clock cycles.

It actually may take 5 to 8 cycles depending on the input. And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.

> Hmm indeed, it seems a waste to do this reduction for int / 17.
> I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code). See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Ping: [PATCH] mips: Fix C23 (...) functions returning large aggregates [PR114175]
Ping. On Wed, 2024-03-20 at 15:10 +0800, Xi Ruoyao wrote: > We were assuming TYPE_NO_NAMED_ARGS_STDARG_P don't have any named > arguments and there is nothing to advance, but that is not the case > for (...) functions returning by hidden reference which have one such > artificial argument. This is causing gcc.dg/c23-stdarg-{6,8,9}.c to > fail. > > Fix the issue by checking if arg.type is NULL, as r14-9503 explains. > > gcc/ChangeLog: > > PR target/114175 > * config/mips/mips.cc (mips_setup_incoming_varargs): Only skip > mips_function_arg_advance for TYPE_NO_NAMED_ARGS_STDARG_P > functions if arg.type is NULL. > --- > > Bootstrapped and regtested on mips64el-linux-gnuabi64. Ok for trunk? > > gcc/config/mips/mips.cc | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc > index 68e2ae8d8fa..ce764a5cb35 100644 > --- a/gcc/config/mips/mips.cc > +++ b/gcc/config/mips/mips.cc > @@ -6834,7 +6834,13 @@ mips_setup_incoming_varargs (cumulative_args_t cum, > argument. Advance a local copy of CUM past the last "real" named > argument, to find out how many registers are left over. */ > local_cum = *get_cumulative_args (cum); > - if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl))) > + > + /* For a C23 variadic function w/o any named argument, and w/o an > + artifical argument for large return value, skip advancing args. > + There is such an artifical argument iff. arg.type is non-NULL > + (PR 114175). */ > + if (!TYPE_NO_NAMED_ARGS_STDARG_P (TREE_TYPE (current_function_decl)) > + || arg.type != NULL_TREE) > mips_function_arg_advance (pack_cumulative_args (&local_cum), arg); > > /* Found out how many registers we need to save. */ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Increase division costs
On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote: > I tested spec2006. In the floating-point program, the test items with large > > fluctuations are removed, and the rest is basically unchanged. > > The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and (10,22). So IIUC (10,10) is better than (5,5), (10,22), and the originally proposed (14,22)? Then should I make a change to make all 4 costs (SF, DF, SI, DI) 10? I'd still want DI % 17 to be reduced as reciprocal sequence (but not SI % 17) since DI % (smaller const) is quite important for some workloads like competitive programming. However "adapting with different modulos" is not possible w/o refactoring generic code so it must be deferred to at least GCC 15. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Increase division costs
On Mon, 2024-04-01 at 10:22 +0800, chenglulu wrote:
>
> On 2024/4/1 9:29 AM, Xi Ruoyao wrote:
> > On Fri, 2024-03-29 at 09:23 +0800, chenglulu wrote:
> >
> > > I tested spec2006. In the floating-point program, the test items with
> > > large fluctuations are removed, and the rest is basically unchanged.
> > >
> > > The fixed-point 464.h264ref (10,10) was 6.7% higher than (5,5) and
> > > (10,22).
> > So IIUC (10,10) is better than (5,5), (10,22), and the originally
> > proposed (14,22)? Then should I make a change to make all 4 costs (SF,
> > DF, SI, DI) 10?
>
> I think this may require the analysis of the spec's test case. I took a
> look at the test results again, where the scores of SPEC INT
> 462.libquantum fluctuated greatly, but the combination of (10,22)
> showed an overall upward trend compared to the scores of the other two
> combinations.
>
> I don't know if the (10,22) combination happens to have the kind of
> test cases in the changelog.
>
> So can we change it together in GCC15?

Ok. Abandoning this patch then.

--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH v5] LoongArch: Add support for TLS descriptors
Is this patch targeting GCC 14 or 15? If 14 I guess we'd commit now... Generally we don't add features in stage 4, but if we keep trad as the default I think it'd be OK. And RISC-V guys plan to push their TLS desc implementation this week too. On Tue, 2024-03-19 at 09:54 +0800, mengqinggang wrote: > Add support for TLS descriptors on normal code model and extreme code model. > > Normal code model instruction sequence: > -mno-explicit-relocs: > la.tls.desc $r4, s > add.d $r12, $r4, $r2 > -mexplicit-relocs: > pcalau12i $r4,%desc_pc_hi20(s) > addi.d$r4,$r4,%desc_pc_lo12(s) > ld.d $r1,$r4,%desc_ld(s) > jirl $r1,$r1,%desc_call(s) > add.d $r12, $r4, $r2 > > Extreme code model instruction sequence: > -mno-explicit-relocs: > la.tls.desc $r4, $r12, s > add.d $r12, $r4, $r2 > -mexplicit-relocs: > pcalau12i $r4,%desc_pc_hi20(s) > addi.d$r12,$r0,%desc_pc_lo12(s) > lu32i.d $r12,%desc64_pc_lo20(s) > lu52i.d $r12,$r12,%desc64_pc_hi12(s) > add.d $r4,$r4,$r12 > ld.d $r1,$r4,%desc_ld(s) > jirl $r1,$r1,%desc_call(s) > add.d $r12, $r4, $r2 > > The default is still traditional TLS model, but can be configured with > --with-tls={trad,desc}. The default can change to TLS descriptors once > libc and LLVM support this. > > gcc/ChangeLog: > > * config.gcc: Add --with-tls option to change TLS flavor. > * config/loongarch/genopts/loongarch.opt.in: Add -mtls-dialect to > configure TLS flavor. > * config/loongarch/loongarch-def.h (struct loongarch_target): Add > tls_dialect. > * config/loongarch/loongarch-driver.cc (la_driver_init): Add tls > flavor. > * config/loongarch/loongarch-opts.cc (loongarch_init_target): Add > tls_dialect. > (loongarch_config_target): Ditto. > (loongarch_update_gcc_opt_status): Ditto. > * config/loongarch/loongarch-opts.h (loongarch_init_target):Ditto. > (TARGET_TLS_DESC): New define. > * config/loongarch/loongarch.cc (loongarch_symbol_insns): Add TLS DESC > instructions sequence length. > (loongarch_legitimize_tls_address): New TLS DESC instruction sequence. 
> (loongarch_option_override_internal): Add la_opt_tls_dialect. > (loongarch_option_restore): Add la_target.tls_dialect. > * config/loongarch/loongarch.md (@got_load_tls_desc): Normal > code model for TLS DESC. > (got_load_tls_desc_off64): Extreme code model for TLS DESC. > * config/loongarch/loongarch.opt: Regenerated. > > gcc/testsuite/ChangeLog: > > * gcc.target/loongarch/cmodel-extreme-1.c: Add -mtls-dialect=trad. > * gcc.target/loongarch/cmodel-extreme-2.c: Ditto. > * gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Ditto. > * gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: > Ditto. > * gcc.target/loongarch/func-call-medium-1.c: Ditto. > * gcc.target/loongarch/func-call-medium-2.c: Ditto. > * gcc.target/loongarch/func-call-medium-3.c: Ditto. > * gcc.target/loongarch/func-call-medium-4.c: Ditto. > * gcc.target/loongarch/tls-extreme-macro.c: Ditto. > * gcc.target/loongarch/tls-gd-noplt.c: Ditto. > * gcc.target/loongarch/explicit-relocs-auto-extreme-tls-desc.c: New > test. > * gcc.target/loongarch/explicit-relocs-auto-tls-desc.c: New test. > * gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c: New test. > * gcc.target/loongarch/explicit-relocs-tls-desc.c: New test. > > Co-authored-by: Lulu Cheng > Co-authored-by: Xi Ruoyao -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1] LoongArch: Set default alignment for functions jumps and loops [PR112919].
On Tue, 2024-04-02 at 15:03 +0800, Lulu Cheng wrote: > +/* Alignment for functions loops and jumps for best performance. For new > + uarchs the value should be measured via benchmarking. See the > documentation > + for -falign-functions -falign-loops and -falign-jumps in invoke.texi for > the ^ ^ Better have two commas here. Otherwise it should be OK. > + format. */ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Enable switchable target
On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote: > This patch fixes the back-end context switching in cases where functions > should be built with their own target contexts instead of the > global one, such as LTO linking and functions with target attributes (TBD). > > PR target/113233 Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option save/restore"? Should I reopen it? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Enable switchable target
On Sun, 2024-04-07 at 16:23 +0800, Yang Yujie wrote: > On Sun, Apr 07, 2024 at 04:23:53PM +0800, Xi Ruoyao wrote: > > On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote: > > > This patch fixes the back-end context switching in cases where functions > > > should be built with their own target contexts instead of the > > > global one, such as LTO linking and functions with target attributes > > > (TBD). > > > > > > PR target/113233 > > > > Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option > > save/restore"? Should I reopen it? > > > > -- > > Xi Ruoyao > > School of Aerospace Science and Technology, Xidian University > > Yes, the issue was not fixed with that patch. This one should do. So reopened the PR. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding
On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote: > The patch has been approved by Honza in Bugzilla. (I hope. He did write > it looked reasonable.) Together with the patch for PR 113907, it has > passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on > x86_64-linux and bootstrap and LTO bootstrap on ppc64le-linux. It also > passed normal bootstrap on aarch64-linux but there many testcases failed > because the compiler timed out. The machine is old and slow and might > have been oversubscribed so my plan is to try again on gcc185 from > cfarm. If that goes well, I intend to commit the patch and then start > working on backports. I've tried these two patches out on my own 24-core AArch64 machine. Bootstrapped (but no LTO or PGO) and regtested fine. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding
On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote: > +/* Given two types in an assignment, return true either if any one cannot be > + totally scalarized or if they have padding (i.e. not copied bits) */ > + > +bool > +sra_total_scalarization_would_copy_same_data_p (tree t1, tree t2) > +{ > + sra_padding_collecting p1; > + if (!check_ts_and_push_padding_to_vec (t1, &p1)) > + return true; > + > + sra_padding_collecting p2; > + if (!check_ts_and_push_padding_to_vec (t2, &p2)) > + return true; > + > + unsigned l = p1.m_padding.length (); > + if (l != p2.m_padding.length ()) > + return false; > + for (unsigned i = 0; i < l; i++) > + if (p1.m_padding[i].first != p2.m_padding[i].first > + || p1.m_padding[i].second != p2.m_padding[i].second) > + return false; > + > + return true; > +} > + Better remove this trailing empty line from tree-sra.cc. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Enable switchable target
On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote: > * config/loongarch/loongarch-builtins.cc > (loongarch_init_builtins): > Initialize all builtin functions at startup. git gcc-verify complains that tab should be used instead of space for this line. > (loongarch_expand_builtin): Turn assertion of builtin > availability > into a test. and this line. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2] LoongArch: Enable switchable target
On Mon, 2024-04-08 at 16:46 +0800, Yang Yujie wrote: > v1 -> v2: > Remove spaces from changelog. I've rebuilt the base system with a GCC that includes this patch. The LTO+PGO bootstrap and the regression tests went fine, and I observed no issues. I usually put the optimization flags into LDFLAGS when I do LTO, though, so I don't really rely on this patch. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Change gcc/ira-conflicts.cc build_conflict_bit_table to use size_t/%zu
On Thu, 2024-02-01 at 14:01 +0100, Jakub Jelinek wrote: > On Thu, Feb 01, 2024 at 12:45:31PM +, Jonathan Yong wrote: > > Attached patch OK? Copied inline for review convenience. > > No, I think e.g. AIX doesn't support the z modifier. > I don't see %zd or %zu used anywhere except in gcc/jit/ which presumably > doesn't work on AIX. > > If you really want to avoid truncation, perhaps do something like > if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL) > { > if (sizeof (void *) <= sizeof (long)) > fprintf (ira_dump_file, >"+++Allocating %lu bytes for conflict table " >"(uncompressed size %lu)\n", >(unsigned long) (sizeof (IRA_INT_TYPE) * allocated_words_num), >(unsigned long) (sizeof (IRA_INT_TYPE) * object_set_words > * ira_objects_num)); > else > fprintf (ira_dump_file, >"+++Allocating %l" PRIu64 "bytes for conflict table " >"(uncompressed size %" PRIu64 ")\n", Should use HOST_WIDE_INT_PRINT_UNSIGNED instead of PRIu64. >(unsigned HOST_WIDE_INT) (sizeof (IRA_INT_TYPE) > * allocated_words_num), >(unsigned HOST_WIDE_INT) (sizeof (IRA_INT_TYPE) > * object_set_words > * ira_objects_num)); > } -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Change gcc/ira-conflicts.cc build_conflict_bit_table to use size_t/%zu
On Thu, 2024-02-01 at 14:55 +0100, Jakub Jelinek wrote: > On Thu, Feb 01, 2024 at 01:42:03PM +, Jonathan Yong wrote: > > On 2/1/24 13:06, Xi Ruoyao wrote: > > > On Thu, 2024-02-01 at 14:01 +0100, Jakub Jelinek wrote: > > > > On Thu, Feb 01, 2024 at 12:45:31PM +, Jonathan Yong wrote: > > > > > Attached patch OK? Copied inline for review convenience. > > > > > > > > No, I think e.g. AIX doesn't support the z modifier. > > > > I don't see %zd or %zu used anywhere except in gcc/jit/ which presumably > > > > doesn't work on AIX. > > > > > > > > > > Should use HOST_WIDE_INT_PRINT_UNSIGNED instead of PRIu64. > > > > > Updated the patch with the suggestions. I mean that if you are casting it to unsigned HOST_WIDE_INT, you should use HOST_WIDE_INT_PRINT_UNSIGNED. If you are casting it to size_t, you cannot use it (as Jakub has explained). When you use printf-like things, you have to keep the format specifier and the argument itself in correspondence. > No, that is wrong. That will break bootstrap on lots of hosts, any time > size_t is not unsigned long (if unsigned long is 64-bit) or unsigned long > long (if unsigned long is not 64-bit). > That includes e.g. all targets where size_t is unsigned int, and some others > too. > > Jakub > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
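The rule discussed above (the conversion specifier must match the type of the argument actually passed) can be sketched in standard C. This is only an analogue, not the GCC code: uint64_t and PRIu64 stand in for unsigned HOST_WIDE_INT and HOST_WIDE_INT_PRINT_UNSIGNED, and format_byte_count is a hypothetical helper name. The size_t value is cast to one fixed type and printed with the macro that belongs to that type, so nothing depends on host support for the "z" modifier.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a byte count into BUF without using the
   "z" length modifier.  The argument is cast to uint64_t and printed
   with PRIu64, so specifier and argument type always agree, whatever
   the width of size_t on the host.  Returns what snprintf returns.  */
int
format_byte_count (char *buf, size_t buflen, size_t nbytes)
{
  return snprintf (buf, buflen, "+++Allocating %" PRIu64 " bytes",
                   (uint64_t) nbytes);
}
```

Casting to size_t while printing with a 64-bit specifier would break the pairing on any host where size_t is narrower, which is the failure mode Jakub describes.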
[PATCH] LoongArch: Fix an ODR violation
When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR violation is detected: ../../gcc/config/loongarch/loongarch-opts.cc:57: warning: 'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr] 57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES]; ../../gcc/config/loongarch/loongarch-def.cc:186: note: 'abi_minimal_isa' was previously declared here 186 | abi_minimal_isa = array, ../../gcc/config/loongarch/loongarch-def.cc:186: note: code may be misoptimized unless '-fno-strict-aliasing' is used Fix it by adding a proper declaration of abi_minimal_isa into loongarch-def.h and removing the ODR-violating local declaration in loongarch-opts.cc. gcc/ChangeLog: * config/loongarch/loongarch-def.h (abi_minimal_isa): Declare. * config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove the ODR-violating local declaration. --- Bootstrapped on loongarch64-linux-gnu. Not fully regtested but it should be an obvious fix. Ok for trunk? gcc/config/loongarch/loongarch-def.h | 3 +++ gcc/config/loongarch/loongarch-opts.cc | 2 -- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h index a1237ecf1fd..2dbf006d013 100644 --- a/gcc/config/loongarch/loongarch-def.h +++ b/gcc/config/loongarch/loongarch-def.h @@ -203,5 +203,8 @@ extern loongarch_def_array loongarch_cpu_align; extern loongarch_def_array loongarch_cpu_rtx_cost_data; +extern loongarch_def_array< + loongarch_def_array, + N_ABI_BASE_TYPES> abi_minimal_isa; #endif /* LOONGARCH_DEF_H */ diff --git a/gcc/config/loongarch/loongarch-opts.cc b/gcc/config/loongarch/loongarch-opts.cc index b87299513c9..7eeac43ed2f 100644 --- a/gcc/config/loongarch/loongarch-opts.cc +++ b/gcc/config/loongarch/loongarch-opts.cc @@ -53,8 +53,6 @@ static const int tm_multilib_list[] = { TM_MULTILIB_LIST }; static int enabled_abi_types[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES] = { 0 }; #define isa_required(ABI) (abi_minimal_isa[(ABI).base][(ABI).ext]) -extern "C" const struct loongarch_isa -abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES]; static inline int is_multilib_enabled (struct loongarch_abi abi) -- 2.43.0
[PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns
We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes. But in loongarch_symbol_insns: if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)) return 0; And LSX_SUPPORTED_MODE_P is defined as: #define LSX_SUPPORTED_MODE_P(MODE) \ (ISA_HAS_LSX \ && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ... GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined as: ALWAYS_INLINE poly_uint16 mode_to_bytes (machine_mode mode) { #if GCC_VERSION >= 4001 return (__builtin_constant_p (mode) ? mode_size_inline (mode) : mode_size[mode]); #else return mode_size[mode]; #endif } There is an assertion in mode_size_inline: gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES); Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc), thus if __builtin_constant_p (mode) is evaluated true (it happens when GCC is bootstrapped with LTO+PGO), the assertion will be triggered and cause an ICE. OTOH if __builtin_constant_p (mode) is evaluated false, mode_size[mode] is still an out-of-bounds array access (the length of the mode_size array is NUM_MACHINE_MODES). So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to a MIPS bug PR98491 fixed by me about 3 years ago. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 963e86d61af..6badef45d62 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -2007,7 +2007,8 @@ loongarch_symbol_insns (enum loongarch_symbol_type type, machine_mode mode) { /* LSX LD.* and ST.* cannot support loading symbols via an immediate operand. */ - if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)) + if (mode != MAX_MACHINE_MODE + && (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))) return 0; switch (type) -- 2.43.0
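The shape of the fix above can be sketched outside GCC as follows. Everything here is hypothetical (toy_mode, toy_mode_size, toy_mode_bytes, toy_vector_mode_p); only the pattern is taken from the patch: an enum whose last enumerator is a one-past-the-end sentinel, a size table indexed by that enum, and a predicate that tests for the sentinel before it touches the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for machine_mode.  MAX_MODE plays the role of
   MAX_MACHINE_MODE: it is one past the last valid table index.  */
enum toy_mode { TOY_SI, TOY_DI, TOY_V2DI, TOY_V4DI, MAX_MODE };

/* Size in bytes of each real mode; there is no entry for MAX_MODE.  */
static const unsigned toy_mode_size[MAX_MODE] = { 4, 8, 16, 32 };

/* Analogue of GET_MODE_SIZE: only valid for MODE < MAX_MODE, and it
   asserts that, like mode_size_inline does.  */
static unsigned
toy_mode_bytes (enum toy_mode mode)
{
  assert ((int) mode >= 0 && mode < MAX_MODE);
  return toy_mode_size[mode];
}

/* Return true if MODE is a 16-byte or 32-byte vector mode.  The
   sentinel is rejected first, so the table is never indexed with
   MAX_MODE and the assertion above cannot fire.  */
bool
toy_vector_mode_p (enum toy_mode mode)
{
  return (mode != MAX_MODE
          && (toy_mode_bytes (mode) == 16 || toy_mode_bytes (mode) == 32));
}
```

Because && short-circuits, passing MAX_MODE returns false without evaluating toy_mode_bytes at all, which mirrors why the one-line change is enough in loongarch_symbol_insns.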
[PATCH] LoongArch: Fix wrong LSX FP vector negation
We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is wrong because -0.0 is not 0 - 0.0. This causes some Python tests to fail when Python is built with LSX enabled. Use the vbitrevi.{d/w} instructions to simply reverse the sign bit instead. We are already doing this for LASX and now we can unify them into simd.md. gcc/ChangeLog: * config/loongarch/lsx.md (neg2): Remove the incorrect expand. * config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr. (elmsgnbit): Likewise. (neg2): New define_insn. * config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they are now instantiated in simd.md. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/lasx.md | 16 gcc/config/loongarch/lsx.md | 11 --- gcc/config/loongarch/simd.md | 18 ++ 3 files changed, 18 insertions(+), 27 deletions(-) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index e2115ffb884..ac84db7f0ce 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -3028,22 +3028,6 @@ (define_insn "absv8sf2" [(set_attr "type" "simd_logic") (set_attr "mode" "V8SF")]) -(define_insn "negv4df2" - [(set (match_operand:V4DF 0 "register_operand" "=f") - (neg:V4DF (match_operand:V4DF 1 "register_operand" "f")))] - "ISA_HAS_LASX" - "xvbitrevi.d\t%u0,%u1,63" - [(set_attr "type" "simd_logic") - (set_attr "mode" "V4DF")]) - -(define_insn "negv8sf2" - [(set (match_operand:V8SF 0 "register_operand" "=f") - (neg:V8SF (match_operand:V8SF 1 "register_operand" "f")))] - "ISA_HAS_LASX" - "xvbitrevi.w\t%u0,%u1,31" - [(set_attr "type" "simd_logic") - (set_attr "mode" "V8SF")]) - (define_insn "xvfmadd4" [(set (match_operand:FLASX 0 "register_operand" "=f") (fma:FLASX (match_operand:FLASX 1 "register_operand" "f") diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 7002edae4d4..b9b94b9079c 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -728,17 +728,6 @@ (define_expand "neg2" DONE; 
}) -(define_expand "neg2" - [(set (match_operand:FLSX 0 "register_operand") - (neg:FLSX (match_operand:FLSX 1 "register_operand")))] - "ISA_HAS_LSX" -{ - rtx reg = gen_reg_rtx (mode); - emit_move_insn (reg, CONST0_RTX (mode)); - emit_insn (gen_sub3 (operands[0], reg, operands[1])); - DONE; -}) - (define_expand "lsx_vrepli" [(match_operand:ILSX 0 "register_operand") (match_operand 1 "const_imm10_operand")] diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md index cb0a19447a1..00ff2823a4e 100644 --- a/gcc/config/loongarch/simd.md +++ b/gcc/config/loongarch/simd.md @@ -85,12 +85,21 @@ (define_mode_attr simdfmt [(V2DF "d") (V4DF "d") (define_mode_attr simdifmt_for_f [(V2DF "l") (V4DF "l") (V4SF "w") (V8SF "w")]) +;; Suffix for integer mode in LSX or LASX instructions to operating FP +;; vectors using integer vector operations. +(define_mode_attr simdfmt_as_i [(V2DF "d") (V4DF "d") + (V4SF "w") (V8SF "w")]) + ;; Size of vector elements in bits. (define_mode_attr elmbits [(V2DI "64") (V4DI "64") (V4SI "32") (V8SI "32") (V8HI "16") (V16HI "16") (V16QI "8") (V32QI "8")]) +;; The index of sign bit in FP vector elements. +(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63") +(V4SF "31") (V8SF "31")]) + ;; This attribute is used to form an immediate operand constraint using ;; "const__operand". (define_mode_attr bitimm [(V16QI "uimm3") (V32QI "uimm3") @@ -457,6 +466,15 @@ (define_expand "reduc__scal_" DONE; }) +;; FP negation. +(define_insn "neg2" + [(set (match_operand:FVEC 0 "register_operand" "=f") + (neg:FVEC (match_operand:FVEC 1 "register_operand" "f")))] + "" + "vbitrevi.\t%0,%1," + [(set_attr "type" "simd_logic") + (set_attr "mode" "")]) + ; The LoongArch SX Instructions. (include "lsx.md") -- 2.43.0
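Why (minus const0 x) is not a valid expansion of (neg x) can be shown on scalars. The sketch below assumes IEEE 754 binary64 doubles and the default round-to-nearest mode; the function names are hypothetical. One function negates by flipping bit 63, which is what the vbitrevi-based pattern does per vector element, and the other computes 0 - x as the removed expander did.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Negate X by flipping the sign bit (bit 63 of a binary64 value),
   the scalar analogue of vbitrevi.d with immediate 63.  */
double
negate_by_sign_bit (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits ^= UINT64_C (1) << 63;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Negate X as 0 - X.  The volatile keeps the subtraction from being
   rewritten into a plain negation by the compiler.  */
double
negate_by_subtraction (double x)
{
  volatile double zero = 0.0;
  return zero - x;
}

/* 1 if the sign bit of X is set, else 0.  */
int
sign_of (double x)
{
  return signbit (x) != 0;
}
```

For x = +0.0 the first function yields -0.0 while the second yields +0.0, since 0.0 - 0.0 is +0.0 under round-to-nearest. For nonzero inputs the two agree, which is presumably why the bug only showed up in tests that look at the sign of zero.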
Pushed: [PATCH] LoongArch: Fix an ODR violation
On Fri, 2024-02-02 at 10:42 +0800, chenglulu wrote: > LGTM! > > Thanks! Pushed r14-8773. > 在 2024/2/2 上午5:54, Xi Ruoyao 写道: > > When bootstrapping GCC 14 --with-build-config=bootstrap-lto, an ODR > > violation is detected: > > > > ../../gcc/config/loongarch/loongarch-opts.cc:57: warning: > > 'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr] > > 57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES]; > > ../../gcc/config/loongarch/loongarch-def.cc:186: note: > > 'abi_minimal_isa' was previously declared here > > 186 | abi_minimal_isa = array, > > ../../gcc/config/loongarch/loongarch-def.cc:186: note: > > code may be misoptimized unless '-fno-strict-aliasing' is used > > > > Fix it by adding a proper declaration of abi_minimal_isa into > > loongarch-def.h and remove the ODR-violating local declaration in > > loongarch-opts.cc. > > > > gcc/ChangeLog: > > > > * config/loongarch/loongarch-def.h (abi_minimal_isa): Declare. > > * config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove > > the ODR-violating locale declaration. > > --- > > > > Bootstrapped on loongarch64-linux-gnu. Not fully regtested but it > > should be an obvious fix. Ok for trunk? 
> > > > gcc/config/loongarch/loongarch-def.h | 3 +++ > > gcc/config/loongarch/loongarch-opts.cc | 2 -- > > 2 files changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/config/loongarch/loongarch-def.h > > b/gcc/config/loongarch/loongarch-def.h > > index a1237ecf1fd..2dbf006d013 100644 > > --- a/gcc/config/loongarch/loongarch-def.h > > +++ b/gcc/config/loongarch/loongarch-def.h > > @@ -203,5 +203,8 @@ extern loongarch_def_array > N_TUNE_TYPES> > > loongarch_cpu_align; > > extern loongarch_def_array > > loongarch_cpu_rtx_cost_data; > > +extern loongarch_def_array< > > + loongarch_def_array, > > + N_ABI_BASE_TYPES> abi_minimal_isa; > > > > #endif /* LOONGARCH_DEF_H */ > > diff --git a/gcc/config/loongarch/loongarch-opts.cc > > b/gcc/config/loongarch/loongarch-opts.cc > > index b87299513c9..7eeac43ed2f 100644 > > --- a/gcc/config/loongarch/loongarch-opts.cc > > +++ b/gcc/config/loongarch/loongarch-opts.cc > > @@ -53,8 +53,6 @@ static const int tm_multilib_list[] = { TM_MULTILIB_LIST > > }; > > static int enabled_abi_types[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES] = { 0 }; > > > > #define isa_required(ABI) (abi_minimal_isa[(ABI).base][(ABI).ext]) > > -extern "C" const struct loongarch_isa > > -abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES]; > > > > static inline int > > is_multilib_enabled (struct loongarch_abi abi) > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Pushed: [PATCH] LoongArch: Fix wrong LSX FP vector negation
On Sun, 2024-02-04 at 11:20 +0800, chenglulu wrote: > > 在 2024/2/3 下午4:58, Xi Ruoyao 写道: > > We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is > > wrong because -0.0 is not 0 - 0.0. This causes some Python tests to > > fail when Python is built with LSX enabled. > > > > Use the vbitrevi.{d/w} instructions to simply reverse the sign bit > > instead. We are already doing this for LASX and now we can unify them > > into simd.md. > > > > gcc/ChangeLog: > > > > * config/loongarch/lsx.md (neg2): Remove the > > incorrect expand. > > * config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr. > > (elmsgnbit): Likewise. > > (neg2): New define_insn. > > * config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they > > are now instantiated in simd.md. > > --- > > > > Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? > > LGTM! > > Thanks! Pushed r14-8785. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Pushed: [PATCH] LoongArch: Avoid out-of-bounds access in loongarch_symbol_insns
On Sun, 2024-02-04 at 11:19 +0800, chenglulu wrote: > > 在 2024/2/2 下午5:55, Xi Ruoyao 写道: > > We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes. > > But in loongarch_symbol_insns: > > > > if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)) > > return 0; > > > > And LSX_SUPPORTED_MODE_P is defined as: > > > > #define LSX_SUPPORTED_MODE_P(MODE) \ > > (ISA_HAS_LSX \ > > && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ... > > > > GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined: > > > > ALWAYS_INLINE poly_uint16 > > mode_to_bytes (machine_mode mode) > > { > > #if GCC_VERSION >= 4001 > > return (__builtin_constant_p (mode) > > ? mode_size_inline (mode) : mode_size[mode]); > > #else > > return mode_size[mode]; > > #endif > > } > > > > There is an assertion in mode_size_inline: > > > > gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES); > > > > Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc), > > thus if __builtin_constant_p (mode) is evaluated true (it happens when > > GCC is bootstrapped with LTO+PGO), the assertion will be triggered and > > cause an ICE. OTOH if __builtin_constant_p (mode) is evaluated false, > > mode_size[mode] is still an out-of-bound array access (the length or the > > mode_size array is NUM_MACHINE_MODES). > > > > So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with > > MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to a > > MIPS bug PR98491 fixed by me about 3 years ago. > > > > gcc/ChangeLog: > > > > * config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not > > use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is > > MAX_MACHINE_MODE. > > --- > > > > Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? > > LGTM! Pushed r14-8785. > I have a question. I see that you often add compilation options in > BOOT_CFLAGS. > > I also want to test it. Do you have a recommended set of compilation > options? 
When I build a compiler for my system I use {BOOT_{C,CXX,LD}FLAGS,{C,CXX,LD}FLAGS_FOR_TARGET}="-O3 -march=la664 -mtune=la664 -pipe -fgraphite-identity -floop-nest-optimize -fipa-pta -fdevirtualize-at-ltrans -fno-semantic-interposition -Wl,-O1 -Wl,--as-needed" and enable PGO (make profiledbootstrap) and LTO (--with-build-config=bootstrap-lto). All of them but GRAPHITE (-fgraphite-identity -floop-nest-optimize) seem "pretty safe" on the architectures I have hardware for. GRAPHITE is causing a bootstrap failure on AArch64 with GCC 13 (PR109929) if combined with PGO, and the real cause has not been found yet. But when I do a test build I normally only enable the flags which may help to catch some issues: for example, when a change only affects LTO I add --with-build-config=bootstrap-lto, and when changing something related to LASX I use -O3 -mlasx (or -O3 -march=la664) as BOOT_CFLAGS. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] MIPS: Fix wrong MSA FP vector negation
We expanded (neg x) to (minus const0 x) for MSA FP vectors, this is wrong because -0.0 is not 0 - 0.0. This causes some Python tests to fail when Python is built with MSA enabled. Use the bnegi.df instructions to simply reverse the sign bit instead. gcc/ChangeLog: * config/mips/mips-msa.md (elmsgnbit): New define_mode_attr. (neg2): Change the mode iterator from MSA to IMSA because in FP arithmetic we cannot use (0 - x) for -x. (neg2): New define_insn to implement FP vector negation, using a bnegi instruction to negate the sign bit. --- Bootstrapped and regtested on mips64el-linux-gnuabi64. Ok for trunk and/or release branches? gcc/config/mips/mips-msa.md | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md index 83d9a08e360..920161ed1d8 100644 --- a/gcc/config/mips/mips-msa.md +++ b/gcc/config/mips/mips-msa.md @@ -231,6 +231,10 @@ (define_mode_attr bitimm (V4SI "uimm5") (V2DI "uimm6")]) +;; The index of sign bit in FP vector elements. +(define_mode_attr elmsgnbit [(V2DF "63") (V4DF "63") +(V4SF "31") (V8SF "31")]) + (define_expand "vec_init" [(match_operand:MSA 0 "register_operand") (match_operand:MSA 1 "")] @@ -597,9 +601,9 @@ (define_expand "abs2" }) (define_expand "neg2" - [(set (match_operand:MSA 0 "register_operand") - (minus:MSA (match_dup 2) - (match_operand:MSA 1 "register_operand")))] + [(set (match_operand:IMSA 0 "register_operand") + (minus:IMSA (match_dup 2) + (match_operand:IMSA 1 "register_operand")))] "ISA_HAS_MSA" { rtx reg = gen_reg_rtx (mode); @@ -607,6 +611,14 @@ (define_expand "neg2" operands[2] = reg; }) +(define_insn "neg2" + [(set (match_operand:FMSA 0 "register_operand" "=f") + (neg (match_operand:FMSA 1 "register_operand" "f")))] + "ISA_HAS_MSA" + "bnegi.\t%w0,%w1," + [(set_attr "type" "simd_bit") + (set_attr "mode" "")]) + (define_expand "msa_ldi" [(match_operand:IMSA 0 "register_operand") (match_operand 1 "const_imm10_operand")] -- 2.43.0
Pushed: [PATCH] MIPS: Fix wrong MSA FP vector negation
On Mon, 2024-02-05 at 09:56 +0800, YunQiang Su wrote: > Xi Ruoyao 于2024年2月5日周一 02:01写道: > > > > We expanded (neg x) to (minus const0 x) for MSA FP vectors, this is > > wrong because -0.0 is not 0 - 0.0. This causes some Python tests to > > fail when Python is built with MSA enabled. > > > > Use the bnegi.df instructions to simply reverse the sign bit instead. > > > > gcc/ChangeLog: > > > > * config/mips/mips-msa.md (elmsgnbit): New define_mode_attr. > > (neg2): Change the mode iterator from MSA to IMSA because > > in FP arithmetic we cannot use (0 - x) for -x. > > (neg2): New define_insn to implement FP vector negation, > > using a bnegi instruction to negate the sign bit. > > --- > > > > Bootstrapped and regtested on mips64el-linux-gnuabi64. Ok for trunk > > and/or release branches? > > > > gcc/config/mips/mips-msa.md | 18 +++--- > > 1 file changed, 15 insertions(+), 3 deletions(-) > > > > LGTM, while I guess that we also need a test case. Pushed to trunk and release branches, with a following obvious fix: diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md index 920161ed1d8..779157f2a0c 100644 --- a/gcc/config/mips/mips-msa.md +++ b/gcc/config/mips/mips-msa.md @@ -613,7 +613,7 @@ (define_expand "neg2" (define_insn "neg2" [(set (match_operand:FMSA 0 "register_operand" "=f") - (neg (match_operand:FMSA 1 "register_operand" "f")))] + (neg:FMSA (match_operand:FMSA 1 "register_operand" "f")))] "ISA_HAS_MSA" "bnegi.\t%w0,%w1," [(set_attr "type" "simd_bit") I'll write a test case for gcc.dg/vect later (now I have to do $SOME_REAL_LIFE_THING...) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] testsuite: Add a test case for negating FP vectors containing zeros
Recently I've fixed two wrong FP vector negation implementations which caused wrong sign bits in zeros on some targets (r14-8786 and r14-8801). To prevent a similar issue from happening again, add a test case. Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS (with MSA), LoongArch (with LSX and LASX). gcc/testsuite: * gcc.dg/vect/vect-neg-zero.c: New test. --- Ok for trunk? gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 39 +++ 1 file changed, 39 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c new file mode 100644 index 000..adb032f5c6a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c @@ -0,0 +1,39 @@ +/* { dg-do run } */ +/* { dg-add-options ieee } */ +/* { dg-additional-options "-fsigned-zeros" } */ + +double x[4] = {-0.0, 0.0, -0.0, 0.0}; +float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0}; + +static __attribute__ ((always_inline)) inline void +test (int factor) +{ + double a[4]; + float b[8]; + + asm ("" ::: "memory"); + + for (int i = 0; i < 2 * factor; i++) +a[i] = -x[i]; + + for (int i = 0; i < 4 * factor; i++) +b[i] = -y[i]; + +#pragma GCC novector + for (int i = 0; i < 2 * factor; i++) +if (__builtin_signbit (a[i]) == __builtin_signbit (x[i])) + __builtin_abort (); + +#pragma GCC novector + for (int i = 0; i < 4 * factor; i++) +if (__builtin_signbit (b[i]) == __builtin_signbit (y[i])) + __builtin_abort (); +} + +int +main (void) +{ + test (1); + test (2); + return 0; +} -- 2.43.0
LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?
Hi Lulu, I'm proposing to backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13. The reasons: 1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause a correctness issue. For example, a developer may use -falign-functions=16 and then use the low 4 bits of a function pointer to encode some metainfo. Then ASM_OUTPUT_ALIGN_WITH_NOP causes the functions to be not really aligned to a 16-byte boundary, causing some breakage. 2. With Binutils-2.42, ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal opcodes. For example: .globl _start _start: .balign 32 nop nop nop addi.d $a0, $r0, 1 .balign 16,54525952,4 addi.d $a0, $a0, 1 is assembled and linked to: 0220 <_start>: 220: 0340nop 224: 0340nop 228: 0340nop 22c: 02c00404li.d$a0, 1 230: .word 0x # <== OOPS! 234: 02c00484addi.d $a0, $a0, 1 Arguably this is a bug in GAS (it should at least error out for the unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd prefer it to support the 3-operand .align directive even with -mrelax for reasons I've given in [1]). But we can at least work around it by removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils 2.42. 3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like ".align 5" which works as expected since Binutils-2.38. 4. GCC < 14 does not have a default setting of -falign-*, so changing this won't affect anyone who does not specify -falign-* explicitly. [1]: https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603 Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13 then? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
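For readers wondering where the fill value 54525952 in ".balign 16,54525952,4" comes from: it is the decimal form of the 32-bit encoding of "andi $r0,$r0,0", the canonical nop. The sketch below is an illustration, not code from GCC or Binutils. It assumes the 2RI12 instruction layout (10-bit opcode, 12-bit immediate, two 5-bit register fields), and the opcode constants are assumptions inferred from the encodings visible in the message above (02c00404, 02c00484) and from the decimal fill value, not copied from the ISA manual. encode_2ri12 is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode field values, assumed from the encodings quoted in the
   thread rather than taken from the ISA manual.  */
enum { OP_ADDI_D = 0x00b, OP_ANDI = 0x00d };

/* Hypothetical helper: pack a 2RI12-format instruction as
   opcode[31:22] | imm12[21:10] | rj[9:5] | rd[4:0].  */
uint32_t
encode_2ri12 (uint32_t opcode10, uint32_t imm12, uint32_t rj, uint32_t rd)
{
  assert (opcode10 < (1u << 10) && imm12 < (1u << 12));
  assert (rj < 32 && rd < 32);
  return (opcode10 << 22) | (imm12 << 10) | (rj << 5) | rd;
}
```

Under these assumptions, encode_2ri12 (OP_ANDI, 0, 0, 0) gives 0x03400000, which is 54525952 in decimal, and encode_2ri12 (OP_ADDI_D, 1, 4, 4) gives 0x02c00484, matching the last instruction in the listing.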
Re: [PATCH] testsuite: Add a test case for negating FP vectors containing zeros
On Tue, 2024-02-06 at 17:55 +0800, Xi Ruoyao wrote: > Recently I've fixed two wrong FP vector negate implementation which > caused wrong sign bits in zeros in targets (r14-8786 and r14-8801). To > prevent a similar issue from happening again, add a test case. > > Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS > (with MSA), LoongArch (with LSX and LASX). > > gcc/testsuite: > > * gcc.dg/vect/vect-neg-zero.c: New test. > --- > > Ok for trunk? > > gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 39 +++ > 1 file changed, 39 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c > b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c > new file mode 100644 > index 000..adb032f5c6a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c > @@ -0,0 +1,39 @@ > +/* { dg-do run } */ This patch fails on the Linaro CI for ARM. I guess I need to remove this { dg-do run } line and let the test framework decide whether to run or only compile the test. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?
On Fri, 2024-02-09 at 00:02 +0800, chenglulu wrote: > > 在 2024/2/7 上午12:23, Xi Ruoyao 写道: > > Hi Lulu, > > > > I'm proposing to backport r14-4674 "LoongArch: Delete macro definition > > ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13. The > > reasons: > > > > 1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause > > a correctness issue. For example, a developer may use -falign- > > functions=16 and then use the low 4 bits of a function pointer to encode > > some metainfo. Then ASM_OUTPUT_ALIGN_WITH_NOP causes the functions not > > really aligned to a 16 bytes boundary, causing some breakage. > > > > 2. With Binutils-2.42, ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal > > opcodes. For example: > > > > .globl _start > > _start: > > .balign 32 > > nop > > nop > > nop > > addi.d $a0, $r0, 1 > > .balign 16,54525952,4 > > addi.d $a0, $a0, 1 > > > > is assembled and linked to: > > > > 0220 <_start>: > > 220: 0340 nop > > 224: 0340 nop > > 228: 0340 nop > > 22c: 02c00404 li.d$a0, 1 > > 230: .word 0x # <== OOPS! > > 234: 02c00484 addi.d $a0, $a0, 1 > > > > Arguably this is a bug in GAS (it should at least error out for the > > unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd > > prefer it to support the 3-operand .align directive even -mrelax for > > reasons I've given in [1]). But we can at least work it around by > > removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils > > 2.42. > > > > 3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like > > ".align 5" which works as expected since Binutils-2.38. > > > > 4. GCC < 14 does not have a default setting of -falign-*, so changing > > this won't affect anyone who do not specify -falign-* explicitly. > > > > [1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603 > > > > Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13 > > then? > > > Ok, I agree with you. > > Thanks! 
Oops, with Binutils-2.41 GAS will fail to assemble some conditional branches if we do this :(. Not sure what to do (maybe backporting both this and a simplified version of PR112330 fix?) Let's reconsider after the holiday... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?
On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote: > So I think that without worrying about performance and ensuring that > there is no problem > > with binutils, I think we can make the following modifications: > > -/* "nop" instruction 54525952 (andi $r0,$r0,0) is > - used for padding. */ > +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by > + default. */ > #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \ > - fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG)) > + fprintf (STREAM, "\t.align\t%d,,4\n", (LOG)) > > What do you think of it? Unfortunately it will cause warnings with GAS 2.41 or earlier like t1.s:1: Warning: expected fill pattern missing t1.s:5: Warning: expected fill pattern missing And AFAIK these things may cause many test failures due to "excessive errors" if running the GCC test suite with these earlier GAS versions. Maybe we'll have to add some autoconf-based probing for the linker anyway? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?
On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> 
> > So I think that without worrying about performance and ensuring that
> > there is no problem
> > with binutils, I think we can make the following modifications:
> > 
> > -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> > -   used for padding.  */
> > +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> > +   default.  */
> >  #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> > -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> > +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > 
> > What do you think of it?
> 
> Unfortunately it will cause warnings with GAS 2.41 or earlier like
> 
> t1.s:1: Warning: expected fill pattern missing
> t1.s:5: Warning: expected fill pattern missing
> 
> And AFAIK these things may cause many test failures due to "excessive
> errors" if running the GCC test suite with these earlier GAS versions.
> Maybe we'll have to add some autoconf-based probing for the linker
> anyway?

Or just silence the warning by passing "--no-warn" to the assembler, but
I'm highly unsure if this is really a good idea :(.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?
On Tue, 2024-02-20 at 19:50 +0800, chenglulu wrote:
> 
> On 2024/2/20 at 7:31 PM, Xi Ruoyao wrote:
> > On Tue, 2024-02-20 at 19:25 +0800, Xi Ruoyao wrote:
> > > On Tue, 2024-02-20 at 10:07 +0800, chenglulu wrote:
> > > 
> > > > So I think that without worrying about performance and ensuring that
> > > > there is no problem
> > > > with binutils, I think we can make the following modifications:
> > > > 
> > > > -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
> > > > -   used for padding.  */
> > > > +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
> > > > +   default.  */
> > > >  #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
> > > > -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
> > > > +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))
> > > > 
> > > > What do you think of it?
> > > Unfortunately it will cause warnings with GAS 2.41 or earlier like
> > > 
> > > t1.s:1: Warning: expected fill pattern missing
> > > t1.s:5: Warning: expected fill pattern missing
> > > 
> > > And AFAIK these things may cause many test failures due to "excessive
> > > errors" if running the GCC test suite with these earlier GAS versions.
> > > Maybe we'll have to add some autoconf-based probing for the linker
> > > anyway?
> > Or just silence the warning passing "--no-warn" to the assembler but I'm
> > highly unsure if this is really a good idea :(.
> > 
> I am not opposed to adding detection code, but I looked at this problem
> today and I think this change is the smallest change.  I asked Meng
> Qinggang and he said that the warning of GAS 2.41 can be removed.

Yes, but we cannot change the released binutils-2.41 tarball, and the
Binutils folks don't make point releases like GCC does.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure
The gold linker has never been ported to LoongArch (and it seems unlikely to be ported in the future as the new architectures are focusing on lld and/or mold for fast linkers). ChangeLog: * configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target list. * configure: Regenerate. --- Ok for GCC trunk (to get synced into Binutils later)? configure| 2 +- configure.ac | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 874966fb9f0..02b435c1163 100755 --- a/configure +++ b/configure @@ -3092,7 +3092,7 @@ case "${ENABLE_GOLD}" in # Check for target supported by gold. case "${target}" in i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \ -| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*) +| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*) configdirs="$configdirs gold" if test x${ENABLE_GOLD} = xdefault; then default_ld=gold diff --git a/configure.ac b/configure.ac index 4f34004a072..1a19c07a27b 100644 --- a/configure.ac +++ b/configure.ac @@ -364,7 +364,7 @@ case "${ENABLE_GOLD}" in # Check for target supported by gold. case "${target}" in i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \ -| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*) +| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*) configdirs="$configdirs gold" if test x${ENABLE_GOLD} = xdefault; then default_ld=gold -- 2.43.2
[GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax
To improve Binutils compatibility we've had to backport relaxation support. But if a user just updates to GCC 13.3 and sticks with Binutils 2.41, there is no reason to use -mno-explicit-relocs as the default because we are turning off relaxation for Binutils 2.41 (it lacks conditional branch relaxation support) anyway. So like GCC 14, make the default of -m[no-]explicit-relocs depend on -m[no-]relax instead of HAVE_AS_MRELAX_OPTION. Also update the doc to reflect the behavior change. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in (TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set the default of TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS && !loongarch_mrelax. * doc/invoke.texi (-m[no-]explicit-relocs): Update for LoongArch. --- Ok for releases/gcc-13? gcc/config/loongarch/genopts/loongarch.opt.in | 2 +- gcc/config/loongarch/loongarch.cc | 4 gcc/config/loongarch/loongarch.opt| 2 +- gcc/doc/invoke.texi | 11 +-- 4 files changed, 11 insertions(+), 8 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index da6fedd153e..76acd35d39c 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -155,7 +155,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default is 1024. mexplicit-relocs -Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION) +Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN) Use %reloc() assembly operators. ; The code model option names for -mcmodel.
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 768e2427285..e78b81cd8fc 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -6222,6 +6222,10 @@ loongarch_option_override_internal (struct gcc_options *opts) gcc_unreachable (); } + if (TARGET_EXPLICIT_RELOCS == M_OPTION_NOT_SEEN) +TARGET_EXPLICIT_RELOCS = (HAVE_AS_EXPLICIT_RELOCS + && !loongarch_mrelax); + /* Validate the guard size. */ int guard_size = param_stack_clash_protection_guard_size; diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 59b1e06d3f2..e61fbaed2c1 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -162,7 +162,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default is 1024. mexplicit-relocs -Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS & !HAVE_AS_MRELAX_OPTION) +Target Var(TARGET_EXPLICIT_RELOCS) Init(M_OPTION_NOT_SEEN) Use %reloc() assembly operators. ; The code model option names for -mcmodel. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 99657fb44d8..792ce283bb9 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -25830,12 +25830,11 @@ The default code model is @code{normal}. @itemx -mno-explicit-relocs Use or do not use assembler relocation operators when dealing with symbolic addresses. The alternative is to use assembler macros instead, which may -limit optimization. The default value for the option is determined during -GCC build-time by detecting corresponding assembler support: -@code{-mexplicit-relocs} if said support is present, -@code{-mno-explicit-relocs} otherwise. This option is mostly useful for -debugging, or interoperation with assemblers different from the build-time -one. +limit instruction scheduling but allow linker relaxation. 
The default +value for the option is determined with the assembler capability detected +during GCC build-time and the setting of @code{-mrelax}: +@code{-mexplicit-relocs} if the assembler supports relocation operators +but @code{-mrelax} is not enabled, @code{-mno-explicit-relocs} otherwise. @opindex mdirect-extern-access @item -mdirect-extern-access -- 2.43.2
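The default selection described in this patch can be sketched in isolation. This is a simplified model, not the GCC code: OPTION_NOT_SEEN is a hypothetical stand-in for M_OPTION_NOT_SEEN, and the two boolean parameters stand in for HAVE_AS_EXPLICIT_RELOCS and the -mrelax setting.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "option not given on the command line"
   sentinel (M_OPTION_NOT_SEEN in the patch).  */
#define OPTION_NOT_SEEN (-1)

/* Resolve the effective -m[no-]explicit-relocs setting.
   USER_SETTING is 1, 0, or OPTION_NOT_SEEN.
   AS_EXPLICIT_RELOCS models the assembler capability detected when GCC
   was built; RELAX models whether -mrelax is in effect.  */
int
resolve_explicit_relocs (int user_setting, bool as_explicit_relocs,
			 bool relax)
{
  /* An explicit choice by the user always wins.  */
  if (user_setting != OPTION_NOT_SEEN)
    return user_setting;

  /* Otherwise use explicit relocs only if the assembler supports them
     and linker relaxation is not requested.  */
  return as_explicit_relocs && !relax;
}
```

The point of the sentinel is that the default can only be computed after all options have been parsed, because it depends on another option (-mrelax) rather than on a build-time constant alone.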
Re: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure
On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:
> 
> On 2024/2/22 at 5:17 PM, Xi Ruoyao wrote:
> > The gold linker has never been ported to LoongArch (and it seems
> > unlikely to be ported in the future as the new architectures are
> > focusing on lld and/or mold for fast linkers).
> > 
> > ChangeLog:
> > 
> > 	* configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
> > 	list.
> > 	* configure: Regenerate.
> > ---
> > 
> > Ok for GCC trunk (to get synced into Binutils later)?
> 
> I have no problem.  But I have a question.  Is this modification simply
> because we don’t support it or is there an error somewhere?

If a user specifies --enable-gold when building Binutils, then with
loongarch in this list the build system will attempt to build gold and
fail.  With loongarch removed from the list, the build system will
ignore --enable-gold.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Pushed: [PATCH] LoongArch: Don't falsely claim gold supported in toplevel configure
On Fri, 2024-02-23 at 11:37 +0800, chenglulu wrote:
> 
> On 2024/2/23 at 11:27 AM, Xi Ruoyao wrote:
> > On Fri, 2024-02-23 at 11:16 +0800, chenglulu wrote:
> > > On 2024/2/22 at 5:17 PM, Xi Ruoyao wrote:
> > > > The gold linker has never been ported to LoongArch (and it seems
> > > > unlikely to be ported in the future as the new architectures are
> > > > focusing on lld and/or mold for fast linkers).
> > > > 
> > > > ChangeLog:
> > > > 
> > > > 	* configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
> > > > 	list.
> > > > 	* configure: Regenerate.
> > > > ---
> > > > 
> > > > Ok for GCC trunk (to get synced into Binutils later)?
> > > I have no problem.  But I have a question.  Is this modification simply
> > > because we don’t support it or is there an error somewhere?
> > If a user specifies --enable-gold when building Binutils, then with
> > loongarch in this list the build system will attempt to build gold and
> > fail.  With loongarch removed from the list, the build system will
> > ignore --enable-gold.
> > 
> Okay, I understand.

Pushed r14-9149 and the Binutils maintainer will pick it up before the
next Binutils release (AFAIK).

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Pushed: [GCC 13 PATCH] LoongArch: Don't default to -mno-explicit-relocs if -mno-relax
On Thu, 2024-02-22 at 19:09 +0800, chenglulu wrote:
> 
> On 2024/2/22 at 6:20 PM, Xi Ruoyao wrote:
> > To improve Binutils compatibility we've had to backport relaxation
> > support.  But if a user just updates to GCC 13.3 and sticks with
> > Binutils 2.41, there is no reason to use -mno-explicit-relocs as the
> > default because we are turning off relaxation for Binutils 2.41 (it
> > lacks conditional branch relaxation support) anyway.
> > 
> > So like GCC 14, make the default of -m[no-]explicit-relocs depend on
> > -m[no-]relax instead of HAVE_AS_MRELAX_OPTION.  Also update the doc to
> > reflect the behavior change.
> > 
> > gcc/ChangeLog:
> > 
> > 	* config/loongarch/genopts/loongarch.opt.in
> > 	(TARGET_EXPLICIT_RELOCS): Init to M_OPTION_NOT_SEEN.
> > 	* config/loongarch/loongarch.opt: Regenerate.
> > 	* config/loongarch/loongarch.cc
> > 	(loongarch_option_override_internal): Set the default of
> > 	TARGET_EXPLICIT_RELOCS to HAVE_AS_EXPLICIT_RELOCS
> > 	&& !loongarch_mrelax.
> > 	* doc/invoke.texi (-m[no-]explicit-relocs): Update for
> > 	LoongArch.
> > ---
> > 
> > Ok for releases/gcc-13?
> 
> LGTM!

Pushed r13-8357.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[PATCH 1/2] LoongArch: NFC: Deduplicate crc instruction defines
Introduce an iterator for UNSPEC_CRC and UNSPEC_CRCC to make the next change easier. gcc/ChangeLog: * config/loongarch/loongarch.md (CRC): New define_int_iterator. (crc): New define_int_attr. (loongarch_crc_w_<size>_w, loongarch_crcc_w_<size>_w): Unify into ... (loongarch_<crc>_w_<size>_w): ... here. --- gcc/config/loongarch/loongarch.md | 18 +- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 2ce7a151880..4ded1b3a117 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -4251,24 +4251,16 @@ (define_peephole2 (define_mode_iterator QHSD [QI HI SI DI]) +(define_int_iterator CRC [UNSPEC_CRC UNSPEC_CRCC]) +(define_int_attr crc [(UNSPEC_CRC "crc") (UNSPEC_CRCC "crcc")]) -(define_insn "loongarch_crc_w_<size>_w" +(define_insn "loongarch_<crc>_w_<size>_w" [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI [(match_operand:QHSD 1 "register_operand" "r") (match_operand:SI 2 "register_operand" "r")] -UNSPEC_CRC))] +CRC))] "" - "crc.w.<size>.w\t%0,%1,%2" - [(set_attr "type" "unknown") - (set_attr "mode" "<MODE>")]) - -(define_insn "loongarch_crcc_w_<size>_w" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI [(match_operand:QHSD 1 "register_operand" "r") - (match_operand:SI 2 "register_operand" "r")] -UNSPEC_CRCC))] - "" - "crcc.w.<size>.w\t%0,%1,%2" + "<crc>.w.<size>.w\t%0,%1,%2" [(set_attr "type" "unknown") (set_attr "mode" "<MODE>")]) -- 2.44.0
[PATCH 2/2] LoongArch: Remove unneeded sign extension after crc/crcc instructions
The specification of crc/crcc instructions is clear that the output is sign-extended to GRLEN. Add a define_insn to tell the compiler this fact and allow it to remove the unneeded sign extension on crc/crcc output. As crc/crcc instructions are usually used in a tight loop, this should produce a significant performance gain. gcc/ChangeLog: * config/loongarch/loongarch.md (loongarch_<crc>_w_<size>_w_extended): New define_insn. gcc/testsuite/ChangeLog: * gcc.target/loongarch/crc-sext.c: New test. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch.md | 11 +++ gcc/testsuite/gcc.target/loongarch/crc-sext.c | 13 + 2 files changed, 24 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/crc-sext.c diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 4ded1b3a117..525e1e82183 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -4264,6 +4264,17 @@ (define_insn "loongarch_<crc>_w_<size>_w" [(set_attr "type" "unknown") (set_attr "mode" "<MODE>")]) +(define_insn "loongarch_<crc>_w_<size>_w_extended" + [(set (match_operand:DI 0 "register_operand" "=r") + (sign_extend:DI + (unspec:SI [(match_operand:QHSD 1 "register_operand" "r") + (match_operand:SI 2 "register_operand" "r")] +CRC)))] + "TARGET_64BIT" + "<crc>.w.<size>.w\t%0,%1,%2" + [(set_attr "type" "unknown") + (set_attr "mode" "<MODE>")]) + ;; With normal or medium code models, if the only use of a pc-relative ;; address is for loading or storing a value, then relying on linker ;; relaxation is not better than emitting the machine instruction directly.
diff --git a/gcc/testsuite/gcc.target/loongarch/crc-sext.c b/gcc/testsuite/gcc.target/loongarch/crc-sext.c new file mode 100644 index 000..9ade5a8e4ca --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/crc-sext.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=loongarch64" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +/* +**my_crc: +** crc.w.d.w \$r4,\$r4,\$r5 +** jr \$r1 +*/ +int my_crc(long long dword, int crc) +{ + return __builtin_loongarch_crc_w_d_w(dword, crc); +} -- 2.44.0
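As a side note on what "sign-extended to GRLEN" means on a 64-bit target, here is a small portable sketch. It only models the register contents after the instruction, not the CRC computation itself, and the helper name sign_extend_32 is hypothetical.

```c
#include <stdint.h>

/* Model of a 64-bit register holding a 32-bit result that has been
   sign-extended: bit 31 of BITS is copied into bits 32..63.
   Written without implementation-defined casts.  */
int64_t
sign_extend_32 (uint32_t bits)
{
  return (int64_t) (bits ^ UINT32_C (0x80000000)) - INT64_C (0x80000000);
}
```

Since the hardware already leaves the register in this form, a separate extension instruction after crc/crcc computes nothing new, which is what the added define_insn lets the compiler see.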
Re: [PATCH v2] LoongArch: Add support for TLS descriptors
On Thu, 2024-02-29 at 09:42 +0800, mengqinggang wrote:
> Generate la.tls.desc macro instruction for TLS descriptors model.
> 
> la.tls.desc expand to
>   pcalau12i $a0, %desc_pc_hi20(a)
>   ld.d $a1, $a0, %desc_ld_pc_lo12(a)
>   addi.d $a0, $a0, %desc_add_pc_lo12(a)
>   jirl $ra, $a1, %desc_call(a)
> 
> The default is TLS descriptors, but can be configure with
> -mtls-dialect={desc,trad}.

Please keep trad as the default for now.  Glibc-2.40 will be released
after GCC 14.1, but we don't want to end up in a situation where the
default configuration of the latest GCC release creates something that
does not work with the latest Glibc release.  And there's also musl libc
which we need to take into account.

Or you can write an autoconf test for whether the assembler supports
tlsdesc, and check TARGET_GLIBC_MAJOR and TARGET_GLIBC_MINOR for the
Glibc version, to decide whether to enable desc by default.  If you want
this but don't have time to implement it, you can leave trad as the
default and I'll take care of this.

/* snip */

> +(define_insn "@got_load_tls_desc<mode>"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> +	(unspec:P
> +	    [(match_operand:P 1 "symbolic_operand" "")]
> +	    UNSPEC_TLS_DESC))
> +    (clobber (reg:SI FCC0_REGNUM))
> +    (clobber (reg:SI FCC1_REGNUM))
> +    (clobber (reg:SI FCC2_REGNUM))
> +    (clobber (reg:SI FCC3_REGNUM))
> +    (clobber (reg:SI FCC4_REGNUM))
> +    (clobber (reg:SI FCC5_REGNUM))
> +    (clobber (reg:SI FCC6_REGNUM))
> +    (clobber (reg:SI FCC7_REGNUM))
> +    (clobber (reg:SI A1_REGNUM))
> +    (clobber (reg:SI RETURN_ADDR_REGNUM))]

Ok, the clobber list is correct.

> +  "TARGET_TLS_DESC"
> +  "la.tls.desc\t%0,%1"

With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
of la.tls.desc.  As we don't want to add too much code, we can just hard
code the 4 instructions here instead of splitting this insn, with
something like

    { return TARGET_EXPLICIT_RELOCS_ALWAYS ? "......" : "la.tls.desc\t%0,%1"; }

> +  [(set_attr "got" "load")
> +   (set_attr "mode" "<MODE>")])

We need (set_attr "length" "16") in this list as this actually expands
into 16 bytes.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2] LoongArch: Add support for TLS descriptors
On Thu, 2024-02-29 at 14:08 +0800, Xi Ruoyao wrote:
> > +  "TARGET_TLS_DESC"
> > +  "la.tls.desc\t%0,%1"
> 
> With -mexplicit-relocs=always we should emit %desc_pc_lo12 etc. instead
> of la.tls.desc.  As we don't want to add too much code, we can just hard
> code the 4 instructions here instead of splitting this insn, with
> something like
> 
>     { return TARGET_EXPLICIT_RELOCS_ALWAYS ? ".." : "la.tls.desc\t%0,%1"; }

And with -mcmodel=extreme we should use the 3-operand la.tls.desc.  Or,
if we don't want to support this, we can just error out when
-mcmodel=extreme and -mtls-dialect=desc are used together.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[PATCH v2] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]
The vect_int_mod target selector is evaluated with the options in DEFAULT_VECTCFLAGS in effect, but these options are not automatically passed to tests outside of the vect directories. So this test fails on targets where the integer vector modulo operation is supported but requires an option to be enabled, for example LoongArch. In this test case, the only expected optimization that has not happened in the original dump is in corge, because it needs forward propagation. So we can scan the forwprop2 dump (where the vector operation is not expanded to scalars yet) instead of optimized, and then we no longer need to consider whether vect_int_mod holds. gcc/testsuite/ChangeLog: PR testsuite/113418 * gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2 instead of -fdump-tree-optimized. (dg-final): Scan forwprop2 dump instead of optimized, and remove the use of vect_int_mod. * lib/target-supports.exp (check_effective_target_vect_int_mod): Remove because it's not used anymore. --- v1->v2: Remove check_effective_target_vect_int_mod as it's now unused. This fixes the test failure on loongarch64-linux-gnu. Also tested on x86_64-linux-gnu. Ok for trunk? gcc/testsuite/gcc.dg/pr104992.c | 5 ++--- gcc/testsuite/lib/target-supports.exp | 13 - 2 files changed, 2 insertions(+), 16 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c index 82f8c75559c..6fd513d34b2 100644 --- a/gcc/testsuite/gcc.dg/pr104992.c +++ b/gcc/testsuite/gcc.dg/pr104992.c @@ -1,6 +1,6 @@ /* PR tree-optimization/104992 */ /* { dg-do compile } */ -/* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */ +/* { dg-options "-O2 -Wno-psabi -fdump-tree-forwprop2" } */ #define vector __attribute__((vector_size(4*sizeof(int)))) @@ -54,5 +54,4 @@ __attribute__((noipa)) unsigned waldo (unsigned x, unsigned y, unsigned z) { return x / y * z == x; } -/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { !
vect_int_mod } } } } */ -/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target vect_int_mod } } } */ +/* { dg-final { scan-tree-dump-times " % " 6 "forwprop2" } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4138cc9a662..ae33c4f1e3a 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -9064,19 +9064,6 @@ proc check_effective_target_vect_long_mult { } { return $answer } -# Return 1 if the target supports vector int modulus, 0 otherwise. - -proc check_effective_target_vect_int_mod { } { -return [check_cached_effective_target_indexed vect_int_mod { - expr { ([istarget powerpc*-*-*] - && [check_effective_target_has_arch_pwr10]) - || [istarget amdgcn-*-*] - || ([istarget loongarch*-*-*] -&& [check_effective_target_loongarch_sx]) - || ([istarget riscv*-*-*] -&& [check_effective_target_riscv_v]) }}] -} - # Return 1 if the target supports vector even/odd elements extraction, 0 otherwise. proc check_effective_target_vect_extract_even_odd { } { -- 2.44.0
[PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros
Recently I've fixed two wrong FP vector negate implementation which caused wrong sign bits in zeros in targets (r14-8786 and r14-8801). To prevent a similar issue from happening again, add a test case. Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS (with MSA), LoongArch (with LSX and LASX). gcc/testsuite: * gcc.dg/vect/vect-neg-zero.c: New test. --- v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on Linaro ARM CI. Ok for trunk? gcc/testsuite/gcc.dg/vect/vect-neg-zero.c | 38 +++ 1 file changed, 38 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-neg-zero.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c new file mode 100644 index 000..6af4a02c517 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-zero.c @@ -0,0 +1,38 @@ +/* { dg-add-options ieee } */ +/* { dg-additional-options "-fsigned-zeros" } */ + +double x[4] = {-0.0, 0.0, -0.0, 0.0}; +float y[8] = {-0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0, 0.0}; + +static __attribute__ ((always_inline)) inline void +test (int factor) +{ + double a[4]; + float b[8]; + + asm ("" ::: "memory"); + + for (int i = 0; i < 2 * factor; i++) +a[i] = -x[i]; + + for (int i = 0; i < 4 * factor; i++) +b[i] = -y[i]; + +#pragma GCC novector + for (int i = 0; i < 2 * factor; i++) +if (__builtin_signbit (a[i]) == __builtin_signbit (x[i])) + __builtin_abort (); + +#pragma GCC novector + for (int i = 0; i < 4 * factor; i++) +if (__builtin_signbit (b[i]) == __builtin_signbit (y[i])) + __builtin_abort (); +} + +int +main (void) +{ + test (1); + test (2); + return 0; +} -- 2.44.0
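The property this test guards can be shown with scalars. The sketch below assumes IEEE 754 semantics with signed zeros honoured (so no -ffast-math); the helper names are made up for illustration. It shows why an ordinary equality check cannot catch the bug and why the test compares sign bits instead.

```c
#include <stdbool.h>

/* True if negating V flips its sign bit, which must hold even when V
   is +0.0 or -0.0.  */
bool
negation_flips_sign (double v)
{
  double negated = -v;
  return (__builtin_signbit (negated) != 0) != (__builtin_signbit (v) != 0);
}

/* +0.0 and -0.0 compare equal, so "==" cannot tell a correct negation
   of zero from an incorrect one.  */
bool
zeros_compare_equal (double pos_zero, double neg_zero)
{
  return pos_zero == neg_zero;
}
```

A broken vector negate that, for example, computes 0 - x would still pass an equality-based check for zero inputs, since 0.0 - 0.0 is +0.0 and +0.0 == -0.0.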
[PATCH] LoongArch: Emit R_LARCH_RELAX for TLS IE with non-extreme code model to allow the IE to LE linker relaxation
Binutils must only allow the IE to LE relaxation when there is an R_LARCH_RELAX after R_LARCH_TLS_IE_PC_{HI20,LO12}, so that an invalid "partial" relaxation won't happen with the extreme code model. So if we are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an R_LARCH_RELAX to allow the relaxation. The IE to LE relaxation does not require the pcalau12i and the ld instruction to be adjacent, so we don't need to limit ourselves to use the macro. For the distro maintainers backporting changes: this change depends on r14-8721; without r14-8721, R_LARCH_RELAX can be emitted mistakenly in the extreme code model. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_print_operand_reloc): Support 'Q' for R_LARCH_RELAX for TLS IE. (loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS IE. * config/loongarch/loongarch.md (ld_from_got): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/tls-ie-relax.c: New test. * gcc.target/loongarch/tls-ie-norelax.c: New test. * gcc.target/loongarch/tls-ie-extreme.c: New test. --- Bootstrapped & regtested on loongarch64-linux-gnu. Ok for trunk?
gcc/config/loongarch/loongarch.cc | 15 ++- gcc/config/loongarch/loongarch.md | 2 +- .../gcc.target/loongarch/tls-ie-extreme.c | 5 + .../gcc.target/loongarch/tls-ie-norelax.c | 5 + gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c | 11 +++ 5 files changed, 36 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c create mode 100644 gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 0428b6e65d5..70e31bb831c 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -4981,7 +4981,7 @@ loongarch_output_move (rtx dest, rtx src) if (type == SYMBOL_TLS_LE) return "lu12i.w\t%0,%h1"; else - return "pcalau12i\t%0,%h1"; + return "%Q1pcalau12i\t%0,%h1"; } if (src_code == CONST_INT) @@ -6145,6 +6145,7 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool hi64_part, 'L' Print the low-part relocation associated with OP. 'm' Print one less than CONST_INT OP in decimal. 'N' Print the inverse of the integer branch condition for comparison OP. + 'Q' Print R_LARCH_RELAX for TLS IE. 'r' Print address 12-31bit relocation associated with OP. 'R' Print address 32-51bit relocation associated with OP. 
'T' Print 'f' for (eq:CC ...), 't' for (ne:CC ...), @@ -6282,6 +6283,18 @@ loongarch_print_operand (FILE *file, rtx op, int letter) letter); break; +case 'Q': + if (!TARGET_LINKER_RELAXATION) + break; + + if (code == HIGH) + op = XEXP (op, 0); + + if (loongarch_classify_symbolic_expression (op) == SYMBOL_TLS_IE) + fprintf (file, ".reloc\t.,R_LARCH_RELAX\n\t"); + + break; + case 'r': loongarch_print_operand_reloc (file, op, false /* hi64_part */, true /* lo_reloc */); diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index f3b5c641fce..525e1e82183 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2620,7 +2620,7 @@ (define_insn "@ld_from_got" (match_operand:P 2 "symbolic_operand")))] UNSPEC_LOAD_FROM_GOT))] "" - "ld.\t%0,%1,%L2" + "%Q2ld.\t%0,%1,%L2" [(set_attr "type" "move")] ) diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c new file mode 100644 index 000..00c545a3e8c --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-extreme.c @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mcmodel=extreme -mexplicit-relocs=auto -mrelax" } */ +/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */ + +#include "tls-ie-relax.c" diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c new file mode 100644 index 000..dd6bf3634a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/tls-ie-norelax.c @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcmodel=normal -mexplicit-relocs -mno-relax" } */ +/* { dg-final { scan-assembler-not "R_LARCH_RELAX" { target tls_native } } } */ + +#include "tls-ie-relax.c" diff --git a/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c b/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c new file mode 100644 index 000..e9f7569b1da --- /dev/null +++ 
b/gcc/testsuite/gcc.target/loongarch/tls-ie-relax.c @@ -0,0
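The effect of the new 'Q' operand letter can be sketched outside of GCC. This is a simplified model under stated assumptions: the enum, the function name and the string-based interface are hypothetical, and only the two conditions visible in the patch (linker relaxation enabled, symbol classified as TLS IE) are modelled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced symbol classification.  */
enum symbol_kind { SYM_TLS_IE, SYM_TLS_LE, SYM_OTHER };

/* Write INSN into BUF, preceded by a ".reloc ., R_LARCH_RELAX" marker
   when relaxation is enabled and the symbol is TLS IE.  Returns the
   snprintf result.  */
int
format_tls_ie_insn (char *buf, size_t size, bool linker_relax,
		    enum symbol_kind kind, const char *insn)
{
  const char *prefix = "";

  if (linker_relax && kind == SYM_TLS_IE)
    prefix = ".reloc\t.,R_LARCH_RELAX\n\t";

  return snprintf (buf, size, "%s%s", prefix, insn);
}
```

Emitting the marker as a separate .reloc directive in front of the instruction is what lets the compiler keep scheduling pcalau12i and ld independently instead of falling back to the assembler macro.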
[PATCH] LoongArch: Allow s9 as a register alias
The psABI allows using s9 as an alias of r22. gcc/ChangeLog: * config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add s9 as an alias of r22. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch.h | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index 8b453ab3140..bf2351f0968 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -931,6 +931,7 @@ typedef struct { { "t8", 20 + GP_REG_FIRST },\ { "x", 21 + GP_REG_FIRST },\ { "fp", 22 + GP_REG_FIRST },\ + { "s9", 22 + GP_REG_FIRST },\ { "s0", 23 + GP_REG_FIRST },\ { "s1", 24 + GP_REG_FIRST },\ { "s2", 25 + GP_REG_FIRST },\ -- 2.44.0
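For readers not familiar with ADDITIONAL_REGISTER_NAMES: it is a name-to-register-number table, so an alias is just a second name mapping to the same number. A minimal sketch, assuming GP_REG_FIRST is 0 and using a hypothetical lookup helper (the real lookup lives in GCC's generic code and is not reproduced here):

```c
#include <stddef.h>
#include <string.h>

/* One entry of a name -> register number table.  */
struct reg_alias
{
  const char *name;
  int regno;
};

/* Excerpt mirroring the entries around the new line in the patch,
   assuming GP_REG_FIRST == 0.  */
static const struct reg_alias gpr_aliases[] = {
  { "fp", 22 },
  { "s9", 22 },	/* the alias added by this patch */
  { "s0", 23 },
  { "s1", 24 },
  { "s2", 25 },
};

/* Return the register number for NAME, or -1 if it is not in the
   table.  Hypothetical helper for illustration only.  */
int
lookup_gpr_alias (const char *name)
{
  for (size_t i = 0; i < sizeof gpr_aliases / sizeof gpr_aliases[0]; i++)
    if (strcmp (gpr_aliases[i].name, name) == 0)
      return gpr_aliases[i].regno;
  return -1;
}
```

With the extra entry, "fp" and "s9" resolve to the same register, so either spelling can be used wherever GCC accepts a register name.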
Re: [PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros
On Thu, 2024-02-29 at 15:09 +0800, Xi Ruoyao wrote: > Recently I've fixed two wrong FP vector negate implementation which > caused wrong sign bits in zeros in targets (r14-8786 and r14-8801). To > prevent a similar issue from happening again, add a test case. > > Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS > (with MSA), LoongArch (with LSX and LASX). > > gcc/testsuite: > > * gcc.dg/vect/vect-neg-zero.c: New test. > --- > > v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on > Linaro ARM CI. Oops, still failing ARM CI. Not sure why... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.
On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:
> > I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
> > we need a target hook to tell the generic code
> > UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or we'll
> > see millions lines of messages like
> > 
> > ../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
> > UNSPEC_LA_PCREL_64_PART1 (42) found in variable location
> 
> I build GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't reproduced
> the problem you mentioned.
> 
> $ ../configure --host=loongarch64-linux-gnu
> --target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
> --with-arch=loongarch64 --with-abi=lp64d --enable-tls
> --enable-languages=c,c++,fortran,lto --enable-plugin \
> --disable-multilib --disable-host-shared --enable-bootstrap
> --enable-checking=release
> $ make BOOT_FLAGS="-mcmodel=extreme"
> 
> What did I do wrong?:-(

The variable is BOOT_CFLAGS, not BOOT_FLAGS :).

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.
On Sat, 2024-01-13 at 15:01 +0800, chenglulu wrote:
> 
> On 2024/1/12 at 7:42 PM, Xi Ruoyao wrote:
> > On Fri, 2024-01-12 at 09:46 +0800, chenglulu wrote:
> > 
> > > > I found an issue bootstrapping GCC with -mcmodel=extreme in BOOT_CFLAGS:
> > > > we need a target hook to tell the generic code
> > > > UNSPEC_LA_PCREL_64_PART{1,2} are just a wrapper around symbols, or we'll
> > > > see millions lines of messages like
> > > > 
> > > > ../../gcc/gcc/tree.h:4171:1: note: non-delegitimized UNSPEC
> > > > UNSPEC_LA_PCREL_64_PART1 (42) found in variable location
> > > I build GCC with -mcmodel=extreme in BOOT_CFLAGS, but I haven't
> > > reproduced the problem you mentioned.
> > > 
> > > $ ../configure --host=loongarch64-linux-gnu
> > > --target=loongarch64-linux-gnu --build=loongarch64-linux-gnu \
> > > --with-arch=loongarch64 --with-abi=lp64d --enable-tls
> > > --enable-languages=c,c++,fortran,lto --enable-plugin \
> > > --disable-multilib --disable-host-shared --enable-bootstrap
> > > --enable-checking=release
> > > $ make BOOT_FLAGS="-mcmodel=extreme"
> > > 
> > > What did I do wrong?:-(
> > BOOT_CFLAGS, not BOOT_FLAGS :).
> > 
> This is so strange.  My compilation here stopped due to syntax problems,
> and I still haven't reproduced the information you mentioned about
> UNSPEC_LA_PCREL_64_PART1.

I used:

../gcc/configure --with-system-zlib --disable-fixincludes \
  --enable-default-ssp --enable-default-pie \
  --disable-werror --disable-multilib \
  --prefix=/home/xry111/gcc-dev

and then

make STAGE1_{C,CXX}FLAGS="-O2 -g" -j8 \
  BOOT_{C,CXX}FLAGS="-O2 -g -mcmodel=extreme" |& tee gcc-build.log

I guess "-g" is needed to reproduce the issue as well, as the messages
were produced during DWARF generation.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.
On Sat, 2024-01-13 at 15:28 +0800, chenxiaolong wrote: > gcc/testsuite/ChangeLog: > > * gcc.dg/pr104992.c: Added additional "-mlsx" compilation options. > * gcc.dg/signbit-2.c: Dito. > * gcc.dg/tree-ssa/scev-16.c: Dito. > * gfortran.dg/graphite/vect-pr40979.f90: Dito. > * gfortran.dg/vect/fast-math-mgrid-resid.f: Dito. I don't think the changes to pr104992.c and scev-16.c are right, because no other architecture adds special options there. Why are we so special? > --- > gcc/testsuite/gcc.dg/pr104992.c | 1 + > gcc/testsuite/gcc.dg/signbit-2.c | 1 + > gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 1 + > gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 | 1 + > gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 + > 5 files changed, 5 insertions(+) > > diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c > index 82f8c75559c..a77992fa491 100644 > --- a/gcc/testsuite/gcc.dg/pr104992.c > +++ b/gcc/testsuite/gcc.dg/pr104992.c > @@ -1,6 +1,7 @@ > /* PR tree-optimization/104992 */ > /* { dg-do compile } */ > /* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */ > +/* { dg-additional-options "-mlsx" { target loongarch_sx } } */ > > #define vector __attribute__((vector_size(4*sizeof(int > > diff --git a/gcc/testsuite/gcc.dg/signbit-2.c > b/gcc/testsuite/gcc.dg/signbit-2.c > index 62bb4047d74..5511bb78149 100644 > --- a/gcc/testsuite/gcc.dg/signbit-2.c > +++ b/gcc/testsuite/gcc.dg/signbit-2.c > @@ -5,6 +5,7 @@ > /* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* > x86_64-*-* } } } */ > /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */ > /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */ > +/* { dg-additional-options "-mlsx" { target loongarch_sx } } */ > /* { dg-skip-if "no fallback for MVE" { arm_mve } } */ > > #include > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c > b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c > index 120f40c0b6c..06cfbbcfae5 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c > @@ -1,6 +1,7 @@ > /* { dg-do compile } */ > /* { dg-require-effective-target vect_int } */ > /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ > +/* { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } */ > > int A[1024 * 2]; > > diff --git a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 > b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 > index a42290948c4..6f2ad1166a4 100644 > --- a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 > +++ b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 > @@ -1,6 +1,7 @@ > ! { dg-do compile } > ! { dg-require-effective-target vect_double } > ! { dg-additional-options "-msse2" { target { { i?86-*-* x86_64-*-* } && > ilp32 } } } > +! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } > > module mqc_m > integer, parameter, private :: longreal = selected_real_kind(15,90) > diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f > b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f > index 08965cc5e20..97b88821731 100644 > --- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f > +++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f > @@ -2,6 +2,7 @@ > ! { dg-require-effective-target vect_double } > ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 > -fpredictive-commoning -fdump-tree-pcom-details -std=legacy" } > ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } > } } > +! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } > ! { dg-additional-options "-mzarch" { target { s390*-*-* } } } > > *** RESID COMPUTES THE RESIDUAL: R = V - AU -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University