Re: [PATCH] LoongArch: Set 4 * (issue rate) as the default for -falign-functions and -falign-loops
On Tue, 2023-04-18 at 21:06 +0800, Lulu Cheng wrote: > Hi, ruoyao: > > Thank you so much for making this submission. But we are testing the > impact of these two alignment parameters > > (also including -falign-jumps and -falign-lables ) on performance. So > before the result comes out, this patch will > > not be merged into the main branch for the time being. Hi! Is there an estimate when the benchmark will be done? If it will be done soon I'll wait for the result before performing a full system rebuild, otherwise I'll use my gut feeling to specify a -falign- functions= value for the build :). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] libatomic: x86_64: Always try ifunc
We used to skip ifunc check when CX16 is available. But now we use CX16+AVX+Intel/AMD for the "perfect" 16b load implementation, so CX16 alone is not a sufficient reason not to use ifunc (see PR104688). This causes a subtle and annoying issue: when GCC is built with a higher -march= setting in CFLAGS_FOR_TARGET, ifunc is disabled and the worst (locked) implementation of __atomic_load_16 is always used. There seems no good way to check if the CPU is Intel or AMD from the built-in macros (maybe we can check every known model like __skylake, __bdver2, ..., but it will be very error-prune and require an update whenever we add the support for a new x86 model). The best thing we can do seems "always try ifunc" here. Bootstrapped and tested on x86_64-linux-gnu. Ok for trunk? libatomic/ChangeLog: * configure.tgt: For x86_64, always set try_ifunc=yes. --- libatomic/configure.tgt | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt index a92ae9e8309..39dd5686f2e 100644 --- a/libatomic/configure.tgt +++ b/libatomic/configure.tgt @@ -100,9 +100,7 @@ EOF fi cat > conftestx.c <
Re: [PATCH] libatomic: x86_64: Always try ifunc
On Sat, 2023-06-03 at 14:53 +0200, Bernhard Reutner-Fischer wrote: > On 3 June 2023 13:25:32 CEST, Xi Ruoyao via Gcc-patches > wrote: > > > There seems no good way to check if the CPU is Intel or AMD from > > the built-in macros (maybe we can check every known model like > > __skylake, > > __bdver2, ..., but it will be very error-prune and require an update > > whenever we add the support for a new x86 model). The best thing we > > can > > do seems "always try ifunc" here. > > IIRC there is __builtin_cpu_is (after initialisation) -- A couple of > days ago, we wondered if it would be handy to lower that even in > fortran without going through C, so i am pretty sure I don't make that > up.. ;-) Unfortunately __builtin_cpu_is performs CPU detection on runtime, not compile time. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Ping: [PATCH] libatomic: x86_64: Always try ifunc
Ping (in hopes that someone can review before the weekend). On Sat, 2023-06-03 at 19:25 +0800, Xi Ruoyao wrote: > We used to skip ifunc check when CX16 is available. But now we use > CX16+AVX+Intel/AMD for the "perfect" 16b load implementation, so CX16 > alone is not a sufficient reason not to use ifunc (see PR104688). > > This causes a subtle and annoying issue: when GCC is built with a > higher -march= setting in CFLAGS_FOR_TARGET, ifunc is disabled and > the worst (locked) implementation of __atomic_load_16 is always used. > > There seems no good way to check if the CPU is Intel or AMD from > the built-in macros (maybe we can check every known model like > __skylake, > __bdver2, ..., but it will be very error-prune and require an update > whenever we add the support for a new x86 model). The best thing we > can > do seems "always try ifunc" here. > > Bootstrapped and tested on x86_64-linux-gnu. Ok for trunk? > > libatomic/ChangeLog: > > * configure.tgt: For x86_64, always set try_ifunc=yes. > --- > libatomic/configure.tgt | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt > index a92ae9e8309..39dd5686f2e 100644 > --- a/libatomic/configure.tgt > +++ b/libatomic/configure.tgt > @@ -100,9 +100,7 @@ EOF > fi > cat > conftestx.c < #ifdef __x86_64__ > -#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 > -#error need -mcx16 > -#endif > +#error ifunc is always wanted for 16B atomic load > #else > #ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 > #error need -march=i686 -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Set 4 * (issue rate) as the default for -falign-functions and -falign-loops
On Tue, 2023-05-30 at 09:30 +0800, Lulu Cheng wrote: > > 在 2023/5/29 下午2:09, Xi Ruoyao 写道: > > On Tue, 2023-04-18 at 21:06 +0800, Lulu Cheng wrote: > > > Hi, ruoyao: > > > > > > Thank you so much for making this submission. But we are testing > > > the > > > impact of these two alignment parameters > > > > > > (also including -falign-jumps and -falign-lables ) on performance. > > > So > > > before the result comes out, this patch will > > > > > > not be merged into the main branch for the time being. > > Hi! > > > > Is there an estimate when the benchmark will be done? If it will be > > done soon I'll wait for the result before performing a full system > > rebuild, otherwise I'll use my gut feeling to specify a -falign- > > functions= value for the build :). > > > Sorry for taking so long to reply to the email. From our current test > results, > > the performance of the SPEC is best when combined with -falign- > loops=16, > > -falign-jumps=16, -falign-functions=32 and -falign-lables=16. I've completed a system rebuild with -falign- {jumps,functions,labels}=16. I've missed -falign-loops=16 but the doc says -falign-labels=16 implies -falign-jumps=16 and -falign-loops=16 (if -falign-jumps or -falign-loops are not set explicitly with a larger value). I'll make a patch to set -falign-functions=32 and -falign-labels=16 with -mtune={la464,loongarch64} after setting a basic develop environment on the new system... And I'm wondering if things will change with LA664 :). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
On Tue, 2023-06-13 at 20:23 +0800, Jiufu Guo via Gcc-patches wrote: > Compare with previous version, this addes ChangeLog and removes > const_anchor parts. > https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621356.html. [Off topic] const_anchor is just broken now. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104843 and the thread beginning at https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591470.html. If you want to use it for rs6000 I guess you need to fix it first... To me const_anchor needs a complete rework but I don't want to spend my time on it. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: Set default alignment for functions and labels with -mtune
The LA464 micro-architecture is sensitive to alignment of code. The Loongson team has benchmarked various combinations of function, the results [1] show that 16-byte label alignment together with 32-byte function alignment gives best results in terms of SPEC score. Add a mtune-based table-driven mechanism to set the default of -falign-{functions,labels}. As LA464 is the first (and the only for now) uarch supported by GCC, the same setting is also used for the "generic" -mtune=loongarch64. In the future we may set different settings for LA{2,3,6}64 once we add the support for them. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/loongarch/loongarch-tune.h (loongarch_align): New struct. * config/loongarch/loongarch-def.h (loongarch_cpu_align): New array. * config/loongarch/loongarch-def.c (loongarch_cpu_align): Define the array. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set the value of -falign-functions= if -falign-functions is enabled but no value is given. Likewise for -falign-labels=. --- gcc/config/loongarch/loongarch-def.c | 12 gcc/config/loongarch/loongarch-def.h | 1 + gcc/config/loongarch/loongarch-tune.h | 8 gcc/config/loongarch/loongarch.cc | 6 ++ 4 files changed, 27 insertions(+) diff --git a/gcc/config/loongarch/loongarch-def.c b/gcc/config/loongarch/loongarch-def.c index fc4ebbefede..6729c857f7c 100644 --- a/gcc/config/loongarch/loongarch-def.c +++ b/gcc/config/loongarch/loongarch-def.c @@ -72,6 +72,18 @@ loongarch_cpu_cache[N_TUNE_TYPES] = { }, }; +struct loongarch_align +loongarch_cpu_align[N_TUNE_TYPES] = { + [CPU_LOONGARCH64] = { +.function = "32", +.label = "16", + }, + [CPU_LA464] = { +.function = "32", +.label = "16", + }, +}; + /* The following properties cannot be looked up directly using "cpucfg". So it is necessary to provide a default value for "unknown native" tune targets (i.e. -mtune=native while PRID does not correspond to diff --git a/gcc/config/loongarch/loongarch-def.h b/gcc/config/loongarch/loongarch-def.h index 778b1409956..fb8bb88eb52 100644 --- a/gcc/config/loongarch/loongarch-def.h +++ b/gcc/config/loongarch/loongarch-def.h @@ -144,6 +144,7 @@ extern int loongarch_cpu_issue_rate[]; extern int loongarch_cpu_multipass_dfa_lookahead[]; extern struct loongarch_cache loongarch_cpu_cache[]; +extern struct loongarch_align loongarch_cpu_align[]; extern struct loongarch_rtx_cost_data loongarch_cpu_rtx_cost_data[]; #ifdef __cplusplus diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h index ba31c4f08c3..5c03262daff 100644 --- a/gcc/config/loongarch/loongarch-tune.h +++ b/gcc/config/loongarch/loongarch-tune.h @@ -48,4 +48,12 @@ struct loongarch_cache { int simultaneous_prefetches; /* number of parallel prefetch */ }; +/* Alignment for functions and labels for best performance. For new uarchs + the value should be measured via benchmarking. See the documentation for + -falign-functions and -falign-labels in invoke.texi for the format. */ +struct loongarch_align { + const char *function;/* default value for -falign-functions */ + const char *label; /* default value for -falign-labels */ +}; + #endif /* LOONGARCH_TUNE_H */ diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index eb73d11b869..5b8b93eb24b 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -6249,6 +6249,12 @@ loongarch_option_override_internal (struct gcc_options *opts) && !opts->x_optimize_size) opts->x_flag_prefetch_loop_arrays = 1; + if (opts->x_flag_align_functions && !opts->x_str_align_functions) +opts->x_str_align_functions = loongarch_cpu_align[LARCH_ACTUAL_TUNE].function; + + if (opts->x_flag_align_labels && !opts->x_str_align_labels) +opts->x_str_align_labels = loongarch_cpu_align[LARCH_ACTUAL_TUNE].label; + if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib) error ("%qs cannot be used for compiling a shared library", "-mdirect-extern-access"); -- 2.41.0
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
On Wed, 2023-06-14 at 09:55 +0800, Jiufu Guo wrote: > Hi, > > Xi Ruoyao writes: > > > On Tue, 2023-06-13 at 20:23 +0800, Jiufu Guo via Gcc-patches wrote: > > > > > Compare with previous version, this addes ChangeLog and removes > > > const_anchor parts. > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621356.html. > > > > [Off topic] > > > > const_anchor is just broken now. See > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104843 and the thread > > beginning at > > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591470.html. If > > you want to use it for rs6000 I guess you need to fix it first... > > Thanks so much for pointing out this. It seems about supporting > negative value, right? > > As you say: for 1. "g(0x8123, 0x81240001)", it would be fine. > > The generated insns are: > (insn 5 2 6 2 (set (reg:DI 117) > (const_int -2128347135 [0x81240001])) "negative.c":5:3 681 > {*movdi_internal64} > (nil)) > (insn 6 5 7 2 (set (reg:DI 118) > (plus:DI (reg:DI 117) > (const_int -2 [0xfffe]))) "negative.c":5:3 66 > {*adddi3} > (expr_list:REG_EQUAL (const_int -2128347137 [0x8123]) > (nil))) > > While for 2. "g (0x7fff, 0x8001)", the generated rtl insns: > (insn 5 2 6 2 (set (reg:DI 117) > (const_int -2147483647 [0x8001])) "negative.c":5:3 681 > {*movdi_internal64} > (nil)) > (insn 7 6 8 2 (set (reg:DI 3 3) > (const_int 2147483647 [0x7fff])) "negative.c":5:3 681 > {*movdi_internal64} > (nil)) > > The current const_anchor does not generate sth like: "r3 = r117 - 2" > But I would lean to say it is the limitation of current implementation: > "0x8001" and "0x7fff" hit different anchors(even these > two values are 'close' on some aspect.) The generic issue here is to fix (not "papering over") the signed overflow, we need to perform the addition in a target machine mode. We may always use Pmode (IIRC const_anchor was introduced for optimizing some constant addresses), but can we do better? Should we try addition in both DImode and SImode for a 64-bit capable machine? Or should we even try more operations than addition (for eg bit operations like xor or shift)? Doing so will need to create a new target hook for const anchoring, this is the "complete rework" I meant. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Pushed: [PATCH] LoongArch: Set default alignment for functions and labels with -mtune
Pushed r14-1839. On Thu, 2023-06-15 at 09:12 +0800, Lulu Cheng wrote: > LGTM! Thanks! > > 在 2023/6/14 上午8:43, Xi Ruoyao 写道: > > The LA464 micro-architecture is sensitive to alignment of code. The > > Loongson team has benchmarked various combinations of function, the > > results [1] show that 16-byte label alignment together with 32-byte > > function alignment gives best results in terms of SPEC score. > > > > Add a mtune-based table-driven mechanism to set the default of > > -falign-{functions,labels}. As LA464 is the first (and the only for > > now) uarch supported by GCC, the same setting is also used for > > the "generic" -mtune=loongarch64. In the future we may set > > different > > settings for LA{2,3,6}64 once we add the support for them. > > > > Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? > > > > gcc/ChangeLog: > > > > * config/loongarch/loongarch-tune.h (loongarch_align): New > > struct. > > * config/loongarch/loongarch-def.h (loongarch_cpu_align): > > New > > array. > > * config/loongarch/loongarch-def.c (loongarch_cpu_align): > > Define > > the array. > > * config/loongarch/loongarch.cc > > (loongarch_option_override_internal): Set the value of > > -falign-functions= if -falign-functions is enabled but no > > value > > is given. Likewise for -falign-labels=. > > --- > > gcc/config/loongarch/loongarch-def.c | 12 > > gcc/config/loongarch/loongarch-def.h | 1 + > > gcc/config/loongarch/loongarch-tune.h | 8 > > gcc/config/loongarch/loongarch.cc | 6 ++ > > 4 files changed, 27 insertions(+) > > > > diff --git a/gcc/config/loongarch/loongarch-def.c > > b/gcc/config/loongarch/loongarch-def.c > > index fc4ebbefede..6729c857f7c 100644 > > --- a/gcc/config/loongarch/loongarch-def.c > > +++ b/gcc/config/loongarch/loongarch-def.c > > @@ -72,6 +72,18 @@ loongarch_cpu_cache[N_TUNE_TYPES] = { > > }, > > }; > > > > +struct loongarch_align > > +loongarch_cpu_align[N_TUNE_TYPES] = { > > + [CPU_LOONGARCH64] = { > > + .function = "32", > > + .label = "16", > > + }, > > + [CPU_LA464] = { > > + .function = "32", > > + .label = "16", > > + }, > > +}; > > + > > /* The following properties cannot be looked up directly using > > "cpucfg". > > So it is necessary to provide a default value for "unknown > > native" > > tune targets (i.e. -mtune=native while PRID does not correspond > > to > > diff --git a/gcc/config/loongarch/loongarch-def.h > > b/gcc/config/loongarch/loongarch-def.h > > index 778b1409956..fb8bb88eb52 100644 > > --- a/gcc/config/loongarch/loongarch-def.h > > +++ b/gcc/config/loongarch/loongarch-def.h > > @@ -144,6 +144,7 @@ extern int loongarch_cpu_issue_rate[]; > > extern int loongarch_cpu_multipass_dfa_lookahead[]; > > > > extern struct loongarch_cache loongarch_cpu_cache[]; > > +extern struct loongarch_align loongarch_cpu_align[]; > > extern struct loongarch_rtx_cost_data > > loongarch_cpu_rtx_cost_data[]; > > > > #ifdef __cplusplus > > diff --git a/gcc/config/loongarch/loongarch-tune.h > > b/gcc/config/loongarch/loongarch-tune.h > > index ba31c4f08c3..5c03262daff 100644 > > --- a/gcc/config/loongarch/loongarch-tune.h > > +++ b/gcc/config/loongarch/loongarch-tune.h > > @@ -48,4 +48,12 @@ struct loongarch_cache { > > int simultaneous_prefetches; /* number of parallel prefetch */ > > }; > > > > +/* Alignment for functions and labels for best performance. For > > new uarchs > > + the value should be measured via benchmarking. See the > > documentation for > > + -falign-functions and -falign-labels in invoke.texi for the > > format. */ > > +struct loongarch_align { > > + const char *function;/* default value for -falign- > > functions */ > > + const char *label; /* default value for -falign-labels */ > > +}; > > + > > #endif /* LOONGARCH_TUNE_H */ > > diff --git a/gcc/config/loongarch/loongarch.cc > > b/gcc/config/loongarch/loongarch.cc > > index eb73d11b869..5b8b93eb24b 100644 > > --- a/gcc/config/loongarch/loongarch.cc > > +++ b/gcc/config/loongarch/loongarch.cc > > @@ -6249,6 +6249,12 @@ loongarch_option_override_internal (struct > > gcc_options *opts) > > && !opts->x_optimize_size) > > opts->x_flag_prefetch_loop_arrays = 1; > > > > + if (opts->x_flag_align_functions && !opts->x_str_align_functions) > > + opts->x_str_align_functions = > > loongarch_cpu_align[LARCH_ACTUAL_TUNE].function; > > + > > + if (opts->x_flag_align_labels && !opts->x_str_align_labels) > > + opts->x_str_align_labels = > > loongarch_cpu_align[LARCH_ACTUAL_TUNE].label; > > + > > if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib) > > error ("%qs cannot be used for compiling a shared library", > > "-mdirect-extern-access"); > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [pushed][PATCH v3] LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]
Xuerui: I guess this makes it sensible to show "ret" instead of "jirl $zero, $ra, 0" in objdump -d output, but I don't know how to implement it. Do you have some idea? On Thu, 2023-06-15 at 16:27 +0800, Lulu Cheng wrote: > Pushed to trunk and gcc-12 gcc-13. > r14-1866 > r13-7448 > r12-9698 > > 在 2023/6/15 上午9:30, Lulu Cheng 写道: > > Micro-architecture unconditionally treats a "jr $ra" as "return from > > subroutine", > > hence doing "jr $ra" would interfere with both subroutine return > > prediction and > > the more general indirect branch prediction. > > > > Therefore, a problem like PR110136 can cause a significant increase > > in branch error > > prediction rate and affect performance. The same problem exists with > > "indirect_jump". > > > > gcc/ChangeLog: > > > > * config/loongarch/loongarch.md: Modify the register > > constraints for template > > "jumptable" and "indirect_jump" from "r" to "e". > > > > Co-authored-by: Andrew Pinski > > --- > > v1 -> v2: > > 1. Modify the description. > > 2. Modify the register constraints of the template > > "indirect_jump". > > v2 -> v3: > > 1. Modify the description. > > --- > > gcc/config/loongarch/loongarch.md | 8 ++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/config/loongarch/loongarch.md > > b/gcc/config/loongarch/loongarch.md > > index 816a943d155..b37e070660f 100644 > > --- a/gcc/config/loongarch/loongarch.md > > +++ b/gcc/config/loongarch/loongarch.md > > @@ -2895,6 +2895,10 @@ (define_insn "*jump_pic" > > } > > [(set_attr "type" "branch")]) > > > > +;; Micro-architecture unconditionally treats a "jr $ra" as "return > > from subroutine", > > +;; non-returning indirect jumps through $ra would interfere with > > both subroutine > > +;; return prediction and the more general indirect branch > > prediction. > > + > > (define_expand "indirect_jump" > > [(set (pc) (match_operand 0 "register_operand"))] > > "" > > @@ -2905,7 +2909,7 @@ (define_expand "indirect_jump" > > }) > > > > (define_insn "@indirect_jump" > > - [(set (pc) (match_operand:P 0 "register_operand" "r"))] > > + [(set (pc) (match_operand:P 0 "register_operand" "e"))] > > "" > > "jr\t%0" > > [(set_attr "type" "jump") > > @@ -2928,7 +2932,7 @@ (define_expand "tablejump" > > > > (define_insn "@tablejump" > > [(set (pc) > > - (match_operand:P 0 "register_operand" "r")) > > + (match_operand:P 0 "register_operand" "e")) > > (use (label_ref (match_operand 1 "" "")))] > > "" > > "jr\t%0" > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: Devirtualization of objects in array
On Wed, 2023-07-12 at 16:58 +0800, Ng YongXiang via Gcc-patches wrote: > I'm writing to seek for a review for an issue I filed some time ago. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . A proposed patch is > attached in the bug tracker as well. You should send the patch to gcc-patches@gcc.gnu.org for a review, see https://gcc.gnu.org/contribute.html for the details. Generally we consider patches attached in bugzilla as drafts. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2 2/2] libstdc++: use new built-in trait __is_scalar for std::is_scalar
On Wed, 2023-07-12 at 11:32 -0700, Ken Matsui via Gcc-patches wrote: > > conditional on the front-end change being committed first of course > > Does this mean we want to commit this [2/2] patch before committing > the [1/2] patch in this case? No, this mean you should get 1/2 reviewed and committed first. > Also, can I tweak the commit message without being approved again, > such as attaching the benchmark result? Yes, as long as the ChangeLog is still correct (the Git hook will reject a push with wrong ChangeLog format anyway). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2 0/8] Add Loongson SX/ASX instruction support to LoongArch target.
On Tue, 2023-07-18 at 19:06 +0800, Chenghui Pan wrote: > Lulu Cheng (8): > LoongArch: Added Loongson SX vector directive compilation framework. > LoongArch: Added Loongson SX base instruction support. > LoongArch: Added Loongson SX directive builtin function support. > LoongArch: Added Loongson ASX vector directive compilation framework. > LoongArch: Added Loongson ASX base instruction support. > LoongArch: Added Loongson ASX directive builtin function support. Let's always use "Add". > LoongArch: Add Loongson SX directive test cases. > LoongArch: Add Loongson ASX directive test cases. Have you tested this series by bootstrapping and regtesting GCC with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx" and BOOT_CFLAGS="-O3 -mlasx"? This may catch some mistakes early. And I'll rebuild the entire system with these GCC patches and -mlasx in Aug (after Glibc-2.38 release) as a field test too. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: Allow using --with-arch=native if host CPU is LoongArch
If the host triple and the target triple are different but the host is LoongArch, in some cases --with-arch=native can be useful. For example, if we are bootstrapping a loongarch64-linux-musl toolchain on a Glibc-based system and we don't intend to use the toolchain on other machines, we can use ../gcc/configure --{build,host}=loongarch64-linux-gnu \ --target=loongarch64-linux-musl --with-arch=native Relax the check in config.gcc to allow such configurations. gcc/ChangeLog: * config.gcc [target=loongarch*-*-*, with_arch=native]: Allow building cross compiler if the host CPU is LoongArch. --- Tested on x86_64-linux-gnu (building a cross compiler targeting LoongArch --with-arch=native still rejected) and loongarch64-linux-gnu (building a cross compiler targeting loongarch64-linux-musl allowed). Ok for trunk? gcc/config.gcc | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 1446eb2b3ca..146bca22a38 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4939,10 +4939,13 @@ case "${target}" in case ${with_arch} in "" | loongarch64 | la464) ;; # OK, append here. native) - if test x${host} != x${target}; then + case ${host} in + loongarch*) ;; # OK + *) echo "--with-arch=native is illegal for cross-compiler." 1>&2 exit 1 - fi + ;; + esac ;; "") echo "Please set a default value for \${with_arch}" \ -- 2.41.0
Re: [PATCH] Reduce floating-point difficulties in timevar.cc
On Fri, 2023-07-21 at 13:11 +0100, Matthew Malcomson via Gcc-patches wrote: > This change ensures those operations are not fused and hence stops the test > being flaky on that particular machine. There is no expected change in the > generated code. > Bootstrap & regtest on AArch64 passes with no regressions. > > gcc/ChangeLog: > > * timevar.cc (get_time): Make this noinline to avoid fusing > behaviour and associated test flakyness. I don't think it's correct. It will break bootstrapping GCC from other ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__. And IMO it's just hiding the real problem. We need more info of the "particular machine". Is this a hardware bug (i.e. the machine violates the AArch64 spec) or a GCC code generation issue? Or should we generally use -ffp-contract=off in BOOT_CFLAGS? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Reduce floating-point difficulties in timevar.cc
On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote: > My understanding is that this is not a hardware bug and that it's > specified that rounding does not happen on the multiply "sub-part" in > `FNMSUB`, but rounding happens on the `FMUL` that generates some input > to it. AFAIK the C standard does only say "A floating *expression* may be contracted". I.e: double r = a * b + c; may be compiled to use FMA because "a * b + c" is a floating point expression. But double t = a * b; double r = t + c; is not, because "a * b" and "t + c" are two separate floating point expressions. So a contraction across two functions is not allowed. We now have -ffp- contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard contractions. Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you are building GCC 14 snapshot). The default is "fast" (if no -std= option is used), which allows some contractions disallowed by the standard. But GCC is in C++ and I'm not sure if the C++ standard has the same definition for allowed contractions as C. > I can look into `-ffp-contract=off` as you both have recommended. > One question -- if we have concerns that the host compiler may not be > able to handle `attribute((noinline))` would we also be concerned that > this flag may not be supported? Only use it in BOOT_CFLAGS, i. e. 'make BOOT_CFLAGS="-O2 -g -ffp- contract=on"' (or "off" instead of "on"). In 3-stage bootstrapping it's only applied in stage 2 and 3, during which GCC is compiled by itself. > (Or is the severity of lack of support sufficiently different in the two > cases that this is fine -- i.e. not compile vs may trigger floating > point rounding inaccuracies?) It's possible that the test itself is flaky. Can you provide some detail about how it fails? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Reduce floating-point difficulties in timevar.cc
On Fri, 2023-07-21 at 16:58 +0300, Alexander Monakov wrote: > > On Fri, 21 Jul 2023, Xi Ruoyao via Gcc-patches wrote: > > > Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you > > are building GCC 14 snapshot). The default is "fast" (if no -std= > > option is used), which allows some contractions disallowed by the > > standard. > > Not fully, see below. > > > But GCC is in C++ and I'm not sure if the C++ standard has the same > > definition for allowed contractions as C. > > It doesn't, but in GCC we should aim to provide the same semantics in C++ > as in C. > > > > (Or is the severity of lack of support sufficiently different in the two > > > cases that this is fine -- i.e. not compile vs may trigger floating > > > point rounding inaccuracies?) > > > > It's possible that the test itself is flaky. Can you provide some > > detail about how it fails? > > See also PR 99903 for an earlier known issue which appears due to x87 > excess precision and so tweaking -ffp-contract wouldn't help: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903 Does it affect AArch64 too? > Now that multiple platforms are hitting this, can we _please_ get rid > of the questionable attempt to compute time in a floating-point variable > and just use an uint64_t storing nanoseconds? To me this is the correct thing to do. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Mips: Resolve build issues for the n32 ABI
Please configure your mail client to send the patch as plain text (not HTML, and don't replace a tab with 8 whitespaces, etc). If it's not possible, send the patch as an attachment. And, it's better to separate the changes in https://reviews.llvm.org/D127098 and struct_kernel_stat_sz into two patches, one just "cherry-pick LLVM commit aabbccdd...", another changes struct_kernel_stat_sz. It would make the reviewing and tracking of changes easier. On Fri, 2022-07-01 at 08:18 +, Dimitrije Milosevic wrote: > Building the ASAN for the n32 MIPS ABI currently fails, due to a few reasons: > - defined(__mips64), which is set solely based on the architecture type > (32-bit/64-bit), was still used in some places. > Therefore, defined(__mips64) is swapped with SANITIZER_MIPS64, which takes > the ABI into account as well - defined(__mips64) && _MIPS_SIM == ABI64. > - The n32 ABI still uses 64-bit *Linux* system calls, even though the word > size is 32 bits. > - After the transition to canonical system calls > (https://reviews.llvm.org/D124212), the n32 ABI still didn't use them, even > though they are supported, > as per > https://github.com/torvalds/linux/blob/master/arch/mips/kernel/syscalls/syscall_n32.tbl. > - struct_kernel_stat_sz was not updated after being changed in LLVM's source > tree. > > See https://reviews.llvm.org/D127098. > > libsanitizer/ChangeLog: > > * sanitizer_common/sanitizer_linux.cpp (defined): Resolve > ASAN build issues for the n32 ABI. > * sanitizer_common/sanitizer_platform.h (defined): Likewise. > * sanitizer_common/sanitizer_platform_limits_posix.h: Likewise. > > --- > > libsanitizer/sanitizer_common/sanitizer_linux.cpp | 17 > ++--- > libsanitizer/sanitizer_common/sanitizer_platform.h | 2 +- > libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h | 2 +- > 3 files changed, 12 insertions(+), 9 deletions(-) > > diff --git a/libsanitizer/sanitizer_common/sanitizer_linux.cpp > b/libsanitizer/sanitizer_common/sanitizer_linux.cpp > index e2c32d679ad..5ba033492e7 100644 > --- a/libsanitizer/sanitizer_common/sanitizer_linux.cpp > +++ b/libsanitizer/sanitizer_common/sanitizer_linux.cpp > @@ -34,7 +34,7 @@ > // format. Struct kernel_stat is defined as 'struct stat' in asm/stat.h. To > // access stat from asm/stat.h, without conflicting with definition in > // sys/stat.h, we use this trick. > -#if defined(__mips64) > +#if SANITIZER_MIPS64 > #include > #include > #define stat kernel_stat > @@ -124,8 +124,9 @@ const int FUTEX_WAKE_PRIVATE = FUTEX_WAKE | > FUTEX_PRIVATE_FLAG; > // Are we using 32-bit or 64-bit Linux syscalls? > // x32 (which defines __x86_64__) has SANITIZER_WORDSIZE == 32 > // but it still needs to use 64-bit syscalls. > -#if SANITIZER_LINUX && (defined(__x86_64__) || defined(__powerpc64__) || > \ > - SANITIZER_WORDSIZE == 64) > +#if SANITIZER_LINUX && (defined(__x86_64__) || defined(__powerpc64__) || \ > + SANITIZER_WORDSIZE == 64 || \ > + (defined(__mips__) && _MIPS_SIM == _ABIN32)) > # define SANITIZER_LINUX_USES_64BIT_SYSCALLS 1 > #else > # define SANITIZER_LINUX_USES_64BIT_SYSCALLS 0 > @@ -289,7 +290,7 @@ static void stat64_to_stat(struct stat64 *in, struct stat > *out) { > } > #endif > > -#if defined(__mips64) > +#if SANITIZER_MIPS64 > // Undefine compatibility macros from > // so that they would not clash with the kernel_stat > // st_[a|m|c]time fields > @@ -343,7 +344,8 @@ uptr internal_stat(const char *path, void *buf) { > #if SANITIZER_FREEBSD > return internal_syscall(SYSCALL(fstatat), AT_FDCWD, (uptr)path, (uptr)buf, > 0); > # elif SANITIZER_LINUX > -# if SANITIZER_WORDSIZE == 64 || SANITIZER_X32 > +# if SANITIZER_WORDSIZE == 64 || SANITIZER_X32 || \ > + (defined(__mips__) && _MIPS_SIM == _ABIN32) > return internal_syscall(SYSCALL(newfstatat), AT_FDCWD, (uptr)path, > (uptr)buf, > 0); > # else > @@ -366,7 +368,8 @@ uptr internal_lstat(const char *path, void *buf) { > return internal_syscall(SYSCALL(fstatat), AT_FDCWD, (uptr)path, (uptr)buf, > AT_SYMLINK_NOFOLLOW); > # elif SANITIZER_LINUX > -# if defined(_LP64) || SANITIZER_X32 > +# if defined(_LP64) || SANITIZER_X32 || \ > + (defined(__mips__) && _MIPS_SIM == _ABIN32) > return internal_syscall(SYSCALL(newfstatat), AT_FDCWD, (uptr)path, > (uptr)buf, > AT_SYMLINK_NOFOLLOW); > # else > @@ -1053,7 +1056,7 @@ uptr GetMaxVirtualAddress() { > return (1ULL << (MostSignificantSetBitIndex(GET_CURRENT_FRAME()) + 1)) - 1; > #elif SANITIZER_RISCV64 > return (1ULL << 38) - 1; > -# elif defined(__mips64) > +# elif SANITIZER_MIPS64 > return (1ULL << 40) - 1; // 0x00ffUL; > # elif defined(__s390x__) > return (1ULL << 53) - 1;
Re: [PATCH] libsanitizer: Fix linkage errors for cross toolchains
Again, please send patch as plain text. On Fri, 2022-07-01 at 08:18 +, Dimitrije Milosevic wrote: > When we use cross toolchains, in which the GCC libraries are not > installed within a designated system root, the shared sanitizer > libraries link against libstdc++.so* within the same libraries. > This directory, however, is not in RPATH, so attempting to build a > dynamically linked application with -fsanitize=... gives a linkage > error. > More information can be found here: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69839. Hmm... Is anyone really using a cross compiler without a sysroot? PR 69839 is already closed as INVALID. If you don't want to install GCC runtime libraries into the sysroot (for example, for multiple GCC builds using a shared sysroot), we have --enable-version-specific-runtime-libs but unfortunately it's broken (PR 32415). Please explain the actual use case and let us elaborate to see if there is a better solution. > gcc/ChangeLog: > * gcc.c (LIBSAN_RPATH): New macro. > (LIBASAN_SPEC): Add LIBSAN_RPATH. > (LIBUBSAN_SPEC): Likewise. > (LIBTSAN_SPEC): Likewise. > (LIBLSAN_SPEC): Likewise. > > libsanitizer/ChangeLog: > * configure.ac (link_libsan_rpath): New config variable. > * libsanitizer.spec.in (link_libsan_rpath): New spec. > * configure (link_libsan_rpath): New config variable. > * Makefile.in (link_libsan_rpath): Define new Makefile > variable. > * asan/Makefile.in: Likewise. > * interception/Makefile.in: Likewise. > * libbacktrace/Makefile.in: Likewise. > * lsan/Makefile.in: Likewise. > * sanitizer_common/Makefile.in: Likewise. > * tsan/Makefile.in: Likewise. > * ubsan/Makefile.in: Likewise. > * hwasan/Makefile.in: Likewise. > > --- > > gcc/gcc.cc | 20 > libsanitizer/Makefile.in | 1 + > libsanitizer/asan/Makefile.in | 1 + > libsanitizer/configure | 10 -- > libsanitizer/configure.ac | 7 +++ > libsanitizer/hwasan/Makefile.in | 1 + > libsanitizer/interception/Makefile.in | 1 + > libsanitizer/libbacktrace/Makefile.in | 1 + > libsanitizer/libsanitizer.spec.in | 2 ++ > libsanitizer/lsan/Makefile.in | 1 + > libsanitizer/sanitizer_common/Makefile.in | 1 + > libsanitizer/tsan/Makefile.in | 1 + > libsanitizer/ubsan/Makefile.in | 1 + > 13 files changed, 38 insertions(+), 10 deletions(-) > > > diff --git a/gcc/gcc.cc b/gcc/gcc.ccindex 299e09c4f54..37ff75f1ad5 > 100644 > --- a/gcc/gcc.cc > +++ b/gcc/gcc.cc > @@ -738,17 +738,21 @@ proper position among the other output files. > */ > #define STACK_SPLIT_SPEC " %{fsplit-stack: --wrap=pthread_create}" > #endif > > +#ifndef LIBSAN_RPATH > +#define LIBSAN_RPATH " > %:include(libsanitizer.spec)%(link_libsan_rpath)" > +#endif > + > #ifndef LIBASAN_SPEC > #define STATIC_LIBASAN_LIBS \ > " %{static- > libasan|static:%:include(libsanitizer.spec)%(link_libasan)}" > #ifdef LIBASAN_EARLY_SPEC > -#define LIBASAN_SPEC STATIC_LIBASAN_LIBS > +#define LIBASAN_SPEC STATIC_LIBASAN_LIBS LIBSAN_RPATH > #elif defined(HAVE_LD_STATIC_DYNAMIC) > #define LIBASAN_SPEC "%{static-libasan:" LD_STATIC_OPTION \ > "} -lasan %{static-libasan:" LD_DYNAMIC_OPTION > "}" \ > STATIC_LIBASAN_LIBS > #else > -#define LIBASAN_SPEC "-lasan" STATIC_LIBASAN_LIBS > +#define LIBASAN_SPEC "-lasan" STATIC_LIBASAN_LIBS LIBSAN_RPATH > #endif > #endif > > @@ -778,13 +782,13 @@ proper position among the other output files. > */ > #define STATIC_LIBTSAN_LIBS \ > " %{static- > libtsan|static:%:include(libsanitizer.spec)%(link_libtsan)}" > #ifdef LIBTSAN_EARLY_SPEC > -#define LIBTSAN_SPEC STATIC_LIBTSAN_LIBS > +#define LIBTSAN_SPEC STATIC_LIBTSAN_LIBS LIBSAN_RPATH > #elif defined(HAVE_LD_STATIC_DYNAMIC) > #define LIBTSAN_SPEC "%{static-libtsan:" LD_STATIC_OPTION \ > "} -ltsan %{static-libtsan:" LD_DYNAMIC_OPTION > "}" \ > STATIC_LIBTSAN_LIBS > #else > -#define LIBTSAN_SPEC "-ltsan" STATIC_LIBTSAN_LIBS > +#define LIBTSAN_SPEC "-ltsan" STATIC_LIBTSAN_LIBS LIBSAN_RPATH > #endif > #endif > > @@ -793,7 +797,7 @@ proper position among the other output files. */ > #endif > > #ifndef LIBLSAN_SPEC > -#define STATIC_LIBLSAN_LIBS \ > +#define STATIC_LIBLSAN_LIBS LIBSAN_RPATH \ > " %{static- > liblsan|static:%:include(libsanitizer.spec)%(link_liblsan)}" > #ifdef LIBLSAN_EARLY_SPEC > #define LIBLSAN_SPEC STATIC_LIBLSAN_LIBS > @@ -802,7 +806,7 @@ proper position among the other output files. */ > "} -llsan %{static-liblsan:" LD_DYNAMIC_OPTION > "}" \ > STATIC_LIBLSAN_LIBS > #else > -#define LIBLSAN_SPEC "-llsan" STATIC_LIBLSAN_LIBS > +#define LIBLSAN_SPEC "-llsan" STATIC_LIBLSAN_LIBS LIBSAN_RPATH
Re: Mips: Fix kernel_stat structure size
On Fri, 2022-07-01 at 12:40 +, Dimitrije Milosevic wrote: > Fix kernel_stat structure size for non-Android 32-bit Mips. > LLVM currently has this value for the kernel_stat structure size, > as per compiler-rt/lib/sanitizer-common/sanitizer_platform_limits_posix.h. > This also resolves one of the build issues for non-Android 32-bit Mips. nit: the ChangeLog file name shall have no indents in the commit message, and there should be one tab (instead of 8 whitespaces) before the content. Like: libsanitizer/ChangeLog: * sanitizer_common/sanitizer_platform_limits_posix.h: Fix kernel_stat structure size for non-Android 32-bit Mips. Patch content LGTM as it just changes our code to match the upstream, but I don't have privilege to approve the change. Richard? > libsanitizer/ChangeLog: > > * sanitizer_common/sanitizer_platform_limits_posix.h: Fix > kernel_stat structure size for non-Android 32-bit Mips. > > --- > > libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h > b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h > index 89772a7e5c0..62a99035db3 100644 > --- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h > +++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h > @@ -83,7 +83,7 @@ const unsigned struct_kernel_stat64_sz = 104; > #elif defined(__mips__) > const unsigned struct_kernel_stat_sz = SANITIZER_ANDROID > ? FIRST_32_SECOND_64(104, 128) > - : FIRST_32_SECOND_64(144, 216); > + : FIRST_32_SECOND_64(160, 216); > const unsigned struct_kernel_stat64_sz = 104; > #elif defined(__s390__) && !defined(__s390x__) > const unsigned struct_kernel_stat_sz = 64; > > --- -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Mips: Resolve build issues for the n32 ABI
On Fri, 2022-07-01 at 12:40 +, Dimitrije Milosevic wrote: > Building the ASAN for the n32 MIPS ABI currently fails, due to a few reasons: > - defined(__mips64), which is set solely based on the architecture type > (32-bit/64-bit), > was still used in some places. Therefore, defined(__mips64) is swapped with > SANITIZER_MIPS64, > which takes the ABI into account as well - defined(__mips64) && > _MIPS_SIM == ABI64. > - The n32 ABI still uses 64-bit *Linux* system calls, even though the word > size is 32 bits. > - After the transition to canonical system calls > (https://reviews.llvm.org/D124212), the n32 ABI still didn't use them, > even though they are supported, > as per > https://github.com/torvalds/linux/blob/master/arch/mips/kernel/syscalls/syscall_n32.tbl. > > See https://reviews.llvm.org/D127098. > > libsanitizer/ChangeLog: > > * sanitizer_common/sanitizer_linux.cpp (defined): Resolve > ASAN build issues for the Mips n32 ABI. > * sanitizer_common/sanitizer_platform.h (defined): Likewise. LGTM (with the ChangeLog format fixed), but I think you need to commit this into LLVM repository first. And in the commit message you should say something like "cherry-pick 0011aabb... from upstream". Then we still require the approve from a maintainer. > --- > > libsanitizer/sanitizer_common/sanitizer_linux.cpp | 17 ++--- > libsanitizer/sanitizer_common/sanitizer_platform.h | 2 +- > 2 files changed, 11 insertions(+), 8 deletions(-) > > diff --git a/libsanitizer/sanitizer_common/sanitizer_linux.cpp > b/libsanitizer/sanitizer_common/sanitizer_linux.cpp > index e2c32d679ad..5ba033492e7 100644 > --- a/libsanitizer/sanitizer_common/sanitizer_linux.cpp > +++ b/libsanitizer/sanitizer_common/sanitizer_linux.cpp > @@ -34,7 +34,7 @@ > // format. Struct kernel_stat is defined as 'struct stat' in asm/stat.h. To > // access stat from asm/stat.h, without conflicting with definition in > // sys/stat.h, we use this trick. > -#if defined(__mips64) > +#if SANITIZER_MIPS64 > #include > #include > #define stat kernel_stat > @@ -124,8 +124,9 @@ const int FUTEX_WAKE_PRIVATE = FUTEX_WAKE | > FUTEX_PRIVATE_FLAG; > // Are we using 32-bit or 64-bit Linux syscalls? > // x32 (which defines __x86_64__) has SANITIZER_WORDSIZE == 32 > // but it still needs to use 64-bit syscalls. > -#if SANITIZER_LINUX && (defined(__x86_64__) || defined(__powerpc64__) || > \ > - SANITIZER_WORDSIZE == 64) > +#if SANITIZER_LINUX && (defined(__x86_64__) || defined(__powerpc64__) || \ > + SANITIZER_WORDSIZE == 64 || \ > + (defined(__mips__) && _MIPS_SIM == _ABIN32)) > # define SANITIZER_LINUX_USES_64BIT_SYSCALLS 1 > #else > # define SANITIZER_LINUX_USES_64BIT_SYSCALLS 0 > @@ -289,7 +290,7 @@ static void stat64_to_stat(struct stat64 *in, struct stat > *out) { > } > #endif > > -#if defined(__mips64) > +#if SANITIZER_MIPS64 > // Undefine compatibility macros from > // so that they would not clash with the kernel_stat > // st_[a|m|c]time fields > @@ -343,7 +344,8 @@ uptr internal_stat(const char *path, void *buf) { > #if SANITIZER_FREEBSD > return internal_syscall(SYSCALL(fstatat), AT_FDCWD, (uptr)path, (uptr)buf, > 0); > # elif SANITIZER_LINUX > -# if SANITIZER_WORDSIZE == 64 || SANITIZER_X32 > +# if SANITIZER_WORDSIZE == 64 || SANITIZER_X32 || \ > + (defined(__mips__) && _MIPS_SIM == _ABIN32) > return internal_syscall(SYSCALL(newfstatat), AT_FDCWD, (uptr)path, > (uptr)buf, > 0); > # else > @@ -366,7 +368,8 @@ uptr internal_lstat(const char *path, void *buf) { > return internal_syscall(SYSCALL(fstatat), AT_FDCWD, (uptr)path, (uptr)buf, > AT_SYMLINK_NOFOLLOW); > # elif SANITIZER_LINUX > -# if defined(_LP64) || SANITIZER_X32 > +# if defined(_LP64) || SANITIZER_X32 || \ > + (defined(__mips__) && _MIPS_SIM == _ABIN32) > return internal_syscall(SYSCALL(newfstatat), AT_FDCWD, (uptr)path, > (uptr)buf, > AT_SYMLINK_NOFOLLOW); > # else > @@ -1053,7 +1056,7 @@ uptr GetMaxVirtualAddress() { > return (1ULL << (MostSignificantSetBitIndex(GET_CURRENT_FRAME()) + 1)) - 1; > #elif SANITIZER_RISCV64 > return (1ULL << 38) - 1; > -# elif defined(__mips64) > +# elif SANITIZER_MIPS64 > return (1ULL << 40) - 1; // 0x00ffUL; > # elif defined(__s390x__) > return (1ULL << 53) - 1; // 0x001fUL; > diff --git a/libsanitizer/sanitizer_common/sanitizer_platform.h > b/libsanitizer/sanitizer_common/sanitizer_platform.h > index 8fe0d831431..8bd9a327623 100644 > --- a/libsanitizer/sanitizer_common/sanitizer_platform.h > +++ b/libsanitizer/sanitizer_common/sanitizer_platform.h > @@ -159,7 +159,7 @@ > > #if defined(__mips__) > # define SANITIZER_MIPS 1 > -# if defined(__mips64) > +# if defined(__mips
Re: [PATCH] loongarch: use -mno-check-zero-division as the default for optimized code
On Sat, 2022-07-02 at 15:39 +0800, Lulu Cheng wrote: > diff --git a/gcc/common/config/loongarch/loongarch-common.cc > b/gcc/common/config/loongarch/loongarch-common.cc > index b6cbd84b873..f8b4660fabf 100644 > --- a/gcc/common/config/loongarch/loongarch-common.cc > +++ b/gcc/common/config/loongarch/loongarch-common.cc > @@ -38,7 +38,4 @@ static const struct default_options > loongarch_option_optimization_table[] = > { OPT_LEVELS_NONE, 0, NULL, 0 } > }; > > -#undef TARGET_DEFAULT_TARGET_FLAGS > -#define TARGET_DEFAULT_TARGET_FLAGS MASK_CHECK_ZERO_DIV > - > > I think this modifications are needed, and there is no problem with the > rest. Yes, this hook is unneeded now. Though the removal is not strictly needed, it's good to remove irrelevant code (CWE-1164 says we shouldn't keep any irrelevant code). I'll commit the patch with the hook removed after another regtest on loongarch64-linux-gnu. I just rebuilt the entire system on my 3A5000, so I need some time to set it up. Expectation time to commit is today evening or tomorrow morning. BTW I've included this patch building my system, no bad things has happened so far.
Re: [PATCH] loongarch: use -mno-check-zero-division as the default for optimized code
On Sat, 2022-07-02 at 16:35 +0800, Lulu Cheng wrote: > 在 2022/7/2 下午4:24, Xi Ruoyao 写道: > > > > I'll commit the patch with the hook removed after another regtest on > > loongarch64-linux-gnu. I just rebuilt the entire system on my > > 3A5000, > > so I need some time to set it up. Expectation time to commit is > > today > > evening or tomorrow morning. > > > > BTW I've included this patch building my system, no bad things has > > happened so far. > Ok,Thanks!:-) Pushed as r13-1410. How do you think about the suggestion from Richard about a backport into gcc-12 branch? Normally we don't backport behavior changes, but making 12.1 the only exception seems compelling.
Re: [PATCH] loongarch: use -mno-check-zero-division as the default for optimized code
On Mon, 2022-07-04 at 14:25 +0800, Lulu Cheng wrote: > > How do you think about the suggestion from Richard about a backport into > > gcc-12 branch? Normally we don't backport behavior changes, but making > > 12.1 the only exception seems compelling. > > I agree with you and Richard. > > Thanks! Pushed r12-8546. wwwdocs patch preparing.
[PATCH][wwwdocs] gcc-12/changes.html: document LoongArch -m{no-,}check-zero-division default change for 12.2
Document a behavior change in r12-8546. Ok for wwwdocs? --- htdocs/gcc-12/changes.html | 26 ++ 1 file changed, 26 insertions(+) diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index ae03c3c6..ef957204 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.html @@ -1027,6 +1027,32 @@ known to be fixed in the 12.1 release. This list might not be complete (that is, it is possible that some PRs that have been fixed are not listed here). + +GCC 12.2 + +Note: GCC 12.2 has not been released yet, so this section is a +work-in-progress. + +This is the https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED&target_milestone=12.2";>list +of problem reports (PRs) from GCC's bug tracking system that are +known to be fixed in the 12.2 release. This list might not be +complete (that is, it is possible that some PRs that have been fixed +are not listed here). + +Target Specific Changes + +LoongArch + + + +The default setting of +-m[check|no-check]-zero-division is changed for optimized +code. Now -mno-check-zero-division is the default for +all optimization levels but -O0 and -Og. +The old behavior can be obtained by explicitly passing +-mcheck-zero-division to GCC. + + -- 2.37.0
Re: [PATCH] Mips: Enable asynchronous unwind tables with both ASAN and TSAN
On Mon, 2022-07-04 at 14:28 +, Dimitrije Milosevic wrote: > On Saturday, June 11, 2022 2:03 PM, Xi wrote: > > Just tried TSAN_SUPPORTED=yes with asynchronous unwind tables > > enabled, > > but I got some strange test failures for tls_race.c: > > > > FAIL: c-c++-common/tsan/tls_race.c -O0 output pattern test > > Output was: > > ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:452 > > "((thr_end)) <= ((tls_addr + tls_size))" (0xffec35f8c0, > > 0xffec35f784) (tid=748216) > > #0 __tsan::CheckUnwind() > > ../../../../gcc/libsanitizer/tsan/tsan_rtl.cpp:627 > > (libtsan.so.2+0xa30ec) > > #1 __sanitizer::CheckFailed(char const*, int, char const*, > > unsigned long long, unsigned long long) > > ../../../../gcc/libsanitizer/sanitizer_common/sanitizer_termination. > > cpp:86 (libtsan.so.2+0xeb8cc) > > #2 __tsan::ImitateTlsWrite(__tsan::ThreadState*, unsigned long, > > unsigned long) > > ../../../../gcc/libsanitizer/tsan/tsan_platform_linux.cpp:452 > > (libtsan.so.2+0xa0cac) > > #3 __tsan::ThreadStart(__tsan::ThreadState*, unsigned int, > > unsigned long long, __sanitizer::ThreadType) > > ../../../../gcc/libsanitizer/tsan/tsan_rtl_thread.cpp:197 > > (libtsan.so.2+0xc0e88) > > #4 __tsan_thread_start_func > > ../../../../gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1009 > > (libtsan.so.2+0x3e5dc) > > #5 start_thread /sources/glibc-2.35/nptl/pthread_create.c:442 > > (libc.so.6+0xc75f4) > > > > I've tried to diagnose the root cause but failed. > > Hi Xi, thanks for looking into this. I've tried running the testsuite > on a cross-toolchain (as I do not currently have access to a physical > machine) > for a MIPS64R6 and the test passes successfully. Could you please > verify that the test fails solely based on this change? I guess you mean you are running MIPS64R6 target code with qemu? I'm not 100% sure because maybe something is wrong in my system. I'm now retrying on gcc230.fsffrance.org (an EdgeRouter 4 in Cfarm) but building GCC on it is really slow. The changes I've tested: diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc index 5eb845960e1..a7f0580e9ba 100644 --- a/gcc/config/mips/mips.cc +++ b/gcc/config/mips/mips.cc @@ -20115,10 +20115,11 @@ mips_option_override (void) target_flags |= MASK_64BIT; } - /* -fsanitize=address needs to turn on -fasynchronous-unwind-tables in - order for tracebacks to be complete but not if any - -fasynchronous-unwind-table were already specified. */ - if (flag_sanitize & SANITIZE_USER_ADDRESS + /* For -fsanitize=address or -fsanitize=thread, it's needed to turn + on -fasynchronous-unwind-tables in order for tracebacks + to be complete but not if any -fasynchronous-unwind-table + were already specified. */ + if (flag_sanitize & (SANITIZE_USER_ADDRESS | SANITIZE_THREAD) && !global_options_set.x_flag_asynchronous_unwind_tables) flag_asynchronous_unwind_tables = 1; diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt index fb89df4935c..52546bbe4e3 100644 --- a/libsanitizer/configure.tgt +++ b/libsanitizer/configure.tgt @@ -55,6 +55,9 @@ case "${target}" in arm*-*-linux*) ;; mips*-*-linux*) + if test x$ac_cv_sizeof_void_p = x8; then + TSAN_SUPPORTED=yes + fi ;; aarch64*-*-linux*) if test x$ac_cv_sizeof_void_p = x8; then -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Mips: Enable asynchronous unwind tables with both ASAN and TSAN
On Mon, 2022-07-04 at 21:21 -0700, Fangrui Song wrote: > Clang considers that asan/msan/tsan/dataflow/etc enables > -fasynchronous-unwind-tables by default so I assume this GCC change is > fine. I agree it's fine, but the problem is TSAN is currently "unsupported" within GCC (i. e. when you build gcc libtsan is not built). So it does not make any benefit to commit this change before making TSAN supported on GCC side. Dimitrije told me TSAN should be supported on 64-bit MIPS, but I can't make it work fine with GCC. I'll need some time to debug... > With https://reviews.llvm.org/D102046 ("[sanitizer] Fall back to fast > unwinder"), compiler-rt may fall back to the frame pointer based > unwinder. There is not strong need to have the default > -fasynchronous-unwind-tables or -funwind-tables behavior. > However, most targets still default to omit frame pointer, so it's a > bit unfortunately that we need to enable unwind tables to get good > diagnostics. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Mips: Enable asynchronous unwind tables with both ASAN and TSAN
On Tue, 2022-07-05 at 12:51 +0800, Xi Ruoyao via Gcc-patches wrote: > I agree it's fine, but the problem is TSAN is currently "unsupported" > within GCC (i. e. when you build gcc libtsan is not built). So it > does > not make any benefit to commit this change before making TSAN > supported > on GCC side. > > Dimitrije told me TSAN should be supported on 64-bit MIPS, but I can't > make it work fine with GCC. I'll need some time to debug... Hi Fangrui, On my system dlpi_tls_modid for libtsan.so.2 is 2, instead of 1. GetStaticTlsBoundary seems assuming the dlpi_tls_modid for libtsan.so to be 1, and returning wrong values if the assumption is broken. Is it unsafe to make such an assumption? Or, am I being haunted by a glibc bug? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Mips: Enable asynchronous unwind tables with both ASAN and TSAN
On Mon, 2022-07-04 at 21:21 -0700, Fangrui Song wrote: > > > > > > > FAIL: c-c++-common/tsan/tls_race.c -O0 output pattern test > > > > Output was: > > > > ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:452 > > > > "((thr_end)) <= ((tls_addr + tls_size))" (0xffec35f8c0, > > > > 0xffec35f784) (tid=748216) > > > > #0 __tsan::CheckUnwind() > > > > ../../../../gcc/libsanitizer/tsan/tsan_rtl.cpp:627 > > > > (libtsan.so.2+0xa30ec) > > > > #1 __sanitizer::CheckFailed(char const*, int, char const*, > > > > unsigned long long, unsigned long long) > > > > ../../../../gcc/libsanitizer/sanitizer_common/sanitizer_termination. > > > > cpp:86 (libtsan.so.2+0xeb8cc) > > > > #2 __tsan::ImitateTlsWrite(__tsan::ThreadState*, unsigned long, > > > > unsigned long) > > > > ../../../../gcc/libsanitizer/tsan/tsan_platform_linux.cpp:452 > > > > (libtsan.so.2+0xa0cac) > > > > #3 __tsan::ThreadStart(__tsan::ThreadState*, unsigned int, > > > > unsigned long long, __sanitizer::ThreadType) > > > > ../../../../gcc/libsanitizer/tsan/tsan_rtl_thread.cpp:197 > > > > (libtsan.so.2+0xc0e88) > > > > #4 __tsan_thread_start_func > > > > ../../../../gcc/libsanitizer/tsan/tsan_interceptors_posix.cpp:1009 > > > > (libtsan.so.2+0x3e5dc) > > > > #5 start_thread /sources/glibc-2.35/nptl/pthread_create.c:442 > > > > (libc.so.6+0xc75f4) > > > > > > > > I've tried to diagnose the root cause but failed. > > > > > > Hi Xi, thanks for looking into this. I've tried running the testsuite > > > on a cross-toolchain (as I do not currently have access to a physical > > > machine) > > > for a MIPS64R6 and the test passes successfully. Could you please > > > verify that the test fails solely based on this change? I've got a clue about why this happens. See https://reviews.llvm.org/D129112. It depends on how the dynamic linker arranges TLS blocks so different version of Glibc may produce different results. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Mips: Resolve build issues for the n32 ABI
On Wed, 2022-07-06 at 11:34 +, Dimitrije Milosevic wrote: > Ping. :) > This change just landed on LLVM (see > https://reviews.llvm.org/rG5d8077565e4196efdd4ed525a64c11a96d5aa5dd). > Unfortunately, I do not have commit access, so if anyone can commit it, that > would be great! > Here is the patch with the updated commit message. I will push this and https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596226.html (with ChangeLog format fixed) after a regression test. If you will submit more GCC patches, please note: (1) You may need to sign a FSF copyright assignment or contribute further patches under DCO. See "Legal Prerequisites" in https://gcc.gnu.org/contribute.html. For these two patches I'd consider them "small" enough and acceptable w/o copyright assignment or DCO. (2) You may require a sourceware.org account with write access to gcc.git if you can get the approve from Richard or another Global Reviewer.
Re: [PATCH] Mips: Resolve build issues for the n32 ABI
On Thu, 2022-07-07 at 02:11 +0800, Xi Ruoyao via Gcc-patches wrote: > On Wed, 2022-07-06 at 11:34 +, Dimitrije Milosevic wrote: > > Ping. :) > > This change just landed on LLVM (see > > https://reviews.llvm.org/rG5d8077565e4196efdd4ed525a64c11a96d5aa5dd) > > . > > Unfortunately, I do not have commit access, so if anyone can commit > > it, that would be great! > > Here is the patch with the updated commit message. > > I will push this and > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596226.html (with > ChangeLog format fixed) after a regression test. Pushed r13-1548 and r13-1549. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH 0/2] loongarch: improve code generation for integer division
We were generating some unnecessary instructions for integer division. These two patches improve the code generation to compile template T div(T a, T b) { return a / b; } into a single division instruction (along with a return instruction of course) as we expected for T in {int32_t, uint32_t, int64_t}. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (2): loongarch: add alternatives for idiv insns to improve code generation loongarch: avoid unnecessary sign-extend after 32-bit division gcc/config/loongarch/loongarch-protos.h| 1 + gcc/config/loongarch/loongarch.cc | 2 +- gcc/config/loongarch/loongarch.md | 34 -- gcc/testsuite/gcc.target/loongarch/div-1.c | 9 ++ gcc/testsuite/gcc.target/loongarch/div-2.c | 9 ++ gcc/testsuite/gcc.target/loongarch/div-3.c | 9 ++ gcc/testsuite/gcc.target/loongarch/div-4.c | 9 ++ 7 files changed, 63 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c -- 2.37.0
[PATCH 1/2] loongarch: add alternatives for idiv insns to improve code generation
Currently in the description of LoongArch integer division instructions, the output is marked as earlyclobbered ('&'). It's necessary when loongarch_check_zero_div_p() because clobbering operand 2 (divisor) will make the checking for zero divisor impossible. But, for -mno-check-zero-division (the default of GCC >= 12.2 for optimized code), the output is not earlyclobbered at all. And, the read of operand 1 only occurs before clobbering the output. So we make three alternatives for an idiv instruction: * (=r,r,r): For -mno-check-zero-division. * (=&r,r,r): For -mcheck-zero-division. * (=&r,0,r): For -mcheck-zero-division, to explicitly allow patterns like "div.d $a0, $a0, $a1". gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_check_zero_div_p): Remove static, for use in the machine description file. * config/loongarch/loongarch-protos.h: (loongarch_check_zero_div_p): Add prototype. * config/loongarch/loongarch.md (enabled): New attr. (*3): Add (=r,r,r) and (=&r,0,r) alternatives for idiv. Conditionally enable the alternatives using loongarch_check_zero_div_p. (di3_fake): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/div-1.c: New test. * gcc.target/loongarch/div-2.c: New test. * gcc.target/loongarch/div-3.c: New test. --- gcc/config/loongarch/loongarch-protos.h| 1 + gcc/config/loongarch/loongarch.cc | 2 +- gcc/config/loongarch/loongarch.md | 28 +++--- gcc/testsuite/gcc.target/loongarch/div-1.c | 9 +++ gcc/testsuite/gcc.target/loongarch/div-2.c | 9 +++ gcc/testsuite/gcc.target/loongarch/div-3.c | 9 +++ 6 files changed, 49 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index 2144c2421ed..2287fd3763c 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -130,6 +130,7 @@ extern bool loongarch_symbol_binds_local_p (const_rtx); extern const char *current_section_name (void); extern unsigned int current_section_flags (void); extern bool loongarch_use_ins_ext_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT); +extern bool loongarch_check_zero_div_p (void); union loongarch_gen_fn_ptrs { diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..bc56282c9a7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -2104,7 +2104,7 @@ loongarch_load_store_insns (rtx mem, rtx_insn *insn) /* Return true if we need to trap on division by zero. */ -static bool +bool loongarch_check_zero_div_p (void) { /* if -m[no-]check-zero-division is given explicitly. */ diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index d3c809e25f3..b002eb2ac22 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -110,6 +110,8 @@ (define_constants ;; ;; +(define_attr "enabled" "no,yes" (const_string "yes")) + (define_attr "got" "unset,load" (const_string "unset")) @@ -763,26 +765,36 @@ (define_expand "3" }) (define_insn "*3" - [(set (match_operand:GPR 0 "register_operand" "=&r") - (any_div:GPR (match_operand:GPR 1 "register_operand" "r") -(match_operand:GPR 2 "register_operand" "r")))] + [(set (match_operand:GPR 0 "register_operand" "=r,&r,&r") + (any_div:GPR (match_operand:GPR 1 "register_operand" "r,r,0") +(match_operand:GPR 2 "register_operand" "r,r,r")))] "" { return loongarch_output_division (".\t%0,%1,%2", operands); } [(set_attr "type" "idiv") - (set_attr "mode" "")]) + (set_attr "mode" "") + (set (attr "enabled") + (if_then_else + (match_test "!!which_alternative == loongarch_check_zero_div_p()") + (const_string "yes") + (const_string "no")))]) (define_insn "di3_fake" - [(set (match_operand:SI 0 "register_operand" "=&r") - (any_div:SI (match_operand:DI 1 "register_operand" "r") - (match_operand:DI 2 "register_operand" "r")))] + [(set (match_operand:SI 0 "register_operand" "=r,&r,&r") + (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0") + (match_operand:DI 2 "register_operand" "r,r,r")))] "" { return loongarch_output_division (".w\t%0,%1,%2", operands); } [(set_attr "type" "idiv") - (set_attr "mode" "SI")]) + (set_attr "mode" "SI") + (set (attr "enabled") + (if_then_else + (match_test "!!which_alternative == loongarch_check_zero_div_p()") + (const_string "yes") + (const_string "no")))]) ;; Floating point multiply accumulate instructions. diff --git a/gcc/testsu
[PATCH 2/2] loongarch: avoid unnecessary sign-extend after 32-bit division
Like add.w/sub.w/mul.w, div.w/mod.w/div.wu/mod.wu also sign-extend the output on LA64. But, LoongArch v1.00 mandates that the inputs of 32-bit division to be sign-extended so we have to expand 32-bit division into RTL sequences. We defined div.w/mod.w/div.wu/mod.wu as a (DI, DI) -> SI instruction. This definition does not indicate the fact that these instructions will store the result as sign-extended value in a 64-bit GR. Then the compiler would emit unnecessary sign-extend operations. For example: int div(int a, int b) { return a / b; } was compiled to: div.w $r4, $r4, $r5 slli.w $r4, $r4, 0# this is unnecessary jr $r1 To remove this unnecessary operation, we change the division instructions to (DI, DI) -> DI and describe the sign-extend behavior explicitly in the RTL template. In the expander for 32-bit division we then use simplify_gen_subreg to extract the lower 32 bits. gcc/ChangeLog: * config/loongarch/loongarch.md (di3_fake): Describe the sign-extend of result in the RTL template. (3): Adjust for di3_fake change. gcc/testsuite/ChangeLog: * gcc.target/loongarch/div-4.c: New test. --- gcc/config/loongarch/loongarch.md | 12 gcc/testsuite/gcc.target/loongarch/div-4.c | 9 + 2 files changed, 17 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index b002eb2ac22..0202f73efae 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -752,6 +752,7 @@ (define_expand "3" { rtx reg1 = gen_reg_rtx (DImode); rtx reg2 = gen_reg_rtx (DImode); +rtx rd = gen_reg_rtx (DImode); operands[1] = gen_rtx_SIGN_EXTEND (word_mode, operands[1]); operands[2] = gen_rtx_SIGN_EXTEND (word_mode, operands[2]); @@ -759,7 +760,9 @@ (define_expand "3" emit_insn (gen_rtx_SET (reg1, operands[1])); emit_insn (gen_rtx_SET (reg2, operands[2])); -emit_insn (gen_di3_fake (operands[0], reg1, reg2)); +emit_insn (gen_di3_fake (rd, reg1, reg2)); +emit_insn (gen_rtx_SET (operands[0], + simplify_gen_subreg (SImode, rd, DImode, 0))); DONE; } }) @@ -781,9 +784,10 @@ (define_insn "*3" (const_string "no")))]) (define_insn "di3_fake" - [(set (match_operand:SI 0 "register_operand" "=r,&r,&r") - (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0") - (match_operand:DI 2 "register_operand" "r,r,r")))] + [(set (match_operand:DI 0 "register_operand" "=r,&r,&r") + (sign_extend:DI + (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0") + (match_operand:DI 2 "register_operand" "r,r,r"] "" { return loongarch_output_division (".w\t%0,%1,%2", operands); diff --git a/gcc/testsuite/gcc.target/loongarch/div-4.c b/gcc/testsuite/gcc.target/loongarch/div-4.c new file mode 100644 index 000..a52f87d6caf --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/div-4.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-not "slli" } } */ + +int +div(int a, int b) +{ + return a / b; +} -- 2.37.0
Re: [PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero.
On Thu, 2022-07-07 at 16:31 +0800, WANG Xuerui wrote: > IMO it's better to also state which problem this change is meant to > solve (i.e. your intent), better yet, with an appropriate bugzilla > link. And/or add a testcase (which FAILs without this change) into gcc/testsuite/gcc.target/loongarch. > > --- > > gcc/config/loongarch/loongarch.cc | 12 +--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/config/loongarch/loongarch.cc > > b/gcc/config/loongarch/loongarch.cc > > index d72b256df51..5c9a33c14f7 100644 > > --- a/gcc/config/loongarch/loongarch.cc > > +++ b/gcc/config/loongarch/loongarch.cc > > @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) > > frame->frame_pointer_offset = offset; > > /* Next are the callee-saved FPRs. */ > > if (frame->fmask) > > - offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); > > - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; > > + { > > + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); > > + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; > > + } > > + else > > + frame->fp_sp_offset = offset; > > /* Next are the callee-saved GPRs. */ > > if (frame->mask) > > { > > @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) > > frame->save_libcall_adjustment = x_save_size; > > > > offset += x_save_size; > > + frame->gp_sp_offset = offset - UNITS_PER_WORD; > > } > > - frame->gp_sp_offset = offset - UNITS_PER_WORD; > > + else > > + frame->gp_sp_offset = offset; > > /* The hard frame pointer points above the callee-saved GPRs. > > */ > > frame->hard_frame_pointer_offset = offset; > > /* Above the hard frame pointer is the callee-allocated varags > > save area. */ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v3] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
On Thu, 2022-07-07 at 18:30 +0800, Lulu Cheng wrote: /* snip */ > diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > new file mode 100644 > index 000..c7bd71dde93 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > @@ -0,0 +1,15 @@ > +/* Test that LoongArch backend stack drop operation optimized. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mabi=lp64d" } */ > +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ > + > +#include It's better to hard code "extern int printf(char *, ...);" here, so the test case won't unnecessarily depend on libc header. LGTM otherwise. > + > +int main() > +{ > + char buf[1024 * 12]; > + printf ("%p\n", buf); > + return 0; > +} > +
[PATCH] loongarch: fix mulsidi3_64bit instruction
I think this should be obvious. Ok for trunk and gcc-12 branch? (Note: this bug really amazed me. It's just a simple typo and all of us failed to spot it reviewing the LoongArch port. Incredibly, it can be reproduced with such a simple test case (in the patch) but did not blow the entire system up. I didn't see anything abnormal until it blown up two UBSan test cases when I tried to port UBSan for LoongArch.) -- >8 -- (mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be "mulw.d.w", not "mul.d". gcc/ChangeLog: * config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w instead of mul.d. gcc/testsuite/ChangeLog: * gcc.target/loongarch/mul-1.c: New test. * gcc.target/loongarch/mul-2.c: New test. --- gcc/config/loongarch/loongarch.md | 2 +- gcc/testsuite/gcc.target/loongarch/mul-1.c | 20 gcc/testsuite/gcc.target/loongarch/mul-2.c | 10 ++ 3 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-2.c diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index d3c809e25f3..8f8412fba84 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -621,7 +621,7 @@ (define_insn "mulsidi3_64bit" (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r")) (sign_extend:DI (match_operand:SI 2 "register_operand" "r"] "TARGET_64BIT" - "mul.d\t%0,%1,%2" + "mulw.d.w\t%0,%1,%2" [(set_attr "type" "imul") (set_attr "mode" "DI")]) diff --git a/gcc/testsuite/gcc.target/loongarch/mul-1.c b/gcc/testsuite/gcc.target/loongarch/mul-1.c new file mode 100644 index 000..8b6800804fb --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/mul-1.c @@ -0,0 +1,20 @@ +/* { dg-do run } */ + +typedef __INT64_TYPE__ int64_t; +typedef __INT32_TYPE__ int32_t; + +/* f() was misoptimized to a single "mul.d" instruction on LA64. */ +__attribute__((noipa, noinline)) int64_t +f(int64_t a, int64_t b) +{ + return (int64_t)(int32_t)a * (int64_t)(int32_t)b; +} + +int +main() +{ + int64_t a = 0x11451401; + int64_t b = 0x19198101; + if (f(a, b) != 1) +__builtin_abort(); +} diff --git a/gcc/testsuite/gcc.target/loongarch/mul-2.c b/gcc/testsuite/gcc.target/loongarch/mul-2.c new file mode 100644 index 000..a9a713210df --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/mul-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "mulw.d.w\t\\\$r4,\\\$r5,\\\$r4" } } */ + +/* This should be optimized to mulw.d.w for LA64. */ +__attribute__((noipa, noinline)) long +f(long a, long b) +{ + return (long)(int)a * (long)(int)b; +} -- 2.37.0
Re: [PATCH v2] loongarch: fix mulsidi3_64bit instruction
v2: Move one portable test to gcc.c-torture so it will be tested with all optimization levels. And it might be helpful if the engineer of the next GCC port makes a similar typo :). -- >8 -- (mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be "mulw.d.w", not "mul.d". gcc/ChangeLog: * config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w instead of mul.d. gcc/testsuite/ChangeLog: * gcc.target/loongarch/mulw_d_w.c: New test. * gcc.c-torture/execute/mul-sext.c: New test. --- gcc/config/loongarch/loongarch.md | 2 +- .../gcc.c-torture/execute/mul-sext.c | 20 +++ gcc/testsuite/gcc.target/loongarch/mulw_d_w.c | 10 ++ 3 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/mul-sext.c create mode 100644 gcc/testsuite/gcc.target/loongarch/mulw_d_w.c diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index d3c809e25f3..8f8412fba84 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -621,7 +621,7 @@ (define_insn "mulsidi3_64bit" (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r")) (sign_extend:DI (match_operand:SI 2 "register_operand" "r"] "TARGET_64BIT" - "mul.d\t%0,%1,%2" + "mulw.d.w\t%0,%1,%2" [(set_attr "type" "imul") (set_attr "mode" "DI")]) diff --git a/gcc/testsuite/gcc.c-torture/execute/mul-sext.c b/gcc/testsuite/gcc.c-torture/execute/mul-sext.c new file mode 100644 index 000..8b6800804fb --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/mul-sext.c @@ -0,0 +1,20 @@ +/* { dg-do run } */ + +typedef __INT64_TYPE__ int64_t; +typedef __INT32_TYPE__ int32_t; + +/* f() was misoptimized to a single "mul.d" instruction on LA64. */ +__attribute__((noipa, noinline)) int64_t +f(int64_t a, int64_t b) +{ + return (int64_t)(int32_t)a * (int64_t)(int32_t)b; +} + +int +main() +{ + int64_t a = 0x11451401; + int64_t b = 0x19198101; + if (f(a, b) != 1) +__builtin_abort(); +} diff --git a/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c b/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c new file mode 100644 index 000..a9a713210df --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "mulw.d.w\t\\\$r4,\\\$r5,\\\$r4" } } */ + +/* This should be optimized to mulw.d.w for LA64. */ +__attribute__((noipa, noinline)) long +f(long a, long b) +{ + return (long)(int)a * (long)(int)b; +} -- 2.37.0
Re: Mips: Fix kernel_stat structure size
On Fri, 2022-07-08 at 21:42 -0400, Hans-Peter Nilsson wrote: > On Fri, 1 Jul 2022, Dimitrije Milosevic wrote: > > > Fix kernel_stat structure size for non-Android 32-bit Mips. > > LLVM currently has this value for the kernel_stat structure size, > > as per compiler-rt/lib/sanitizer- > > common/sanitizer_platform_limits_posix.h. > > This also resolves one of the build issues for non-Android 32-bit > > Mips. > > I insist that PR105614 comment #7 is the way to go, i.e. fix > the merge error, avoiding the faulty include that it > reintroduced. Was this tested on O32? I'm pretty sure it is *not* the way to go. Sanitizer does not really intercept system call. It intercepts libc stat() or lstat() etc. calls. So you need to keep struct_kernel_stat_sz same as the size of struct stat in libc, i. e. "the size of buffer which *libc* stat()-like functions writing into". The "kernel_" in the name is just misleading. And, if you still think it should be the way to go, let's submit the change to LLVM and get it reviewed properly. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH v3] loongarch: fix mulsidi3_64bit instruction
v3: Relax scan-assembler pattern in test case mulw_d_w.c. It's because multiplication is Abelian and the compiler may switch the order of operands in the future. -- >8 -- (mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be "mulw.d.w", not "mul.d". gcc/ChangeLog: * config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w instead of mul.d. gcc/testsuite/ChangeLog: * gcc.target/loongarch/mulw_d_w.c: New test. * gcc.c-torture/execute/mul-sext.c: New test. --- gcc/config/loongarch/loongarch.md | 2 +- .../gcc.c-torture/execute/mul-sext.c | 20 +++ gcc/testsuite/gcc.target/loongarch/mulw_d_w.c | 10 ++ 3 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/mul-sext.c create mode 100644 gcc/testsuite/gcc.target/loongarch/mulw_d_w.c diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index d3c809e25f3..8f8412fba84 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -621,7 +621,7 @@ (define_insn "mulsidi3_64bit" (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r")) (sign_extend:DI (match_operand:SI 2 "register_operand" "r"] "TARGET_64BIT" - "mul.d\t%0,%1,%2" + "mulw.d.w\t%0,%1,%2" [(set_attr "type" "imul") (set_attr "mode" "DI")]) diff --git a/gcc/testsuite/gcc.c-torture/execute/mul-sext.c b/gcc/testsuite/gcc.c-torture/execute/mul-sext.c new file mode 100644 index 000..8b6800804fb --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/mul-sext.c @@ -0,0 +1,20 @@ +/* { dg-do run } */ + +typedef __INT64_TYPE__ int64_t; +typedef __INT32_TYPE__ int32_t; + +/* f() was misoptimized to a single "mul.d" instruction on LA64. */ +__attribute__((noipa, noinline)) int64_t +f(int64_t a, int64_t b) +{ + return (int64_t)(int32_t)a * (int64_t)(int32_t)b; +} + +int +main() +{ + int64_t a = 0x11451401; + int64_t b = 0x19198101; + if (f(a, b) != 1) +__builtin_abort(); +} diff --git a/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c b/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c new file mode 100644 index 000..4ab7df8836b --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/mulw_d_w.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "mulw.d.w" } } */ + +/* This should be optimized to mulw.d.w for LA64. */ +__attribute__((noipa, noinline)) long +f(long a, long b) +{ + return (long)(int)a * (long)(int)b; +} -- 2.37.0
Re: [PATCH v3] loongarch: fix mulsidi3_64bit instruction
On Sun, 2022-07-10 at 09:45 +0800, Lulu Cheng wrote: > > 在 2022/7/9 上午10:56, Xi Ruoyao 写道: > > v3: Relax scan-assembler pattern in test case mulw_d_w.c. It's > > because > > multiplication is Abelian and the compiler may switch the order of > > operands in the future. > > -- >8 -- > > > > (mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be > > "mulw.d.w", not "mul.d". > > > > gcc/ChangeLog: > > > > * config/loongarch/loongarch.md (mulsidi3_64bit): Use > > mulw.d.w > > instead of mul.d. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/loongarch/mulw_d_w.c: New test. > > * gcc.c-torture/execute/mul-sext.c: New test. > > --- > > I think there is no problem with this modification. > > Thankes! > Pushed r13-1591 and r12-8562. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH 0/2] loongarch: improve code generation for integer division
On Sun, 2022-07-10 at 10:20 +0800, Lulu Cheng wrote: > > 在 2022/7/7 上午10:23, Xi Ruoyao 写道: > > We were generating some unnecessary instructions for integer > > division. > > These two patches improve the code generation to compile > > > > template T div(T a, T b) { return a / b; } > > > > into a single division instruction (along with a return instruction > > of > > course) as we expected for T in {int32_t, uint32_t, int64_t}. > > > > Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? > > > > Xi Ruoyao (2): > > loongarch: add alternatives for idiv insns to improve code > > generation > > loongarch: avoid unnecessary sign-extend after 32-bit division > > > > gcc/config/loongarch/loongarch-protos.h | 1 + > > gcc/config/loongarch/loongarch.cc | 2 +- > > gcc/config/loongarch/loongarch.md | 34 -- > > > > gcc/testsuite/gcc.target/loongarch/div-1.c | 9 ++ > > gcc/testsuite/gcc.target/loongarch/div-2.c | 9 ++ > > gcc/testsuite/gcc.target/loongarch/div-3.c | 9 ++ > > gcc/testsuite/gcc.target/loongarch/div-4.c | 9 ++ > > 7 files changed, 63 insertions(+), 10 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c > > create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c > > create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c > > create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c > > > LGTM and spec has been tested. > > Thanks! Pushed r13-1592 and r13-1593. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: Mips: Fix kernel_stat structure size
On Tue, 2022-07-12 at 06:42 +, Dimitrije Milosevic wrote: > Hi Hans-Peter, > You're right, this is not ok for the O32 ABI. Your change however, broke the > functionality > for the N32 ABI. AFAIK, the changes like this should go through LLVM first > (yours didn't), > so I will send out a review, covering both 32 ABIs, meaning that both O32 and > N32 ABIs > will be working properly. As for this change, I'm not sure what should be > done? > Should this be committed now, while the LLVM change is cherry-picked once > it's committed. I think just get it into LLVM first, then we sync with LLVM. It seems on o32 sizeof(struct stat) is 144, n32 -> 160, n64 -> 216 (tested on gcc23.fsffrance.org). "Luckily" we don't support other strange things like "o64".
Re: Mips: Fix kernel_stat structure size
On Tue, 2022-07-12 at 06:42 +, Dimitrije Milosevic wrote: > I will send out a review, covering both 32 ABIs, meaning that both O32 > and N32 ABIs will be working properly. Please CC me (@xry111) in LLVM review. By the way, if you have some spare time, you can also add the value for Musl (all 3 ABIs). Musl has a different struct stat than Glibc (it's PR 106136 in GCC bugzilla). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 1/2] LoongArch: Modify the method of obtaining symbolic addresses.
The change seems too large. It would be better to split it into multiple commits (for example, just 3 commits for 1,2,3 below). On Tue, 2022-07-19 at 21:08 +0800, Lulu Cheng wrote: > 1. The original LA macro instruction is split into two instructions to > obtain the address of the symbol if enable '-mexplicit-relocs'. It's better to add some test cases (with dg-final scan-assembler) for this. The test case will also show humans the intended behavior after the change. > 3. Modify the method that calls global functions. From 'la.global + jirl' > to 'bl'. Why? Does it means we'll rely on the assembler to emit the correct sequence for -fno-plt? Then it would be better to use a pseudo mnemonic like "call" instead of "bl" (because it's not a single "bl" instruction). /* snip */ > static bool > loongarch_valid_index_p (struct loongarch_address_info *info, rtx x, > machine_mode mode, bool strict_p) > @@ -1881,6 +1978,26 @@ loongarch_classify_address (struct > loongarch_address_info *info, rtx x, > info->offset = XEXP (x, 1); > return (loongarch_valid_base_register_p (info->reg, mode, strict_p) > && loongarch_valid_offset_p (info->offset, mode)); > + > + case LO_SUM: > + info->type = ADDRESS_LO_SUM; > + info->reg = XEXP (x, 0); > + info->offset = XEXP (x, 1); > + /* We have to trust the creator of the LO_SUM to do something vaguely > + sane. Target-independent code that creates a LO_SUM should also > + create and verify the matching HIGH. Target-independent code that > + adds an offset to a LO_SUM must prove that the offset will not > + induce a carry. Failure to do either of these things would be > + a bug, and we are not required to check for it here. The MIPS Don't copy MIPS code. > +static bool loongarch_split_move_insn_p (rtx dest, rtx src); Nit: "loongarch_split_move_insn_p" shall start a new line.
Re: [PATCH] libsanitizer: Cherry-pick 2bfb0fcb51510f22723c8cdfefe from upstream
LGTM. Requiring permission to push. > +const unsigned struct_kernel_stat_sz = > + SANITIZER_ANDROID > + ? FIRST_32_SECOND_64(104, 128) > + : FIRST_32_SECOND_64((_MIPS_SIM == _ABIN32) ? 160 : 144, 216); These values matches sizeof(struct stat) on gcc230, so should be correct: xry111@gcc230:~$ cat t.c #include #include int main() { printf("%d\n", sizeof(struct stat)); } xry111@gcc230:~$ cc t.c -mabi=32 && ./a.out 144 xry111@gcc230:~$ cc t.c -mabi=n32 && ./a.out 160 xry111@gcc230:~$ cc t.c -mabi=64 && ./a.out 216 -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
PING: [PATCH] libsanitizer: don't enable for MIPS Linux without GNU libc [PR 106136]
Gentle ping :). On Thu, 2022-06-30 at 12:22 +0800, Xi Ruoyao wrote: > In libsanitizer code, the size of some GNU libc data structure > (notably, > struct stat) is hard coded. These sizes may trigger a static assert > buidling against another libc. > > Just make non-GNU libc targets UNSUPPORTED now. If someone really > cares > about those alternative libc implementations, please submit patch to > LLVM project adding the support. > > libsanitizer/ChangeLog > > PR sanitizer/106136 > * configure.tgt: Change mips*-*-linux* to mips*-*-linux-gnu* > because it fails to build with non-GNU libc. > --- > libsanitizer/configure.tgt | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt > index fb89df4935c..54b74b60e2f 100644 > --- a/libsanitizer/configure.tgt > +++ b/libsanitizer/configure.tgt > @@ -54,7 +54,7 @@ case "${target}" in > ;; > arm*-*-linux*) > ;; > - mips*-*-linux*) > + mips*-*-linux-gnu*) > ;; > aarch64*-*-linux*) > if test x$ac_cv_sizeof_void_p = x8; then -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH][wwwdocs] gcc-13: Add loongarch '-mexplicit-relocs' support
+Xuerui (his English is much better than mine). On Tue, 2022-07-26 at 15:21 +0800, Lulu Cheng wrote: > Hi, > Recently we added split symbol support, changed in r13-1834. > It is ok for wwwdocs? > +LoongArch > + > + The option -mexplicit-relocs has been added, this indicate I think "added and enabled by default" would be better. > + whether the la.* macro instructions will be generated when > + loading symbolic addresses. > + This feature requires binutils version 2.40 or later. If you want to use > the > + older version of bintuils, add compiler parameters > + -mno-explicit-relocs at compile time. Does it mean we need to make sure GCC 13 released after binutils-2.40? binutils-2.39 release branch is already created and it's now explicitly "no new feature" so a backport seems impossible... > + > + The method for calling global functions changed from > + la.global + jirl to bl when complied add > + -fplt. "from la.global + jirl to bl with -fno-plt and -mexplicit-relocs"? With "-fplt" GCC 12 is already using bl, and with -mno-explicit-relocs la.global is still used (if I read func-call-3.c correctly). > + Enable option -fsection-anchors when -O1 and > + more advanced optimization. "-fsection-anchors is enabled by default at -O1 and higher optimization levels for better code generation". > + Changed ASM_PREFERRED_EH_DATA_FORMAT macro definition from > + WD_EH_PE_absptr to WD_EH_PE_pcrel | > DW_EH_PE_sdata4. > + I don't think this paragraph is necessary because this change is purely internal. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH][wwwdocs] gcc-13: Add loongarch '-mexplicit-relocs' support
On Tue, 2022-07-26 at 19:42 +0800, Lulu Cheng wrote: > 在 2022/7/26 下午5:44, Xi Ruoyao 写道: > > > > > + whether the la.* macro instructions will be generated when > > > + loading symbolic addresses. > > > + This feature requires binutils version 2.40 or later. If you want to > > > use the > > > + older version of bintuils, add compiler parameters > > > + -mno-explicit-relocs at compile time. > > Does it mean we need to make sure GCC 13 released after binutils-2.40? > > binutils-2.39 release branch is already created and it's now explicitly > > "no new feature" so a backport seems impossible... > > Do you think it's okay if we don't write Binutils version restrictions > now and wait until Binutils code is released to annotate? I think you can just put Binutils 2.40 here. GCC 13 will be released in Apr or May 2023, and Binutils-2.40 will be released in Jan or Feb 2023 (if no bad thing happens), so my previous concern is actually not a problem. Maybe we can add a check in gcc/configure.ac to see if gcc_cv_ld supports %got_pc_hi20 and adjust the default for -m[no]-explicit-relocs? > > Should we indicate that our .eh_frame section format has changed? I don't really understand C++ exception handling, so: does the change breaks something? For example, if foo links to libbar, libbar is built with GCC 12 (or vice versa), would an exception thrown from libbar properly caught by foo? Generally changes.html is for compiler users (instead of developers), and I believe at least 90% of users (including I) don't know the potential consequences from a .eh_frame format change. So it's better to describe the breakage and possible workaround here. If nothing will be broken, we can just skip the change in changes.html. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: adjust the default of -mexplicit-relocs by checking gas feature
> > Maybe we can add a check in gcc/configure.ac to see if gcc_cv_ld > > supports %got_pc_hi20 and adjust the default for -m[no]-explicit-relocs? > I think this is a good way, I'll look at adding a check. The following should work. I've tested it locally by building GCC with both old (2.38 with patch) and new (trunk) Binutils. Ok for trunk? I simply checked as instead of ld. If as supports explicitly relocations, the produced .o file won't be supported by an old ld even if we use -mno-explicit-relocs (because as will generate new relocation types for la.local etc). So such a mismatch between as and ld can be considered "completely broken" and we don't need to support it. And, the Alpha port also check the assembler for the default of -mexplicit- relocs. This should be documented in invoke.texi, but currently it does not mention -mexplicit-relocs for LoongArch at all. So I'll submit the doclater in another patch. -- >8 -- The assembly produced with -mexplicit-relocs is not supported by gas <= 2.39. Check if the assembler supports explicit relocations and set the default accordingly. gcc/ChangeLog: * configure.ac (HAVE_AS_EXPLICIT_RELOCS): Define to 1 if the assembler supports explicit relocation for LoongArch. * configure: Regenerate. * config/loongarch/loongarch-opts.h (HAVE_AS_EXPLICIT_RELOCS): Define to 0 if not defined. * config/loongarch/genopts/loongarch.opt.in (TARGET_EXPLICIT_RELOCS): Default to HAVE_AS_EXPLICIT_RELOCS. * config/loongarch/loongarch.opt: Regenerate. --- gcc/config/loongarch/genopts/loongarch.opt.in | 2 +- gcc/config/loongarch/loongarch-opts.h | 4 ++ gcc/config/loongarch/loongarch.opt| 2 +- gcc/configure | 37 +-- gcc/configure.ac | 7 +++- 5 files changed, 46 insertions(+), 6 deletions(-) diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 6f39500935d..a571b6b7524 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -155,7 +155,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default is 1024. mexplicit-relocs -Target Var(TARGET_EXPLICIT_RELOCS) Init(1) +Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS) Use %reloc() assembly operators. ; The code model option names for -mcmodel. diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h index eaa6fc07448..da24ecd2b50 100644 --- a/gcc/config/loongarch/loongarch-opts.h +++ b/gcc/config/loongarch/loongarch-opts.h @@ -87,4 +87,8 @@ loongarch_config_target (struct loongarch_target *target, while -m[no]-memcpy imposes a global constraint. */ #define TARGET_DO_OPTIMIZE_BLOCK_MOVE_P loongarch_do_optimize_block_move_p() +#ifndef HAVE_AS_EXPLICIT_RELOCS +#define HAVE_AS_EXPLICIT_RELOCS 0 +#endif + #endif /* LOONGARCH_OPTS_H */ diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 7a8c5b44418..9df7e187283 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -162,7 +162,7 @@ Target Joined RejectNegative UInteger Var(loongarch_max_inline_memcpy_size) Init -mmax-inline-memcpy-size=SIZE Set the max size of memcpy to inline, default is 1024. mexplicit-relocs -Target Var(TARGET_EXPLICIT_RELOCS) Init(1) +Target Var(TARGET_EXPLICIT_RELOCS) Init(HAVE_AS_EXPLICIT_RELOCS) Use %reloc() assembly operators. ; The code model option names for -mcmodel. diff --git a/gcc/configure b/gcc/configure index 62872d132ea..7eb9479ae8e 100755 --- a/gcc/configure +++ b/gcc/configure @@ -19674,7 +19674,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 19679 "configure" +#line 19677 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -19780,7 +19780,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 19785 "configure" +#line 19783 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -28771,7 +28771,7 @@ $as_echo "#define HAVE_AS_MARCH_ZIFENCEI 1" >>confdefs.h fi ;; - loongarch*-*-*) +loongarch*-*-*) { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .dtprelword support" >&5 $as_echo_n "checking assembler for .dtprelword support... " >&6; } if ${gcc_cv_as_loongarch_dtprelword+:} false; then : @@ -28807,6 +28807,37 @@ if test $gcc_cv_as_loongarch_dtprelword != yes; then $as_echo "#define HAVE_AS_DTPRELWORD 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for explicit relocation support" >&5 +$as_echo_n "checking assembler for explicit relocation support... " >&6; } +if
Re: [PATCH] LoongArch: adjust the default of -mexplicit-relocs by checking gas feature
On Wed, 2022-07-27 at 09:34 +0800, Lulu Cheng wrote: > > -- >8 -- > > > > The assembly produced with -mexplicit-relocs is not supported by gas > > <= > > 2.39. Check if the assembler supports explicit relocations and set > > the > > default accordingly. > Looks good to me. Pushed r13-1851.
Re: PING: [PATCH] libsanitizer: Cherry-pick 2bfb0fcb51510f22723c8cdfefe from upstream
On Wed, 2022-07-27 at 06:41 +, Dimitrije Milosevic wrote: > Gentle ping, requiring someone to push this change, as I do not have > commit access. :) Do you know someone very familiar with MIPS and GCC and capable as a port maintainer? An active MIPS port maintainer will make the situation better. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: document -m[no-]explicit-relocs
Document newly introduced -m[no-]explicit-relocs options. Ok for trunk? -- >8 -- gcc/ChangeLog: * doc/invoke.texi: Document -m[no-]explicit-relocs for LoongArch. --- gcc/doc/invoke.texi | 12 1 file changed, 12 insertions(+) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 9a3f2d14c5a..04418f80428 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -24939,6 +24939,18 @@ global symbol: The data got table must be within +/-8EiB addressing space. @end itemize @end table The default code model is @code{normal}. + +@item -mexplicit-relocs +@itemx -mno-explicit-relocs +@opindex mexplicit-relocs +@opindex mno-explicit-relocs +Generate (do not generate) explicit symbol relocations instead of +assembler macros. Using explicit relocations can improve code generation. +GCC detects the capaiblities of the assembler when it is built and sets +the default to @code{-mexplicit-relocs} if the assembler supports the +syntax for explicit specification of relocations, and +@code{-mno-explicit-relocs} otherwise. This option is mostly useful for +debugging or using an assembler different from build-time. @end table @node M32C Options -- 2.37.1
Re: [PATCH] LoongArch: document -m[no-]explicit-relocs
On Wed, 2022-07-27 at 16:47 +0800, Lulu Cheng wrote: > > "Use or do not use assembler relocation operators when dealing with > > symbolic addresses. The alternative is to use assembler macros > > instead, which may limit optimization. > > > > The default value for the option is determined during GCC build- > > time by detecting corresponding assembler support: @code{-mexplicit- > > relocs} if said support is present, @code{-mno-explicit-relocs} > > otherwise. This option is mostly useful for debugging, or > > interoperation with assemblers different from the build-time one." > > > I agree with wangxuerui's idea. > The same parameter and the same description information can reduce the > user's time to learn how to use this parameter. I agree it's better than my origin paragraph. If you agree I'll commit it with Xuerui as the commit author. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: document -m[no-]explicit-relocs
On Wed, 2022-07-27 at 17:57 +0800, WANG Xuerui wrote: > On 2022/7/27 17:28, Lulu Cheng wrote: > > > > > > 在 2022/7/27 下午5:15, Xi Ruoyao 写道: > > > On Wed, 2022-07-27 at 16:47 +0800, Lulu Cheng wrote: > > > > > > > > "Use or do not use assembler relocation operators when dealing with > > > > > symbolic addresses. The alternative is to use assembler macros > > > > > instead, which may limit optimization. > > > > > > > > > > The default value for the option is determined during GCC build- > > > > > time by detecting corresponding assembler support: @code{-mexplicit- > > > > > relocs} if said support is present, @code{-mno-explicit-relocs} > > > > > otherwise. This option is mostly useful for debugging, or > > > > > interoperation with assemblers different from the build-time one." > > > > > > > > > I agree with wangxuerui's idea. > > > > The same parameter and the same description information can reduce the > > > > user's time to learn how to use this parameter. > > > I agree it's better than my origin paragraph. > > > > > > If you agree I'll commit it with Xuerui as the commit author. > > > > > > > I have no opinion if wangxuerui agrees. > Either is OK (if you really think the commit is effectively rewritten by > me), thanks. Pushed r13-1859. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH][wwwdocs] gcc-13: Add loongarch '-mexplicit-relocs' support
On Thu, 2022-07-28 at 10:59 +0800, Lulu Cheng wrote: > > The ASM_PREFERRED_EH_DATA_FORMAT macro before and after modification > > is > as follows: > #define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \ > - (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_absptr) > + (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | > DW_EH_PE_sdata4) The master branch still contains: #define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \ (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_absptr) Did you forgot to commit this change (or lost it during a rebase)? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1] LoongArch: Define the macro ASM_PREFERRED_EH_DATA_FORMAT by checking the assembler's support for eh_frame encoding.
On Fri, 2022-07-29 at 10:43 +0800, Lulu Cheng wrote: > .eh_frame DW_EH_PE_pcrel encoding format is not supported by gas <= 2.39. > Check if the assembler support DW_EH_PE_PCREL encoding and define .eh_frame > encoding type. > > gcc/ChangeLog: > > * config.in: Regenerate. > * config/loongarch/loongarch.h (ASM_PREFERRED_EH_DATA_FORMAT): > Select the value of the macro definition according to whether > HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT is defined. > * configure: Regenerate. > * configure.ac: Reinstate HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT > test. To me it looks a little strange to list regenerated config.in & configure before configure.ac. But maybe I'm wrong here if a lexicographical order is preferred... /* snip */ > + gcc_GAS_CHECK_FEATURE([eh_frame pcrel encoding support], > + gcc_cv_as_loongarch_eh_frame_pcrel_encoding_support,, > + [.LFB1780 = . > + .cfi_startproc > + .cfi_personality 0x9b,DW.ref.__gxx_personality_v0 > + .cfi_lsda 0x1b,.LLSDA1780 > + .cfi_endproc],, I think the conftest content can be simplified to: .cfi_startproc .cfi_personality 0x9b,a .cfi_lsda 0x1b,b .cfi_endproc > + [AC_DEFINE(HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT, 1, > + [Define if your assembler supports eh_frame pcrel encoding.])]) > ;; > s390*-*-*) > gcc_GAS_CHECK_FEATURE([.gnu_attribute support], -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1] LoongArch: Define the macro ASM_PREFERRED_EH_DATA_FORMAT by checking the assembler's support for eh_frame encoding.
On Fri, 2022-07-29 at 11:51 +0800, Lulu Cheng wrote: > > > gcc/ChangeLog: > > > > > > * config.in: Regenerate. > > > * config/loongarch/loongarch.h (ASM_PREFERRED_EH_DATA_FORMAT): > > > Select the value of the macro definition according to whether > > > HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT is defined. > > > * configure: Regenerate. > > > * configure.ac: Reinstate HAVE_AS_EH_FRAME_PCREL_ENCODING_SUPPORT > > > test. > > To me it looks a little strange to list regenerated config.in & > > configure before configure.ac. But maybe I'm wrong here if a > > lexicographical order is preferred... > This information is generated by me through the git gcc-commit-mklog command, > then I didn't move the sequence. I have no objection then. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] LoongArch: add addr_global attribute
Background: https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379 Question: Do you have a better name than "addr_global"? Alternatives: 1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel modules. It's stupid IMO. 2. Implement a "-maddress-local-with-got" option for GCC and use it for kernel modules. It seems too overkill: we might create many unnecessary GOT entries. 3. For all variables with a section attribute, consider it global. It may make sense, but I just checked x86_64 and riscv and they don't do this. 4. Implement -mcmodel=extreme and use it for kernel modules. To me "extreme" seems really too extreme. 5. More hacks in kernel. (Convert relocations against .data..percpu with objtool? But objtool is not even implemented for LoongArch yet.) Note: I'll be mostly AFK in the following week. My attempt to finish the kernel support for new relocs before going AFK now failed miserably :(. -- >8 -- A linker script and/or a section attribute may locate a local object in some way unexpected by the code model, leading to a link failure. This happens when the Linux kernel loads a module with "local" per-CPU variables. Add an attribute to explicitly mark an variable with the address unlimited by the code model so we would be able to work around such problems. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_attribute_table): New attribute table. (TARGET_ATTRIBUTE_TABLE): Define the target hook. (loongarch_handle_addr_global_attribute): New static function. (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for SYMBOL_REF_DECL with addr_global attribute. * doc/extend.texi (Variable Attributes): Document new LoongArch specific attribute. gcc/testsuite/ChangeLog: * gcc.target/loongarch/addr-global.c: New test. --- gcc/config/loongarch/loongarch.cc | 43 +++ gcc/doc/extend.texi | 17 .../gcc.target/loongarch/addr-global.c| 28 3 files changed, 88 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 79687340dfd..5f4e6114fc9 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x) && !loongarch_symbol_binds_local_p (x)) return SYMBOL_GOT_DISP; + if (SYMBOL_REF_P (x)) +{ + tree decl = SYMBOL_REF_DECL (x); + if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl))) + return SYMBOL_GOT_DISP; +} + return SYMBOL_PCREL; } @@ -6068,6 +6075,39 @@ loongarch_starting_frame_offset (void) return crtl->outgoing_args_size; } +static tree +loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int, +bool *no_add_attrs) +{ + tree decl = *node; + if (TREE_CODE (decl) == VAR_DECL) +{ + if (DECL_CONTEXT (decl) + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL + && !TREE_STATIC (decl)) + { + error_at (DECL_SOURCE_LOCATION (decl), + "%qE attribute cannot be specified for local " + "variables", name); + *no_add_attrs = true; + } +} + else +{ + warning (OPT_Wattributes, "%qE attribute ignored", name); + *no_add_attrs = true; +} + return NULL_TREE; +} + +static const struct attribute_spec loongarch_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "addr_global", 0, 0, true, false, false, false, +loongarch_handle_addr_global_attribute, NULL } +}; + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -6256,6 +6296,9 @@ loongarch_starting_frame_offset (void) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_ATTRIBUTE_TABLE +#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-loongarch.h" diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 7fe7f8817cd..4a1cd7ab5e4 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7314,6 +7314,7 @@ attributes. * Blackfin Variable Attributes:: * H8/300 Variable Attributes:: * IA-64 Variable Attributes:: +* LoongArch Variable Attributes:: * M32R/D Variable Attributes:: * MeP Variable Attributes:: * Microsoft Windows Variable Attributes:: @@ -8098,6 +8099,22 @@ defined by shared libraries. @end table +@node LoongArch Variable Attributes +@subsection LoongArch Variable Attributes + +One attribute is currently defined for the LoongArch. + +@tab
[PATCH v2] LoongArch: add addr_global attribute
Don't look at V1 patch: I selected wrong file and sent a draft with bugs mistakenly. Background: https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379 Question: Do you have a better name than "addr_global"? Alternatives: 1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel modules. It's stupid IMO. 2. Implement a "-maddress-local-with-got" option for GCC and use it for kernel modules. It seems too overkill: we might create many unnecessary GOT entries. 3. For all variables with a section attribute, consider it global. It may make sense, but I just checked x86_64 and riscv and they don't do this. 4. Implement -mcmodel=extreme and use it for kernel modules. To me "extreme" seems really too extreme. 5. More hacks in kernel. (Convert relocations against .data..percpu with objtool? But objtool is not even implemented for LoongArch yet.) Note: I'll be mostly AFK in the following week. My attempt to finish the kernel support for new relocs before going AFK now failed miserably :(. -- >8 -- A linker script and/or a section attribute may locate a local object in some way unexpected by the code model, leading to a link failure. This happens when the Linux kernel loads a module with "local" per-CPU variables. Add an attribute to explicitly mark an variable with the address unlimited by the code model so we would be able to work around such problems. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_attribute_table): New attribute table. (TARGET_ATTRIBUTE_TABLE): Define the target hook. (loongarch_handle_addr_global_attribute): New static function. (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for SYMBOL_REF_DECL with addr_global attribute. * doc/extend.texi (Variable Attributes): Document new LoongArch specific attribute. gcc/testsuite/ChangeLog: * gcc.target/loongarch/addr-global.c: New test. --- gcc/config/loongarch/loongarch.cc | 45 +++ gcc/doc/extend.texi | 17 +++ .../gcc.target/loongarch/addr-global.c| 28 3 files changed, 90 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 79687340dfd..cdf6bf44e36 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x) && !loongarch_symbol_binds_local_p (x)) return SYMBOL_GOT_DISP; + if (SYMBOL_REF_P (x)) +{ + tree decl = SYMBOL_REF_DECL (x); + if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl))) + return SYMBOL_GOT_DISP; +} + return SYMBOL_PCREL; } @@ -6068,6 +6075,41 @@ loongarch_starting_frame_offset (void) return crtl->outgoing_args_size; } +static tree +loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int, +bool *no_add_attrs) +{ + tree decl = *node; + if (TREE_CODE (decl) == VAR_DECL) +{ + if (DECL_CONTEXT (decl) + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL + && !TREE_STATIC (decl)) + { + error_at (DECL_SOURCE_LOCATION (decl), + "%qE attribute cannot be specified for local " + "variables", name); + *no_add_attrs = true; + } +} + else +{ + warning (OPT_Wattributes, "%qE attribute ignored", name); + *no_add_attrs = true; +} + return NULL_TREE; +} + +static const struct attribute_spec loongarch_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "addr_global", 0, 0, true, false, false, false, +loongarch_handle_addr_global_attribute, NULL }, + /* The last attribute spec is set to be NULL. */ + {} +}; + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -6256,6 +6298,9 @@ loongarch_starting_frame_offset (void) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_ATTRIBUTE_TABLE +#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-loongarch.h" diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 7fe7f8817cd..4a1cd7ab5e4 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7314,6 +7314,7 @@ attributes. * Blackfin Variable Attributes:: * H8/300 Variable Attributes:: * IA-64 Variable Attributes:: +* LoongArch Variable Attributes:: * M32R/D Variable Attributes:: * MeP Variable Attributes:: * Microsoft Windows Variable Attributes:: @@ -8098,6 +8099,22 @@ defined by shared libraries. @end table
[PATCH v3] LoongArch: add addr_global attribute
Change v2 to v3: - Disable section anchor for addr_global symbols. - Use -O2 in test to make sure section anchor is disabled. -- Background: https://lore.kernel.org/loongarch/d7670b60-2782-4642-995b-7baa01779...@loongson.cn/T/#e1d47e2fe185f2e2be8fdc0784f0db2f644119379 Question: Do you have a better name than "addr_global"? Alternatives: 1. Just use "-mno-explicit-relocs -mla-local-with-abs" for kernel modules. It's stupid IMO. 2. Implement a "-maddress-local-with-got" option for GCC and use it for kernel modules. It seems too overkill: we might create many unnecessary GOT entries. 3. For all variables with a section attribute, consider it global. It may make sense, but I just checked x86_64 and riscv and they don't do this. 4. Implement -mcmodel=extreme and use it for kernel modules. To me "extreme" seems really too extreme. 5. More hacks in kernel. (Convert relocations against .data..percpu with objtool? But objtool is not even implemented for LoongArch yet.) Note: I'll be mostly AFK in the following week. My attempt to finish the kernel support for new relocs before going AFK now failed miserably :(. -- >8 -- A linker script and/or a section attribute may locate a local object in some way unexpected by the code model, leading to a link failure. This happens when the Linux kernel loads a module with "local" per-CPU variables. Add an attribute to explicitly mark an variable with the address unlimited by the code model so we would be able to work around such problems. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_attribute_table): New attribute table. (TARGET_ATTRIBUTE_TABLE): Define the target hook. (loongarch_handle_addr_global_attribute): New static function. (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for SYMBOL_REF_DECL with addr_global attribute. (loongarch_use_anchors_for_symbol_p): New static function. (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook. * doc/extend.texi (Variable Attributes): Document new LoongArch specific attribute. gcc/testsuite/ChangeLog: * gcc.target/loongarch/addr-global.c: New test. --- gcc/config/loongarch/loongarch.cc | 61 +++ gcc/doc/extend.texi | 17 ++ .../gcc.target/loongarch/addr-global.c| 28 + 3 files changed, 106 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/addr-global.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 79687340dfd..db6f84d4e66 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1643,6 +1643,13 @@ loongarch_classify_symbol (const_rtx x) && !loongarch_symbol_binds_local_p (x)) return SYMBOL_GOT_DISP; + if (SYMBOL_REF_P (x)) +{ + tree decl = SYMBOL_REF_DECL (x); + if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl))) + return SYMBOL_GOT_DISP; +} + return SYMBOL_PCREL; } @@ -6068,6 +6075,54 @@ loongarch_starting_frame_offset (void) return crtl->outgoing_args_size; } +static tree +loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int, +bool *no_add_attrs) +{ + tree decl = *node; + if (TREE_CODE (decl) == VAR_DECL) +{ + if (DECL_CONTEXT (decl) + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL + && !TREE_STATIC (decl)) + { + error_at (DECL_SOURCE_LOCATION (decl), + "%qE attribute cannot be specified for local " + "variables", name); + *no_add_attrs = true; + } +} + else +{ + warning (OPT_Wattributes, "%qE attribute ignored", name); + *no_add_attrs = true; +} + return NULL_TREE; +} + +static const struct attribute_spec loongarch_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "addr_global", 0, 0, true, false, false, false, +loongarch_handle_addr_global_attribute, NULL }, + /* The last attribute spec is set to be NULL. */ + {} +}; + +bool +loongarch_use_anchors_for_symbol_p (const_rtx symbol) +{ + tree decl = SYMBOL_REF_DECL (symbol); + + /* addr_global indicates we don't know how the linker will locate the symbol, + so the use of anchor may cause relocation overflow. */ + if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl))) +return false; + + return default_use_anchors_for_symbol_p (symbol); +} + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -6256,6 +6311,12 @@ loongarch_starting_frame_offset (void) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_ATTRIBUTE_TABLE +#define TARGET_ATTRIBUTE_TABLE loongarch
[PATCH v4] LoongArch: add movable attribute
Changes v3 -> v4: * Use "movable" as the attribute name as Huacai says it's already used in downstream GCC fork. * Remove an inaccurate line from the doc. (Initially I tried to implement a "model(...)" like IA64 or M32R. Then I changed my mind but forgot to remove the line copied from M32R doc.) -- >8 -- A linker script and/or a section attribute may locate a local object in some way unexpected by the code model, leading to a link failure. This happens when the Linux kernel loads a module with "local" per-CPU variables. Add an attribute to explicitly mark an variable with the address unlimited by the code model so we would be able to work around such problems. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_attribute_table): New attribute table. (TARGET_ATTRIBUTE_TABLE): Define the target hook. (loongarch_handle_addr_global_attribute): New static function. (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for SYMBOL_REF_DECL with addr_global attribute. (loongarch_use_anchors_for_symbol_p): New static function. (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook. * doc/extend.texi (Variable Attributes): Document new LoongArch specific attribute. gcc/testsuite/ChangeLog: * gcc.target/loongarch/addr-global.c: New test. --- gcc/config/loongarch/loongarch.cc | 63 +++ gcc/doc/extend.texi | 16 + .../gcc.target/loongarch/attr-movable.c | 29 + 3 files changed, 108 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-movable.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 79687340dfd..6b6026700a6 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x) && !loongarch_symbol_binds_local_p (x)) return SYMBOL_GOT_DISP; + if (SYMBOL_REF_P (x)) + { + tree decl = SYMBOL_REF_DECL (x); + /* A movable symbol may be moved away from the +/- 2GiB range around + the PC, so we have to use GOT. */ + if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl))) + return SYMBOL_GOT_DISP; + } + return SYMBOL_PCREL; } @@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void) return crtl->outgoing_args_size; } +static tree +loongarch_handle_movable_attribute (tree *node, tree name, tree, int, + bool *no_add_attrs) +{ + tree decl = *node; + if (TREE_CODE (decl) == VAR_DECL) + { + if (DECL_CONTEXT (decl) + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL + && !TREE_STATIC (decl)) + { + error_at (DECL_SOURCE_LOCATION (decl), + "%qE attribute cannot be specified for local " + "variables", name); + *no_add_attrs = true; + } + } + else + { + warning (OPT_Wattributes, "%qE attribute ignored", name); + *no_add_attrs = true; + } + return NULL_TREE; +} + +static const struct attribute_spec loongarch_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "movable", 0, 0, true, false, false, false, + loongarch_handle_movable_attribute, NULL }, + /* The last attribute spec is set to be NULL. */ + {} +}; + +bool +loongarch_use_anchors_for_symbol_p (const_rtx symbol) +{ + tree decl = SYMBOL_REF_DECL (symbol); + + /* A movable attribute indicates the linker may move the symbol away, + so the use of anchor may cause relocation overflow. */ + if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl))) + return false; + + return default_use_anchors_for_symbol_p (symbol); +} + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_ATTRIBUTE_TABLE +#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table + +#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P +#define TARGET_USE_ANCHORS_FOR_SYMBOL_P loongarch_use_anchors_for_symbol_p + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-loongarch.h" diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 7fe7f8817cd..322d8c05a04 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7314,6 +7314,7 @@ attributes. * Blackfin Variable Attributes:: * H8/300 Variable Attributes:: * IA-64 Variable Attributes:: +* LoongArch Variable Attributes:: * M32R/D Variable Attributes:: * MeP Variable Attributes:: * Microsoft Windows Variable Attributes:: @@ -8098,6 +8099,21 @@ defined by shared libraries. @end table +@node LoongArch Variable Attributes +@subsection LoongArch Variable Attributes + +One attribute is currently defined for the LoongArch. + +@table @code +@item movable +@cindex @code{movable} variable attribute, LoongArch +Use this attribute on the LoongArch to mark an object possible to be moved +by the linker, so its address is unlimited by the local data section range +specified by the code mo
[PATCH v5] LoongArch: add movable attribute
Changes v4 -> v5: Fix changelog. No code change. Changes v3 -> v4: * Use "movable" as the attribute name as Huacai says it's already used in downstream GCC fork. * Remove an inaccurate line from the doc. (Initially I tried to implement a "model(...)" like IA64 or M32R. Then I changed my mind but forgot to remove the line copied from M32R doc.) -- >8 -- A linker script and/or a section attribute may locate a local object in some way unexpected by the code model, leading to a link failure. This happens when the Linux kernel loads a module with "local" per-CPU variables. Add an attribute to explicitly mark an variable with the address unlimited by the code model so we would be able to work around such problems. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_attribute_table): New attribute table. (TARGET_ATTRIBUTE_TABLE): Define the target hook. (loongarch_handle_addr_global_attribute): New static function. (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for SYMBOL_REF_DECL with addr_global attribute. (loongarch_use_anchors_for_symbol_p): New static function. (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook. * doc/extend.texi (Variable Attributes): Document new LoongArch specific attribute. gcc/testsuite/ChangeLog: * gcc.target/loongarch/addr-global.c: New test. --- gcc/config/loongarch/loongarch.cc | 63 +++ gcc/doc/extend.texi | 16 + .../gcc.target/loongarch/attr-movable.c | 29 + 3 files changed, 108 insertions(+) create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-movable.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 79687340dfd..6b6026700a6 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x) && !loongarch_symbol_binds_local_p (x)) return SYMBOL_GOT_DISP; + if (SYMBOL_REF_P (x)) +{ + tree decl = SYMBOL_REF_DECL (x); + /* A movable symbol may be moved away from the +/- 2GiB range around +the PC, so we have to use GOT. */ + if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl))) + return SYMBOL_GOT_DISP; +} + return SYMBOL_PCREL; } @@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void) return crtl->outgoing_args_size; } +static tree +loongarch_handle_movable_attribute (tree *node, tree name, tree, int, + bool *no_add_attrs) +{ + tree decl = *node; + if (TREE_CODE (decl) == VAR_DECL) +{ + if (DECL_CONTEXT (decl) + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL + && !TREE_STATIC (decl)) + { + error_at (DECL_SOURCE_LOCATION (decl), + "%qE attribute cannot be specified for local " + "variables", name); + *no_add_attrs = true; + } +} + else +{ + warning (OPT_Wattributes, "%qE attribute ignored", name); + *no_add_attrs = true; +} + return NULL_TREE; +} + +static const struct attribute_spec loongarch_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, + affects_type_identity, handler, exclude } */ + { "movable", 0, 0, true, false, false, false, +loongarch_handle_movable_attribute, NULL }, + /* The last attribute spec is set to be NULL. */ + {} +}; + +bool +loongarch_use_anchors_for_symbol_p (const_rtx symbol) +{ + tree decl = SYMBOL_REF_DECL (symbol); + + /* A movable attribute indicates the linker may move the symbol away, + so the use of anchor may cause relocation overflow. */ + if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl))) +return false; + + return default_use_anchors_for_symbol_p (symbol); +} + /* Initialize the GCC target structure. */ #undef TARGET_ASM_ALIGNED_HI_OP #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" @@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void) #undef TARGET_HAVE_SPECULATION_SAFE_VALUE #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed +#undef TARGET_ATTRIBUTE_TABLE +#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table + +#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P +#define TARGET_USE_ANCHORS_FOR_SYMBOL_P loongarch_use_anchors_for_symbol_p + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-loongarch.h" diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 7fe7f8817cd..322d8c05a04 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7314,6 +7314,7 @@ attributes. * Blackfin Variable Attributes:: * H8/300 Variable Attributes:: * IA-64 Variable Attributes:: +* LoongArch Variable Attributes:: * M32R/D Variable Attributes:: * MeP Variable Attributes:: * Microsoft Windows Variable Attributes:: @@ -8098,6 +8099,21 @@ defined by shared libraries.
Re: [PATCH] Mips: Fix the ASAN shadow offset hook for the n32 ABI
On Sat, 2022-07-30 at 14:22 +0100, Maciej W. Rozycki wrote: > On Mon, 6 Jun 2022, Dimitrije Milosevic wrote: > > > * config/mips/mips.cc (mips_asan_shadow_offset): Reformat > > to handle the N32 ABI. > > That's not what the change does. > > > * config/mips/mips.h (SUBTARGET_SHADOW_OFFSET): Remove > > the macro, as it is not needed anymore. > > Why is the macro not needed anymore? Because it's only used by mips_asan_shadow_offset and now we directly code its content into mips_asan_shadow_offset. SUBTARGET_SHADOW_OFFSET is only needed if a different subtarget (say, mips64el-freebsd) needs a different shadow offset. But for MIPS we don't have any subtarget other than mips*-linux-gnu* supporting ASAN so we can omit SUBTARGET_SHADOW_OFFSET and fold the content directly into the asan_shadow_offset target hook. RISCV port does the same. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v5] LoongArch: add movable attribute
Is it OK for trunk or I need to change something? By the way, I'm seeking a possibility to include this into 12.2. Then we leaves only 12.1 without this attribute, and we can just say "building the kernel needs GCC 12.2 or later". On Mon, 2022-08-01 at 18:07 +0800, Xi Ruoyao wrote: > Changes v4 -> v5: Fix changelog. No code change. > > Changes v3 -> v4: > > * Use "movable" as the attribute name as Huacai says it's already > used > in downstream GCC fork. > * Remove an inaccurate line from the doc. (Initially I tried to > implement a "model(...)" like IA64 or M32R. Then I changed my mind > but forgot to remove the line copied from M32R doc.) > > -- >8 -- > > A linker script and/or a section attribute may locate a local object > in > some way unexpected by the code model, leading to a link failure. > This > happens when the Linux kernel loads a module with "local" per-CPU > variables. > > Add an attribute to explicitly mark an variable with the address > unlimited by the code model so we would be able to work around such > problems. > > gcc/ChangeLog: > > * config/loongarch/loongarch.cc (loongarch_attribute_table): > New attribute table. > (TARGET_ATTRIBUTE_TABLE): Define the target hook. > (loongarch_handle_addr_global_attribute): New static function. > (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for > SYMBOL_REF_DECL with addr_global attribute. > (loongarch_use_anchors_for_symbol_p): New static function. > (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook. > * doc/extend.texi (Variable Attributes): Document new > LoongArch specific attribute. > > gcc/testsuite/ChangeLog: > > * gcc.target/loongarch/addr-global.c: New test. > --- > gcc/config/loongarch/loongarch.cc | 63 > +++ > gcc/doc/extend.texi | 16 + > .../gcc.target/loongarch/attr-movable.c | 29 + > 3 files changed, 108 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-movable.c > > diff --git a/gcc/config/loongarch/loongarch.cc > b/gcc/config/loongarch/loongarch.cc > index 79687340dfd..6b6026700a6 100644 > --- a/gcc/config/loongarch/loongarch.cc > +++ b/gcc/config/loongarch/loongarch.cc > @@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x) > && !loongarch_symbol_binds_local_p (x)) > return SYMBOL_GOT_DISP; > > + if (SYMBOL_REF_P (x)) > + { > + tree decl = SYMBOL_REF_DECL (x); > + /* A movable symbol may be moved away from the +/- 2GiB range > around > + the PC, so we have to use GOT. */ > + if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES > (decl))) > + return SYMBOL_GOT_DISP; > + } > + > return SYMBOL_PCREL; > } > > @@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void) > return crtl->outgoing_args_size; > } > > +static tree > +loongarch_handle_movable_attribute (tree *node, tree name, tree, int, > + bool *no_add_attrs) > +{ > + tree decl = *node; > + if (TREE_CODE (decl) == VAR_DECL) > + { > + if (DECL_CONTEXT (decl) > + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL > + && !TREE_STATIC (decl)) > + { > + error_at (DECL_SOURCE_LOCATION (decl), > + "%qE attribute cannot be specified for local " > + "variables", name); > + *no_add_attrs = true; > + } > + } > + else > + { > + warning (OPT_Wattributes, "%qE attribute ignored", name); > + *no_add_attrs = true; > + } > + return NULL_TREE; > +} > + > +static const struct attribute_spec loongarch_attribute_table[] = > +{ > + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, > + affects_type_identity, handler, exclude } */ > + { "movable", 0, 0, true, false, false, false, > + loongarch_handle_movable_attribute, NULL }, > + /* The last attribute spec is set to be NULL. */ > + {} > +}; > + > +bool > +loongarch_use_anchors_for_symbol_p (const_rtx symbol) > +{ > + tree decl = SYMBOL_REF_DECL (symbol); > + > + /* A movable attribute indicates the linker may move the symbol > away, > + so the use of anchor may cause relocation overflow. */ > + if (decl && lookup_attribute ("movable", DECL_ATTRIBUTES (decl))) > + return false; > + > + return default_use_anchors_for_symbol_p (symbol); > +} > + > /* Initialize the GCC target structure. */ > #undef TARGET_ASM_ALIGNED_HI_OP > #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t" > @@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void) > #undef TARGET_HAVE_SPECULATION_SAFE_VALUE > #define TARGET_HAVE_SPECULATION_SAFE_VALUE > speculation_safe_value_not_needed > > +#undef TARGET_ATTRIBUTE_TABLE > +#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table > + > +#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P > +#define TARGET_USE_ANCHORS_FOR_SYMBOL_P > loongarch_use_anchors_fo
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Wed, 2022-08-03 at 10:55 +0800, chengl...@loongson.cn wrote: > I think there is no problem with this patch。But I have a question. The > visibility attribute works, so is it necessary to add the moveable > attribute? 1. My use of -fPIC and visibility is not in the way ELF visibility has been designed for. It's hardly to tell if it's an legitimate use or a misuse. 2. Adding -fPIC can make unwanted side effects, especially: if we add some optimizations only suitable for -fno-PIC, we'll miss them using - fPIC. Note that -fPIC does not only mean "produce position independent code", but "produce position independent code *suitable for ELF dynamic libraries*". So to other people it will be ridiculous to use -fPIC for kernel. 3. Huacai said he didn't like using __attribute__((visibility)) like this (in kernel ML) and I share his feeling. > I'd like to wait for the kernel team to test the performance data of > the two implementations before deciding whether to support this > attribute. > > What do you think? Perhaps, I can't access my dev system now anyway (I've configured the SSH access but then a sudden power surge happened and I didn't configured automatically power on :( ) > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v5] LoongArch: add movable attribute
On Wed, 2022-08-03 at 10:59 +0800, WANG Xuerui wrote: > I don't think mindlessly caring for vendor forks is always correct. In > fact I find the name "movable" too generic, and something like > "force_got_access" could be better. The problem is "what will this behave *if* we later add some code model without GOT". If it's named "movable" we generate a full 4-instruction absolute (or PC-relative) address loading sequence if GOT is disabled. If it's named "force_got_access" we report an error and reject the code if GOT is disabled. > I don't currently have time to test this, unfortunately, due to day job. > Might be able to give it a whirl one or two week later though... Unfortunately, I can't access my dev system via SSH too because while I'm remote, a sudden power surge happened and I forgot to configure an automatically power-on. I'm kind of rushy because I want to make it into 12.2, leaving 12.1 the only exception cannot build Linux >= 6.0. But maybe it just can't be backported anyway. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Wed, 2022-08-03 at 11:10 +0800, Xi Ruoyao via Gcc-patches wrote: > > I'd like to wait for the kernel team to test the performance data of > > the two implementations before deciding whether to support this > > attribute. > > > > What do you think? > > Perhaps, I can't access my dev system now anyway (I've configured the > SSH access but then a sudden power surge happened and I didn't > configured automatically power on :( ) Hi folks, Can someone perform a bench to see if a four-instruction immediate load sequence can outperform GOT or vice versa? I cannot access my test system in at least 1 week, and I may be busy preparing Linux From Scratch 11.2 release in the remaining of August. Note: if the four-instruction immediate load sequence outperforms GOT, we should consider use immediate load instead of GOT for -fno-PIC by default. P.S. It seems I have trouble accessing gcc400.fsffrance.org. I have a C Farm account and I've already put Host gcc400.fsffrance.org Port 25465 in ~/.ssh/config, and I can access other C farm machines w/o problem. But: $ ssh gcc400.fsffrance.org xry...@gcc400.fsffrance.org: Permission denied (publickey,keyboard-interactive). If you know the administrator of the C farm machine, can you tell him to check the configuration? If I can access it I may use some time to perform the bench (in userspace of course) myself. Thanks. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Fri, 2022-08-05 at 09:05 +0800, Lulu Cheng wrote: > I'm working on the implementation of specifing attributes of variables for > other architectures. > If the address is obtained through the GOT table and 4 instructions, there is > not much difference in performance. In this case I still prefer a GOT table entry because for 4-instruction absolute addressing sequence we'll need to implement 4 new relocation types in the kernel module loader. > Is it more reasonable for us to refer to the implementation of the model > attribute under the IA64 architecture? Maybe we can use "model(got)", "model(abs)", "model(pcrel)" etc. > I will compare the performance of the two soon. Do you know the approximate > release date of GCC 12.2? > I also want to fix this before 12.2 is released. GCC 12.2 rc1 will be frozen on Aug 12th. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Fri, 2022-08-05 at 10:38 +0800, Lulu Cheng wrote: > > > I'm working on the implementation of specifing attributes of variables > > > for other architectures. > > > If the address is obtained through the GOT table and 4 instructions, > > > there is not much difference in performance. > > In this case I still prefer a GOT table entry because for 4-instruction > > absolute addressing sequence we'll need to implement 4 new relocation > > types in the kernel module loader. > If it is accessed through the GOT table, dynamic relocation is required when > the module is loaded. Dynamic relocation is required when the module is loaded anyway. The .ko modules are actually relocatable ELF objects (produced by ld -r) and the module loader has to perform some work of a normal linker. > And accessing the got table may have a cache miss. /* snip */ > So my idea is "model(normal)","model (large)" etc. Then should we have an option to disable GOT globally? Maybe for kernel we'll just "-mno-got -mcmodel=large" (or "extreme"? The kernel image is loaded at 0x9000 and the modules are above 0x so we need to handle 64-bit offset). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Fri, 2022-08-05 at 10:51 +0800, Xi Ruoyao via Gcc-patches wrote: > > If it is accessed through the GOT table, dynamic relocation is required > > when the module is loaded. > > Dynamic relocation is required when the module is loaded anyway. The > .ko modules are actually relocatable ELF objects (produced by ld -r) and > the module loader has to perform some work of a normal linker. > > > And accessing the got table may have a cache miss. > > /* snip */ > > > So my idea is "model(normal)","model (large)" etc. > > Then should we have an option to disable GOT globally? Maybe for kernel > we'll just "-mno-got -mcmodel=large" (or "extreme"? The kernel image is > loaded at 0x9000 and the modules are above > 0x so we need to handle 64-bit offset). Or maybe we should just use a PC-relative addressing with 4 instructions instead of GOT for -fno-PIC? Both way consumes 16 bytes (4 instructions for PC-relative, 2 instructions and a 64-bit GOT entry for GOT) and PC- relative may be more cache friendly. But such a major change cannot be backported for 12.2 IMO. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Fri, 2022-08-05 at 11:34 +0800, Xi Ruoyao via Gcc-patches wrote: > Or maybe we should just use a PC-relative addressing with 4 instructions > instead of GOT for -fno-PIC? Not possible, Glibc does not support R_LARCH_PCALA* relocations in ld.so. So we still need a -mno-got (or something) option to disable GOT for special cases like the kernel. > Both way consumes 16 bytes (4 instructions > for PC-relative, 2 instructions and a 64-bit GOT entry for GOT) and PC- > relative may be more cache friendly. But such a major change cannot be > backported for 12.2 IMO. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Fri, 2022-08-05 at 12:01 +0800, Lulu Cheng wrote: > > 在 2022/8/5 上午11:45, Xi Ruoyao 写道: > > > > > > On Fri, 2022-08-05 at 11:34 +0800, Xi Ruoyao via Gcc-patches wrote: > > > > > > > > > > > > > Or maybe we should just use a PC-relative addressing with 4 instructions > > > instead of GOT for -fno-PIC? > > Not possible, Glibc does not support R_LARCH_PCALA* relocations in > > ld.so. So we still need a -mno-got (or something) option to disable GOT > > for special cases like the kernel. > > > > > > > > > > > > > Both way consumes 16 bytes (4 instructions > > > for PC-relative, 2 instructions and a 64-bit GOT entry for GOT) and PC- > > > relative may be more cache friendly. But such a major change cannot be > > > backported for 12.2 IMO. > > > > > I'm very sorry, my understanding of the precpu variable is wrong, > I just read the code of the kernel you submitted, this precpu variable > not only has a large offset but also has an uncertain address when compiling, > so no matter whether it is addressed with pcrel Still got addressing needs > dynamic relocation when loading. It seems that accessing through the got table > is a better choice. > > The name movable is also very vivid to describe this function in the kernel, > indicating that the address of the variable can be changed at will. > > But this name is more difficult to understand in gcc, I have no opinion on > other, > can this name be changed? Yes, we don't need to be compatible with old vendor compiler IMO. "force_got_access" as Xuerui suggested? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Fri, 2022-08-05 at 15:58 +0800, Lulu Cheng wrote: > I think the model of precpu is not very easy to describe. > model(got)?model(global)? > I also want to use attribute model and -mcmodel together, but this is just an > initial idea, > what do you think? It seems I had some misunderstanding about IA-64 model attribute. IA-64 actually does not have -mcmodel= options. And a code model only specifies where "the GOT and the local symbols" are, but our new attribute should apply for both local symbols and global symbols. So I don't think we should strongly bind the new attribute and -mcmodel. Maybe, __attribute__((addressing_model(got/pcrel32/pcrel64/abs32/abs64)) ? I think they are explicit enough (we can implement got and pc32 first, and adding the others when we implement other code models). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
Sorry for late reply, I'm rebuilding my entire Linux system (from scratch) for Glibc-2.36 and Binutils-2.39 update and I just reached the mail client. On Mon, 2022-08-08 at 12:53 +0800, Lulu Cheng wrote: > I still think it makes a little bit more sense to put attribute(model) > and -mcmodel together. > > -mcmodel sets the access range of all symbols in a single fileand > attribute (model) sets the > > accsess range of a single symbol in a file. For example > __attribute__((model(normal/large/extreme))). It might make sense, but then it would not be what we want for per-CPU symbols. What we want here is "treat a local symbol as-if it's global", while each code model may already treat local symbol and global symbol differently. Disambiguation: here "local" means "defined in this TU", "global" otherwise (not "local variable" in C). I'll send v6 with the name "addr_global" if no objection. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: 回复:[PATCH v5] LoongArch: add movable attribute
On Tue, 2022-08-09 at 21:03 +0800, Lulu Cheng wrote: > > 在 2022/8/9 下午7:30, Xi Ruoyao 写道: > > > > > > Sorry for late reply, I'm rebuilding my entire Linux system (from > > scratch) for Glibc-2.36 and Binutils-2.39 update and I just reached the > > mail client. > > > > On Mon, 2022-08-08 at 12:53 +0800, Lulu Cheng wrote: > > > > > > > > > > > I still think it makes a little bit more sense to put attribute(model) > > > and -mcmodel together. > > > > > > -mcmodel sets the access range of all symbols in a single fileand > > > attribute (model) sets the > > > > > > accsess range of a single symbol in a file. For example > > > __attribute__((model(normal/large/extreme))). > > It might make sense, but then it would not be what we want for per-CPU > > symbols. What we want here is "treat a local symbol as-if it's global", > > while each code model may already treat local symbol and global symbol > > differently. > > > > Disambiguation: here "local" means "defined in this TU", "global" > > otherwise (not "local variable" in C). > > > > I'll send v6 with the name "addr_global" if no objection. > > > I am implementing the mode of cmodel=extreme. > In this mode, the value of the relative offset is a signed 64-bit value, > so this can solve the access problem of the variables of the kernel precpu. > So I wonder if it is necessary to add another attribute like addr_global? If we use GOT I can implement only PC_HI20 and PC_LO12 relocs in kernel module loader. If we use extreme I'll need to implement 4 ABS relocations along with them. But "the less the better" is not a very strong reason anyway. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1] LoongArch:Implement 128-bit floating point functions in gcc.
On Mon, 2023-08-07 at 12:01 +0800, chenxiaolong wrote: > +/* Count the number of functions with "q" as the suffix */ > +static int MATHQ_NUMS=(int)LARCH_MAX_FTYPE_MAX-(int)LARCH_BUILTIN_HUGE_VALQ; This is obviously not the GCC coding standard... It should have some white spaces: static int MATHQ_NUMS = (int)LARCH_MAX_FTYPE_MAX - (int)LARCH_BUILTIN_HUGE_VALQ; And I guess this variable should be declared const. > +/* Define an float to do funciton huge_valq*/ > +#define FLOAT_BUILTIN_HUGE(INSN, FUNCTION_TYPE) \ > +{ CODE_FOR_ ## INSN, \ > +"__builtin_" #INSN, LARCH_BUILTIN_HUGE_DIRECT,\ > +FUNCTION_TYPE, loongarch_builtin_avail_default } /* snip */ > +/* Define an float to do funciton nansq*/ > +#define FLOAT_BUILTIN_NANSQ(INSN, FUNCTION_TYPE) \ > +{ CODE_FOR_ ## INSN, \ > +"__builtin_" #INSN, LARCH_BUILTIN_NANSQ_DIRECT, \ > +FUNCTION_TYPE, loongarch_builtin_avail_default } What's the point to define these macros each is only used once? > + tree type,ftype; > + tree const_string_type > + > =build_pointer_type(build_qualified_type(char_type_node,TYPE_QUAL_CONST)); Really bad format. In GNU coding standard you should have a white space after '=', and before '(', etc. Please fix the formatting everywhere. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2] LoongArch:Implement 128-bit floating point functions in gcc.
On Tue, 2023-08-08 at 10:09 +0800, chenxiaolong wrote: > +/* Count the number of functions with "q" as the suffix. */ > +const int MATHQ_NUMS=(int)LARCH_MAX_FTYPE_MAX-(int)LARCH_BUILTIN_HUGE_VALQ; Format issue still not fixed. > +__float128 nanq (const char * str) > +{ > + union _FP_UNION_Q nan; > + nan.bits.frac0 = 0; > + nan.bits.frac1 = 0; > + nan.bits.exp = 0x7FFF; > + nan.bits.sign = 1; > + if (str != NULL && strlen (str) > 0) > + return nan.flt; > + return 0; > +} I don't think the logic is correct. __builtin_nanq("") should return a NaN, not 0. > + if (str != NULL && strlen (str) > 0) > + return nan.flt; Indent is 2, not 4. And we don't need to check "str != NULL" here. Calling nan()-family functions with a null tagp is deemed undefined behavior. > +__float128 nansq (const char *str) > +{ > + union _FP_UNION_Q nan; > + nan.bits.frac0 = 0; > + nan.bits.frac1 = 0; > + nan.bits.exp = 0x7FFF; > + nan.bits.sign = 1; > + if (str != NULL && strlen (str) > 0) > + return nan.flt; > + return 0; > +} Same logic error. And this seems exactly same as nanq, the analogous is definitely wrong because __builtin_nanq should return a quiet NaN, but __builtin_nansq should return a signaling NaN. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2] LoongArch:Implement 128-bit floating point functions in gcc.
On Tue, 2023-08-08 at 10:24 +0800, Xi Ruoyao wrote: And I think this way to implement these functions (using libgcc calls) is not the best. On 64-bit LoongArch a __float128 is stored in a pair of GPR, so operations like copysignq and absq can be implemented much more efficiently by expanding them using bstrins and bstrpick instructions in the compiler. For example: __float128 absq (__float128 val) { return __builtin_absq (val); } should be compiled to: bstrins.d $a1, $zero, 63, 63 jr $ra Instead of b __fabstf2 (perhaps, unless -Os). > > +__float128 nanq(const char * str) Using "nanq" as the symbol name is unacceptable as well. Use "__nanq" or something. "nanq" is not reserved for implementation, so it may cause a conflict in the future if "nanq" finally become a standard function or the users defines their own "nanq" function. > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH 2/9] LoongArch: Fix default ISA setting
On Sun, 2023-08-06 at 20:49 +0800, Jiajie Chen via Gcc-patches wrote: > When loongarch_arch_target is called, la_target has not been > initialized, thus the macro LARCH_ACTUAL_ARCH always equals to zero. > > This commit fixes by expanding the macro and reading the latest value. > It permits -march=loongarch64 when the default target is loongarch32 and > vice versa. > > gcc/ChangeLog: > > * config/loongarch/loongarch-opts.cc (loongarch_config_target): > Fix -march detection. Nit: the first letter 'F' of the second line should align with '*' of the first line, not 'c'. /* snip */ > diff --git a/gcc/testsuite/gcc.target/loongarch/arch-3.c > b/gcc/testsuite/gcc.target/loongarch/arch-3.c > new file mode 100644 > index 000..543b93883bd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/loongarch/arch-3.c > @@ -0,0 +1,6 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=loongarch64 -mabi=ilp32d" } */ > +int foo() > +{ > +} > +/* { dg-error "unable to implement ABI 'ilp32d' with instruction set > 'la64/fpu64'" "" { target *-*-* } 0 } */ This is just wrong. It's absolutely possible to implement ilp32d with la64/fpu64. LoongArch *.w instructions are always 32-bit operations, no matter on LA32 or LA64. They are different from RISC-V where many instructions operate on 32-bit integers on RV32 but 64-bit integers on RV64. If you don't want to spend your time to implement it you should use `sorry ("%<-mabi=ilp32d%> is not implemented for la64");` instead. Yes, I know there is some (mis)uses of TARGET_64BIT in the config/loongarch code where TARGET_ABI_LP64 should be actually used instead. They are bugs preventing us from implementing -mabi=ilp32d - march=loongarch64 and they should be fixed. They are not our excuse to blindly "simulate" what RISC-V has. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2 01/14] LoongArch: Introduce loongarch32 target
On Wed, 2023-08-09 at 19:46 +0800, Jiajie Chen wrote: > + builtin_define ("_ABILP32=3"); > + builtin_define ("_LOONGARCH_SIM=_ABILP32"); Let's remove them. These MIPS-style definitions are deprecated: https://github.com/loongson/LoongArch-Documentation/pull/28. Unfortunately for LP64 ABI _ABILP64 is already a part of public API. I've tried to raise a deprecation warning for them, but it seems doing so needs a major change in libcpp... However ILP32 ABI is "fresh new" so we should take the advantage to remove the historic burden. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v2 11/14] LoongArch: Mark am* instructions as LA64-only
On Wed, 2023-08-09 at 19:46 +0800, Jiajie Chen wrote: > LoongArch32 only provides basic ll/sc instructions for atomic > operations. Mark am* atomic instructions as 64-bit only. I'd prefer using a different symbol, say TARGET_LOONGARCH_AM here. Then it would be easier to adjust the code if we have a LA32 core with am* support in the future. For now we can just #define TARGET_LOONGARCH_AM TARGET_64BIT. > gcc/ChangeLog: > > * config/loongarch.sync.md: Guard am* atomic insns by > TARGET_64BIT. > --- > gcc/config/loongarch/sync.md | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md > index 9924d522bcd..151b553bcc6 100644 > --- a/gcc/config/loongarch/sync.md > +++ b/gcc/config/loongarch/sync.md > @@ -77,7 +77,7 @@ > [(match_operand:GPR 1 "reg_or_0_operand" "rJ") > (match_operand:SI 2 "const_int_operand")] ;; model > UNSPEC_ATOMIC_STORE))] > - "" > + "TARGET_64BIT" > "amswap%A2.\t$zero,%z1,%0" > [(set (attr "length") (const_int 8))]) > > @@ -88,7 +88,7 @@ > (match_operand:GPR 1 "reg_or_0_operand" "rJ")) > (match_operand:SI 2 "const_int_operand")] ;; model > UNSPEC_SYNC_OLD_OP))] > - "" > + "TARGET_64BIT" > "am%A2.\t$zero,%z1,%0" > [(set (attr "length") (const_int 8))]) > > @@ -101,7 +101,7 @@ > (match_operand:GPR 2 "reg_or_0_operand" "rJ")) > (match_operand:SI 3 "const_int_operand")] ;; model > UNSPEC_SYNC_OLD_OP))] > - "" > + "TARGET_64BIT" > "am%A3.\t%0,%z2,%1" > [(set (attr "length") (const_int 8))]) > > @@ -113,7 +113,7 @@ > UNSPEC_SYNC_EXCHANGE)) > (set (match_dup 1) > (match_operand:GPR 2 "register_operand" "r"))] > - "" > + "TARGET_64BIT" > "amswap%A3.\t%0,%z2,%1" > [(set (attr "length") (const_int 8))]) > > @@ -182,7 +182,7 @@ > [(match_operand:QI 0 "register_operand" "") ;; bool output > (match_operand:QI 1 "memory_operand" "+ZB") ;; memory > (match_operand:SI 2 "const_int_operand" "")] ;; model > - "" > + "TARGET_64BIT" > { > /* We have no QImode atomics, so use the address LSBs to form a mask, > then use an aligned SImode atomic. */ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.
On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote: > Currently we have 3 different independent tunes for gather > "use_gather,use_gather_2parts,use_gather_4parts", > similar for scatter, there're > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > The patch support 2 standardizing options to enable/disable > vectorization for all gather/scatter instructions. The options is > interpreted by driver to 3 tunes. > > bootstrapped and regtested on x86_64-pc-linux-gnu. > Ok for trunk? And should we set -mno-gather as the default for GDS affected processors? We'll likely apply the ucode update for them, and then the gathering instructions will be much slower. > gcc/ChangeLog: > > * config/i386/i386.h (DRIVER_SELF_SPECS): Add > GATHER_SCATTER_DRIVER_SELF_SPECS. > (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro. > * config/i386/i386.opt (mgather): New option. > (mscatter): Ditto. > --- > gcc/config/i386/i386.h | 12 +++- > gcc/config/i386/i386.opt | 8 > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > index ef342fcee9b..d9ac2c29bde 100644 > --- a/gcc/config/i386/i386.h > +++ b/gcc/config/i386/i386.h > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence; > # define SUBTARGET_DRIVER_SELF_SPECS "" > #endif > > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS > +# define GATHER_SCATTER_DRIVER_SELF_SPECS \ > + "%{mno-gather:-mtune- > ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \ > + %{mgather:-mtune- > ctrl=use_gather_2parts,use_gather_4parts,use_gather} \ > + %{mno-scatter:-mtune- > ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \ > + %{mscatter:-mtune- > ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}" > +#endif > + > +#define DRIVER_SELF_SPECS \ > + SUBTARGET_DRIVER_SELF_SPECS " " \ > + GATHER_SCATTER_DRIVER_SELF_SPECS > > /* -march=native handling only makes sense with compiler running on > an x86 or x86_64 chip. If changing this condition, also change > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt > index ddb7f110aa2..99948644a8d 100644 > --- a/gcc/config/i386/i386.opt > +++ b/gcc/config/i386/i386.opt > @@ -424,6 +424,14 @@ mdaz-ftz > Target > Set the FTZ and DAZ Flags. > > +mgather > +Target > +Enable vectorization for gather instruction. > + > +mscatter > +Target > +Enable vectorization for scatter instruction. > + > mpreferred-stack-boundary= > Target RejectNegative Joined UInteger > Var(ix86_preferred_stack_boundary_arg) > Attempt to keep stack aligned to this power of 2. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] rtl-optimization/110939 Really fix narrow comparison of memory and constant
On Thu, 2023-08-10 at 15:04 +0200, Stefan Schulze Frielinghaus via Gcc- patches wrote: > In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I > completely missed the fact that the normal form of a generated constant for a > mode with fewer bits than in HOST_WIDE_INT is a sign extended version of the > actual constant. This even holds true for unsigned constants. > > Fixed by masking out the upper bits for the incoming constant and sign > extending the resulting unsigned constant. > > Bootstrapped and regtested on x64 and s390x. Ok for mainline? The patch fails to apply: patching file gcc/combine.cc Hunk #1 FAILED at 11923. Hunk #2 FAILED at 11962. It looks like some indents are tabs in the source file, but white spaces in the patch. > While reading existing optimizations in combine I stumbled across two > optimizations where either my intuition about the representation of > unsigned integers via a const_int rtx is wrong, which then in turn would > probably also mean that this patch is wrong, or that the optimizations > are missed sometimes. In other words in the following I would assume > that the upper bits are masked out: > > diff --git a/gcc/combine.cc b/gcc/combine.cc > index 468b7fde911..80c4ff0fbaf 100644 > --- a/gcc/combine.cc > +++ b/gcc/combine.cc > @@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, > machine_mode mode, > /* (unsigned) < 0x8000 is equivalent to >= 0. */ > else if (is_a (mode, &int_mode) > && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT > - && ((unsigned HOST_WIDE_INT) const_op > + && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK > (int_mode)) > == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - > 1))) > { > const_op = 0; > @@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, > machine_mode mode, > /* (unsigned) >= 0x8000 is equivalent to < 0. */ > else if (is_a (mode, &int_mode) > && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT > - && ((unsigned HOST_WIDE_INT) const_op > + && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK > (int_mode)) > == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - > 1))) > { > const_op = 0; > > For example, while bootstrapping on x64 the optimization is missed since > a LTU comparison in QImode is done and the constant equals > 0xff80. > > Sorry for inlining another patch, but I would really like to make sure > that my understanding is correct, now, before I come up with another > patch. Thus it would be great if someone could shed some light on this. > > gcc/ChangeLog: > > * combine.cc (simplify_compare_const): Properly handle unsigned > constants while narrowing comparison of memory and constants. > --- > gcc/combine.cc | 19 ++- > 1 file changed, 10 insertions(+), 9 deletions(-) > > diff --git a/gcc/combine.cc b/gcc/combine.cc > index e46d202d0a7..468b7fde911 100644 > --- a/gcc/combine.cc > +++ b/gcc/combine.cc > @@ -12003,14 +12003,15 @@ simplify_compare_const (enum rtx_code code, > machine_mode mode, > && !MEM_VOLATILE_P (op0) > /* The optimization makes only sense for constants which are big enough > so that we have a chance to chop off something at all. */ > - && (unsigned HOST_WIDE_INT) const_op > 0xff > - /* Bail out, if the constant does not fit into INT_MODE. */ > - && (unsigned HOST_WIDE_INT) const_op > - < ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 1) - > 1) > + && ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode)) > > 0xff > /* Ensure that we do not overflow during normalization. */ > - && (code != GTU || (unsigned HOST_WIDE_INT) const_op < > HOST_WIDE_INT_M1U)) > + && (code != GTU > + || ((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode)) > + < HOST_WIDE_INT_M1U) > + && trunc_int_for_mode (const_op, int_mode) == const_op) > { > - unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT) const_op; > + unsigned HOST_WIDE_INT n > + = (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode); > enum rtx_code adjusted_code; > > /* Normalize code to either LEU or GEU. */ > @@ -12051,15 +12052,15 @@ simplify_compare_const (enum rtx_code code, > machine_mode mode, > HOST_WIDE_INT_PRINT_HEX ") to (MEM %s " > HOST_WIDE_INT_PRINT_HEX ").\n", GET_MODE_NAME (int_mode), > GET_MODE_NAME (narrow_mode_iter), GET_RTX_NAME (code), > - (unsigned HOST_WIDE_INT)const_op, GET_RTX_NAME > (adjusted_code), > - n); > + (unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode), > + GET_RTX_NAME (adjusted_code), n); > } >
Re: [PATCH v1 4/6] LoongArch: use -mstrict-align by default when building libraries
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > LoongArch processors may not support memory accesses without natural > alignments. Building libraries with -mstrict-align may help with > toolchain binary compatiblity and performance on these implementations > (e.g. Loongson 2K1000LA). I don't think it's a good idea. You should provide a configuration-time option (maybe named --with-strict-align) to make -mstrict-align the default instead, thus both the libraries and the compiled user code will be suitable for 2K1000. > With this patch, no significant performance degredation is observed on > current mainstream LoongArch processors. > > gcc/ChangeLog: > > * gcc/config/t-linux: add -mstrict-align via self_specs > when building GCC libraries. > --- > gcc/config/loongarch/t-linux | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/gcc/config/loongarch/t-linux b/gcc/config/loongarch/t-linux > index 75bb430c555..2a170d600a9 100644 > --- a/gcc/config/loongarch/t-linux > +++ b/gcc/config/loongarch/t-linux > @@ -35,6 +35,9 @@ gen_mlib_spec = $(if $(word 2,$1),\ > # clean up the result of DRIVER_SELF_SPEC to avoid conflict > lib_build_self_spec = % > +# build libraries with -mstrict-align by default > +lib_build_self_spec += -mstrict-align > + > # append user-specified build options from --with-multilib-list > lib_build_self_spec += $(foreach mlib,$(subst $(comma), > ,$(TM_MULTILIB_CONFIG)),\ > $(call gen_mlib_spec,$(subst /, ,$(mlib -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 2/6] LoongArch: improved target configuration interface
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > The configure script and the GCC driver are updated so that > it is easier to customize and control GCC builds for targeting > different LoongArch implementations. > > * Support options for LoongArch SIMD extensions: > new configure options --with-simd={none,lsx,lasx}; > new driver options -m[no]-l[a]sx / -msimd={none,lsx,lasx}. What's the relationship between -mlasx and -msimd=lasx? What will happen if the user specifies -mlasx -msimd=none or -mlasx -msimd=lsx? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 3/6] LoongArch: define preprocessing macros "__loongarch_{arch,tune}"
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > These are exported according to the LoongArch Toolchain Conventions[1] > as a replacement of the obsolete "_LOONGARCH_{ARCH,TUNE}" macros, > which are expanded to strings representing the actual architecture > and microarchitecture of the target. > > [1] currently relased at https://github.com/loongson/LoongArch-Documentation > /blob/main/docs/LoongArch-toolchain-conventions-EN.adoc > > gcc/ChangeLog: > > * gcc/config/loongarch/loongarch-c.cc: Export macros > "__loongarch_{arch,tune}" in the preprocessor. Ok. I think this can be applied anyway (regardless of other patches). > --- > gcc/config/loongarch/loongarch-c.cc | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/gcc/config/loongarch/loongarch-c.cc > b/gcc/config/loongarch/loongarch-c.cc > index 660c68f0e06..7bee037cc4a 100644 > --- a/gcc/config/loongarch/loongarch-c.cc > +++ b/gcc/config/loongarch/loongarch-c.cc > @@ -64,6 +64,9 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) > LARCH_CPP_SET_PROCESSOR ("_LOONGARCH_ARCH", la_target.cpu_arch); > LARCH_CPP_SET_PROCESSOR ("_LOONGARCH_TUNE", la_target.cpu_tune); > > + LARCH_CPP_SET_PROCESSOR ("__loongarch_arch", la_target.cpu_arch); > + LARCH_CPP_SET_PROCESSOR ("__loongarch_tune", la_target.cpu_tune); > + > /* Base architecture / ABI. */ > if (TARGET_64BIT) > { -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 2/6] LoongArch: improved target configuration interface
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > loongarch64) > - tune_pattern="loongarch64|la464" > - tune_default="la464" > + tune_pattern="native|abi-default|loongarch64|la464" I think we can remove tune_pattern completely. There is no reason to limit --with-tune setting based on --with-arch setting. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > However, for LoongArch, we do not want such a "toplevel" library > installation since the default ABI may change. We expect all > multilib variants of libraries to be installed to their designated > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that > the default ABI can be configured arbitrarily (with --with-abi) > while the gcc libdir layout stays consistent. This could be > helpful for the distribution packaging of GCC libraries. Have you tested a --disable-multilib configuration? To me with -- disable-configuration everything should be still in the toplevel directory, not any sub-directory. /* snip */ > ChangeLog: > > * config-ml.in: add loongarch support. Allow overriding Use a tab, not 8 white spaces. Likewise for all patches in the series. > toplevel multisubdir. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout
On Mon, 2023-08-14 at 13:38 +0800, Xi Ruoyao wrote: > > > However, for LoongArch, we do not want such a "toplevel" library > > installation since the default ABI may change. We expect all > > multilib variants of libraries to be installed to their designated > > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that > > the default ABI can be configured arbitrarily (with --with-abi) > > while the gcc libdir layout stays consistent. This could be > > helpful for the distribution packaging of GCC libraries. > > Have you tested a --disable-multilib configuration? To me with -- > disable-configuration everything should be still in the toplevel I mean --disable-multilib configuration, not "--disable-configuration". > directory, not any sub-directory. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 2/6] LoongArch: improved target configuration interface
On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > * Support options for LoongArch SIMD extensions: > new configure options --with-simd={none,lsx,lasx}; > new driver options -m[no]-l[a]sx / -msimd={none,lsx,lasx}. I suggest to rename --with-simd= to --with-ext= and accept a comma- separated ISA extension list, because we have non-SIMD ISA extensions. For example, "--with-ext=lasx,lbt" will make -mlasx, -mlsx (implied), and -mlbt the default. I prefer "-mlasx" over "-msimd=lasx" because "- mlasx" is shorter anyway (if there is no real reason to make -mlasx and -msimd=lasx two different things). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 2/6] LoongArch: improved target configuration interface
On Mon, 2023-08-14 at 13:58 +0800, Xi Ruoyao via Gcc-patches wrote: > On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > > * Support options for LoongArch SIMD extensions: > > new configure options --with-simd={none,lsx,lasx}; > > new driver options -m[no]-l[a]sx / -msimd={none,lsx,lasx}. > > I suggest to rename --with-simd= to --with-ext= and accept a comma- > separated ISA extension list, because we have non-SIMD ISA extensions. > For example, "--with-ext=lasx,lbt" will make -mlasx, -mlsx (implied), > and -mlbt the default. I prefer "-mlasx" over "-msimd=lasx" because "- > mlasx" is shorter anyway (if there is no real reason to make -mlasx and > -msimd=lasx two different things). Perhaps just "--with-feature" or "--with-loongarch-feature", then we can even fold -mstrict-align here, like "--with-feature=lbt,strict-align". -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout
On Mon, 2023-08-14 at 15:37 +0800, Yujie Yang wrote: > On Mon, Aug 14, 2023 at 01:38:40PM +0800, Xi Ruoyao wrote: > > On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > > > > > However, for LoongArch, we do not want such a "toplevel" library > > > installation since the default ABI may change. We expect all > > > multilib variants of libraries to be installed to their designated > > > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that > > > the default ABI can be configured arbitrarily (with --with-abi) > > > while the gcc libdir layout stays consistent. This could be > > > helpful for the distribution packaging of GCC libraries. > > > > Have you tested a --disable-multilib configuration? To me with -- > > disable-configuration everything should be still in the toplevel > > directory, not any sub-directory. > > That's a good point, sorry I missed --disable-multilib here. > > However, you don't really need --disable-multilib since > the libraries are only built once in the default ABI configuration > as long as --with-multilib-list does not request anything more than > that. > > Maybe we should force-enabling multilib in all cases. I really don't like this. Why must I always remind my self "hey, this is LoongArch, there is a different directory layout" when I don't need multilib at all? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 2/6] LoongArch: improved target configuration interface
On Mon, 2023-08-14 at 16:44 +0800, Yujie Yang wrote: > I assume we all want: > > (1) -mlasx -mlsx -> enable LSX and LASX > (2) -mlasx -mno-lsx -> disable LSX and LASX > (3) -mno-lsx -mlasx -> enable LSX and LASX Yes. > Unless we declare -mlsx / -mlasx as driver deferred, AFAIK there is no other > way for > us to know the actual order of appearnce of all -m[no-]l[a]sx options on the > command > line. All we know from GCC's option system would be a final on/off state of > "lsx" > and a final on/off state of "lasx". But x86 does this correct; $ echo __AVX__ + __AVX2__ | LANG= cpp -E -mno-avx -mavx2 # 0 "" # 0 "" # 0 "" # 1 "/usr/include/stdc-predef.h" 1 3 4 # 0 "" 2 # 1 "" 1 + 1 so there must be a way to handle this... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 2/6] LoongArch: improved target configuration interface
On Mon, 2023-08-14 at 16:57 +0800, Yujie Yang wrote: > On Mon, Aug 14, 2023 at 04:49:11PM +0800, Xi Ruoyao wrote: > > On Mon, 2023-08-14 at 16:44 +0800, Yujie Yang wrote: > > > I assume we all want: > > > > > > (1) -mlasx -mlsx -> enable LSX and LASX > > > (2) -mlasx -mno-lsx -> disable LSX and LASX > > > (3) -mno-lsx -mlasx -> enable LSX and LASX > > > > Yes. > > > > > Unless we declare -mlsx / -mlasx as driver deferred, AFAIK there is no > > > other way for > > > us to know the actual order of appearnce of all -m[no-]l[a]sx options on > > > the command > > > line. All we know from GCC's option system would be a final on/off state > > > of "lsx" > > > and a final on/off state of "lasx". > > > > But x86 does this correct; > > > > $ echo __AVX__ + __AVX2__ | LANG= cpp -E -mno-avx -mavx2 > > # 0 "" > > # 0 "" > > # 0 "" > > # 1 "/usr/include/stdc-predef.h" 1 3 4 > > # 0 "" 2 > > # 1 "" > > 1 + 1 > > > > so there must be a way to handle this... > > > > -- > > Xi Ruoyao > > School of Aerospace Science and Technology, Xidian University > > Emm... What happens if you reverse the order? > > $ echo __AVX__ + __AVX2__ | LANG= cpp -E -mavx2 -mno-avx > > Anyways, I believe there may be other ways to implement this, but it would > require equally much effort (or even much more) that the current approach. > Especially considering the possiblity of future updates -- we now have a > framework for this sort of things. > > Meanwhile you confortably can stay away from -msimd= and use only > -mlsx / -mlasx. So...a matter of style maybe? I'm OK with that, but we need to document it clearly in invoke.texi. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout
On Mon, 2023-08-14 at 18:18 +0800, Yujie Yang wrote: > On Mon, Aug 14, 2023 at 03:48:53PM +0800, Xi Ruoyao wrote: > > On Mon, 2023-08-14 at 15:37 +0800, Yujie Yang wrote: > > > On Mon, Aug 14, 2023 at 01:38:40PM +0800, Xi Ruoyao wrote: > > > > On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > > > > > > > > > However, for LoongArch, we do not want such a "toplevel" library > > > > > installation since the default ABI may change. We expect all > > > > > multilib variants of libraries to be installed to their designated > > > > > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that > > > > > the default ABI can be configured arbitrarily (with --with-abi) > > > > > while the gcc libdir layout stays consistent. This could be > > > > > helpful for the distribution packaging of GCC libraries. > > > > > > > > Have you tested a --disable-multilib configuration? To me with -- > > > > disable-configuration everything should be still in the toplevel > > > > directory, not any sub-directory. > > > > > > That's a good point, sorry I missed --disable-multilib here. > > > > > > However, you don't really need --disable-multilib since > > > the libraries are only built once in the default ABI configuration > > > as long as --with-multilib-list does not request anything more than > > > that. > > > > > > Maybe we should force-enabling multilib in all cases. > > > > I really don't like this. Why must I always remind my self "hey, this > > is LoongArch, there is a different directory layout" when I don't need > > multilib at all? > > > > AFAIK, the two main uses of the multisubdir layout are in the C++ > header directory and the GCC libdir (where libgcc.a resides), respectively. > The GCC libdir is fine since they are private to a user's GCC build. > However, the C++ header directory is shared across the system unless > an alternative sysroot is chosen, so the consisentency of the multilib > layout matters. The C++ header directory should also be considered private to the GCC build. AFAIK no distro supports "overwriting a part of the system", so you cannot just install a custom GCC build and overwrite the system C++ header directory. For a cross compiler, the C++ header directory is $prefix/$target_triple/include/c++/$gcc_version/$multi_dir, the C++ header in $sysroot/usr/include/c++ (if it ever exists) will not be used at all. > So theoretically, the toplevel libraries should have the same ABI under > the the target triplet. However, for many architectures, the > "--with-abi + MULTILIB_DEFAULT" scheme may cause the toplevel to be > configured to have different meanings. https://gcc.gnu.org/PR104085 is an example of the issue caused by the different meaning. > So I think it's also a reasonable approach that we just simply eliminate > the ambiguous toplevel libraries and use a symmetric layout instead. I don't like the inconsistency among different GCC ports. If all ports use the same approach I'll not object. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v4 1/6] LoongArch: Add Loongson SX vector directive compilation framework.
I guess there is a merge conflict with Yujie's "-msimd=" patch and you may need to collaborate to resolve it. Maybe just add -msimd in this series. On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote: > From: Lulu Cheng > > gcc/ChangeLog: > > * config/loongarch/genopts/loongarch-strings: Add compilation > framework. > * config/loongarch/genopts/loongarch.opt.in: Ditto. > * config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins): Ditto. > * config/loongarch/loongarch-def.c: Ditto. > * config/loongarch/loongarch-def.h (N_ISA_EXT_TYPES): Ditto. > (ISA_EXT_SIMD_LSX): Ditto. > (N_SWITCH_TYPES): Ditto. > (SW_LSX): Ditto. > (struct loongarch_isa): Ditto. > * config/loongarch/loongarch-driver.cc (APPEND_SWITCH): Ditto. > (driver_get_normalized_m_opts): Ditto. > * config/loongarch/loongarch-driver.h (driver_get_normalized_m_opts): > Ditto. > * config/loongarch/loongarch-opts.cc (loongarch_config_target): Ditto. > (isa_str): Ditto. > * config/loongarch/loongarch-opts.h (ISA_HAS_LSX): Ditto. > * config/loongarch/loongarch-str.h (OPTSTR_LSX): Ditto. > * config/loongarch/loongarch.opt: Ditto. /* snip */ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v1 1/6] LoongArch: a symmetric multilib subdir layout
On Mon, 2023-08-14 at 19:16 +0800, Xi Ruoyao wrote: > On Mon, 2023-08-14 at 18:18 +0800, Yujie Yang wrote: > > On Mon, Aug 14, 2023 at 03:48:53PM +0800, Xi Ruoyao wrote: > > > On Mon, 2023-08-14 at 15:37 +0800, Yujie Yang wrote: > > > > On Mon, Aug 14, 2023 at 01:38:40PM +0800, Xi Ruoyao wrote: > > > > > On Mon, 2023-08-14 at 11:57 +0800, Yang Yujie wrote: > > > > > > > > > > > However, for LoongArch, we do not want such a "toplevel" library > > > > > > installation since the default ABI may change. We expect all > > > > > > multilib variants of libraries to be installed to their designated > > > > > > ABI-specific subdirs (e.g. base/lp64d) of the GCC libdir, so that > > > > > > the default ABI can be configured arbitrarily (with --with-abi) > > > > > > while the gcc libdir layout stays consistent. This could be > > > > > > helpful for the distribution packaging of GCC libraries. > > > > > > > > > > Have you tested a --disable-multilib configuration? To me with -- > > > > > disable-configuration everything should be still in the toplevel > > > > > directory, not any sub-directory. > > > > > > > > That's a good point, sorry I missed --disable-multilib here. > > > > > > > > However, you don't really need --disable-multilib since > > > > the libraries are only built once in the default ABI configuration > > > > as long as --with-multilib-list does not request anything more than > > > > that. > > > > > > > > Maybe we should force-enabling multilib in all cases. > > > > > > I really don't like this. Why must I always remind my self "hey, this > > > is LoongArch, there is a different directory layout" when I don't need > > > multilib at all? > > > > > > > AFAIK, the two main uses of the multisubdir layout are in the C++ > > header directory and the GCC libdir (where libgcc.a resides), respectively. > > The GCC libdir is fine since they are private to a user's GCC build. > > However, the C++ header directory is shared across the system unless > > an alternative sysroot is chosen, so the consisentency of the multilib > > layout matters. > > The C++ header directory should also be considered private to the GCC > build. AFAIK no distro supports "overwriting a part of the system", so > you cannot just install a custom GCC build and overwrite the system C++ > header directory. For a cross compiler, the C++ header directory is > $prefix/$target_triple/include/c++/$gcc_version/$multi_dir, the C++ > header in $sysroot/usr/include/c++ (if it ever exists) will not be used > at all. > > > So theoretically, the toplevel libraries should have the same ABI under > > the the target triplet. However, for many architectures, the > > "--with-abi + MULTILIB_DEFAULT" scheme may cause the toplevel to be > > configured to have different meanings. > > https://gcc.gnu.org/PR104085 is an example of the issue caused by the > different meaning. > > > So I think it's also a reasonable approach that we just simply eliminate > > the ambiguous toplevel libraries and use a symmetric layout instead. > > I don't like the inconsistency among different GCC ports. If all ports > use the same approach I'll not object. I came up with another idea. What if we: 1. Keep the "default" ABI libs in the toplevel directory. There is *always* a default ABI so treating it specially is not really nonsense. 2. Create a symlink for consistency. For example, if --with-abi=lp64d, - -with-multilib-list=lp64d,lp64s: * /usr/lib/gcc/loongarch64-linux-gnu/14.0.0 contains the lp64d libraries. * /usr/lib/gcc/loongarch64-linux-gnu/14.0.0/lp64s contains the lp64s libraries. * /usr/lib/gcc/loongarch64-linux-gnu/14.0.0/lp64d is a symlink to "." Then we can refer to the lp64d libgcc.a with both /usr/lib/gcc/loongarch64-linux-gnu/14.0.0/lp64d/libgcc.a, and /usr/lib/gcc/loongarch64-linux-gnu/14.0.0/libgcc.a. For referring to the default multilib, the non-suffixed /usr/lib/gcc/loongarch64-linux-gnu/14.0.0 path should be used; for referring lp64d (no matter what the default is), /usr/lib/gcc/loongarch64-linux-gnu/14.0.0/lp64d should be used. The symlink can be created by the GCC building system or manually by the distro maintainer (or gcc packager). Thoughts? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.
Please fix code style (this is the third time I say it and I'm really frustrated now). GCC is a project, it's not a student homework so style matters. And it's not so difficult to fix the style: for a new file you can use "clang-format --style GNU -i filename.c" to do the work automatically. On Tue, 2023-08-15 at 18:39 +0800, chenxiaolong wrote: > In the implementation process, the "q" suffix function is > Re-register and associate the "__float128" type with the > "long double" type so that the compiler can handle the > corresponding function correctly. The functions implemented > include __builtin_{huge_valq infq, fabsq, copysignq, nanq,nansq}. > On the LoongArch architecture, __builtin_{fabsq,copysignq} can > be implemented with the instruction "bstrins.d", so that its > optimization effect reaches the optimal value. > > gcc/ChangeLog: > > * config/loongarch/loongarch-builtins.cc (DEF_LARCH_FTYPE): > (enum loongarch_builtin_type):Increases the type of the function. > (FLOAT_BUILTIN_HIQ):__builtin_{huge_valq,infq}. > (FLOAT_BUILTIN_FCQ):__builtin_{fabsq,copysignq}. > (FLOAT_BUILTIN_NNQ):__builtin_{nanq,nansq}. > (loongarch_init_builtins): > (loongarch_fold_builtin): > (loongarch_expand_builtin): > * config/loongarch/loongarch-protos.h (loongarch_fold_builtin): > (loongarch_c_mode_for_suffix):Add the declaration of the function. > * config/loongarch/loongarch.cc (loongarch_c_mode_for_suffix):Add > the definition of the function. > (TARGET_FOLD_BUILTIN): > (TARGET_C_MODE_FOR_SUFFIX): > * config/loongarch/loongarch.md (infq):Add an instruction template > to the machine description file to generate information such as > the icode used by the function and the constructor. > (): > (fabsq): > (copysignq): > > libgcc/ChangeLog: > > * config/loongarch/t-softfp-tf: > * config/loongarch/tf-signs.c: New file. > --- > gcc/config/loongarch/loongarch-builtins.cc | 168 - > gcc/config/loongarch/loongarch-protos.h | 2 + > gcc/config/loongarch/loongarch.cc | 14 ++ > gcc/config/loongarch/loongarch.md | 69 + > libgcc/config/loongarch/t-softfp-tf | 3 + > libgcc/config/loongarch/tf-signs.c | 59 > 6 files changed, 313 insertions(+), 2 deletions(-) > create mode 100644 libgcc/config/loongarch/tf-signs.c > > diff --git a/gcc/config/loongarch/loongarch-builtins.cc > b/gcc/config/loongarch/loongarch-builtins.cc > index b929f224dfa..2fb0fde0e3f 100644 > --- a/gcc/config/loongarch/loongarch-builtins.cc > +++ b/gcc/config/loongarch/loongarch-builtins.cc > @@ -36,6 +36,8 @@ along with GCC; see the file COPYING3. If not see > #include "fold-const.h" > #include "expr.h" > #include "langhooks.h" > +#include "calls.h" > +#include "explow.h" > > /* Macros to create an enumeration identifier for a function prototype. */ > #define LARCH_FTYPE_NAME1(A, B) LARCH_##A##_FTYPE_##B > @@ -48,9 +50,18 @@ enum loongarch_function_type > #define DEF_LARCH_FTYPE(NARGS, LIST) LARCH_FTYPE_NAME##NARGS LIST, > #include "config/loongarch/loongarch-ftypes.def" > #undef DEF_LARCH_FTYPE > + LARCH_BUILTIN_HUGE_VALQ, > + LARCH_BUILTIN_INFQ, > + LARCH_BUILTIN_FABSQ, > + LARCH_BUILTIN_COPYSIGNQ, > + LARCH_BUILTIN_NANQ, > + LARCH_BUILTIN_NANSQ, > LARCH_MAX_FTYPE_MAX > }; > > +/* Count the number of functions with "q" as the suffix. */ > +const int MATHQ_NUMS = (int)LARCH_MAX_FTYPE_MAX - > (int)LARCH_BUILTIN_HUGE_VALQ; > + > /* Specifies how a built-in function should be converted into rtl. */ > enum loongarch_builtin_type > { > @@ -63,6 +74,15 @@ enum loongarch_builtin_type > value and the arguments are mapped to operands 0 and above. */ > LARCH_BUILTIN_DIRECT_NO_TARGET, > > + /* The function corresponds to __builtin_{huge_valq,infq}. */ > + LARCH_BUILTIN_HIQ_DIRECT, > + > + /* The function corresponds to __builtin_{fabsq,copysignq}. */ > + LARCH_BUILTIN_FCQ_DIRECT, > + > + /* Define the type of the __builtin_{nanq,nansq} function. */ > + LARCH_BUILTIN_NNQ_DIRECT > + > }; > > /* Declare an availability predicate for built-in functions that require > @@ -136,6 +156,24 @@ AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI) > LARCH_BUILTIN (INSN, #INSN, LARCH_BUILTIN_DIRECT_NO_TARGET, \ > FUNCTION_TYPE, AVAIL) > > +/* Define an float to do funciton {huge_valq,infq}. */ > +#define FLOAT_BUILTIN_HIQ (INSN, FUNCTION_TYPE) \ > + { CODE_FOR_ ## INSN, \ > + "__builtin_" #INSN, LARCH_BUILTIN_HIQ_DIRECT, \ > + FUNCTION_TYPE, loongarch_builtin_avail_default } > + > +/* Define an float to do funciton {fabsq,copysignq}. */ > +#define FLOAT_BUILTIN_FCQ (INSN, FUNCTION_TYPE) \ > + { CODE_FOR_ ## INSN,
Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.
The implementation fails to handle this test case properly: typedef double __attribute__((vector_size(32))) v4df; void use1(double); __attribute__((noipa)) double use(double) { register double x asm("f24") = 114.514; __asm__("" : "+f" (x)); return x; } void test(void) { register v4df x asm("f24") = {1, 2, 3, 4}; __asm__("" : "+f" (x)); use(x[1]); use1(x[3]); } Here use() attempts to save and restore f24, but it uses fst.d/fld.d, clobbering the high 192 bits of xr24. Now test() passes a wrong value of x[3] to use1(). Note that saving and restoring f24 with xvst/xvld in use() won't really fix the issue because in real life use() can be in another translation unit (or even a shared library) compiled with -mno-lsx. So it seems we need to tell the compiler "a function call may clobber the high bits of a vector register even if the corresponding floating-point register is saved". I'm not sure how to accomplish this... On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote: > This is an update of: > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html > > This version of patch set only introduces some small simplications of > implementation. Because I missed the size limitation of mail size, the > huge testsuite patches of v2 and v3 are not shown in the mail list. > So, > testsuite patches are splited from this patch set again and will be > submitted > independently in the future. > > Binutils-gdb introduced LSX/LASX support since 2.41 release: > https://lists.gnu.org/archive/html/info-gnu/2023-07/msg9.html > > Brief history of patch set version: > v1 -> v2: > - Reduce usage of "unspec" in RTL template. > - Append Support of ADDR_REG_REG in LSX and LASX. > - Constraint docs are appended in gcc/doc/md.texi and ccomment block. > - Codes related to vecarg are removed. > - Testsuite of LSX and LASX is added in v2. (Because of the size > limitation of > mail list, these patches are not shown) > - Adjust the loongarch_expand_vector_init() function to reduce > instruction > output amount. > - Some minor implementation changes of RTL templates. > > v2 -> v3: > - Revert vabsd/xvabsd RTL templates to unspec impl. > - Resolve warning in gcc/config/loongarch/loongarch.cc when > bootstrapping > with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx". > - Remove redundant definitions in lasxintrin.h. > - Refine commit info. > > Lulu Cheng (6): > LoongArch: Add Loongson SX vector directive compilation framework. > LoongArch: Add Loongson SX base instruction support. > LoongArch: Add Loongson SX directive builtin function support. > LoongArch: Add Loongson ASX vector directive compilation framework. > LoongArch: Add Loongson ASX base instruction support. > LoongArch: Add Loongson ASX directive builtin function support. > > gcc/config.gcc | 2 +- > gcc/config/loongarch/constraints.md | 131 +- > .../loongarch/genopts/loongarch-strings | 4 + > gcc/config/loongarch/genopts/loongarch.opt.in | 12 +- > gcc/config/loongarch/lasx.md | 5122 > gcc/config/loongarch/lasxintrin.h | 5338 > + > gcc/config/loongarch/loongarch-builtins.cc | 2686 - > gcc/config/loongarch/loongarch-c.cc | 18 + > gcc/config/loongarch/loongarch-def.c | 6 + > gcc/config/loongarch/loongarch-def.h | 9 +- > gcc/config/loongarch/loongarch-driver.cc | 10 + > gcc/config/loongarch/loongarch-driver.h | 2 + > gcc/config/loongarch/loongarch-ftypes.def | 666 +- > gcc/config/loongarch/loongarch-modes.def | 39 + > gcc/config/loongarch/loongarch-opts.cc | 89 +- > gcc/config/loongarch/loongarch-opts.h | 3 + > gcc/config/loongarch/loongarch-protos.h | 35 + > gcc/config/loongarch/loongarch-str.h | 3 + > gcc/config/loongarch/loongarch.cc | 4586 +- > gcc/config/loongarch/loongarch.h | 117 +- > gcc/config/loongarch/loongarch.md | 56 +- > gcc/config/loongarch/loongarch.opt | 12 +- > gcc/config/loongarch/lsx.md | 4481 ++ > gcc/config/loongarch/lsxintrin.h | 5181 > gcc/config/loongarch/predicates.md | 333 +- > gcc/doc/md.texi | 11 + > 26 files changed, 28668 insertions(+), 284 deletions(-) > create mode 100644 gcc/config/loongarch/lasx.md > create mode 100644 gcc/config/loongarch/lasxintrin.h > create mode 100644 gcc/config/loongarch/lsx.md > create mode 100644 gcc/config/loongarch/lsxintrin.h > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University