[Committed] RISC-V: Make known NITERS loop be aware of dynamic lmul cost model liveness information
Consider the following case:

int f[12][100];

void bad1(int v1, int v2)
{
  for (int r = 0; r < 100; r += 4)
    {
      int i = r + 1;
      f[0][r] = f[1][r] * (f[2][r]) - f[1][i] * (f[2][i]);
      f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r]);
      f[0][r+2] = f[1][r+2] * (f[2][r+2]) - f[1][i+2] * (f[2][i+2]);
      f[0][i+2] = f[1][r+2] * (f[2][i+2]) + f[1][i+2] * (f[2][r+2]);
    }
}

Blindly picking LMUL = 8 VLS gives:

        lui a4,%hi(f)
        addi a4,a4,%lo(f)
        addi sp,sp,-592
        addi a3,a4,800
        lui a5,%hi(.LANCHOR0)
        vl8re32.v v24,0(a3)
        addi a5,a5,%lo(.LANCHOR0)
        addi a1,a4,400
        addi a3,sp,140
        vl8re32.v v16,0(a1)
        vl4re16.v v4,0(a5)
        addi a7,a5,192
        vs4r.v v4,0(a3)
        addi t0,a5,64
        addi a3,sp,336
        li t2,32
        addi a2,a5,128
        vsetvli a5,zero,e32,m8,ta,ma
        vrgatherei16.vv v8,v16,v4
        vmul.vv v8,v8,v24
        vl8re32.v v0,0(a7)
        vs8r.v v8,0(a3)
        vmsltu.vx v8,v0,t2
        addi a3,sp,12
        addi t2,sp,204
        vsm.v v8,0(t2)
        vl4re16.v v4,0(t0)
        vl4re16.v v0,0(a2)
        vs4r.v v4,0(a3)
        addi t0,sp,336
        vrgatherei16.vv v8,v24,v4
        addi a3,sp,208
        vrgatherei16.vv v24,v16,v0
        vs4r.v v0,0(a3)
        vmul.vv v8,v8,v24
        vlm.v v0,0(t2)
        vl8re32.v v24,0(t0)
        addi a3,sp,208
        vsub.vv v16,v24,v8
        addi t6,a4,528
        vadd.vv v8,v24,v8
        addi t5,a4,928
        vmerge.vvm v8,v8,v16,v0
        addi t3,a4,128
        vs8r.v v8,0(a4)
        addi t4,a4,1056
        addi t1,a4,656
        addi a0,a4,256
        addi a6,a4,1184
        addi a1,a4,784
        addi a7,a4,384
        addi a4,sp,140
        vl4re16.v v0,0(a3)
        vl8re32.v v24,0(t6)
        vl4re16.v v4,0(a4)
        vrgatherei16.vv v16,v24,v0
        addi a3,sp,12
        vs8r.v v16,0(t0)
        vl8re32.v v8,0(t5)
        vrgatherei16.vv v16,v24,v4
        vl4re16.v v4,0(a3)
        vrgatherei16.vv v24,v8,v4
        vmul.vv v16,v16,v8
        vl8re32.v v8,0(t0)
        vmul.vv v8,v8,v24
        vsub.vv v24,v16,v8
        vlm.v v0,0(t2)
        addi a3,sp,208
        vadd.vv v8,v8,v16
        vl8re32.v v16,0(t4)
        vmerge.vvm v8,v8,v24,v0
        vrgatherei16.vv v24,v16,v4
        vs8r.v v24,0(t0)
        vl4re16.v v28,0(a3)
        addi a3,sp,464
        vs8r.v v8,0(t3)
        vl8re32.v v8,0(t1)
        vrgatherei16.vv v0,v8,v28
        vs8r.v v0,0(a3)
        addi a3,sp,140
        vl4re16.v v24,0(a3)
        addi a3,sp,464
        vrgatherei16.vv v0,v8,v24
        vl8re32.v v24,0(t0)
        vmv8r.v v8,v0
        vl8re32.v v0,0(a3)
        vmul.vv v8,v8,v16
        vmul.vv v24,v24,v0
        vsub.vv v16,v8,v24
        vadd.vv v8,v8,v24
        vsetivli zero,4,e32,m8,ta,ma
        vle32.v v24,0(a6)
        vsetvli a4,zero,e32,m8,ta,ma
        addi a4,sp,12
        vlm.v v0,0(t2)
        vmerge.vvm v8,v8,v16,v0
        vl4re16.v v16,0(a4)
        vrgatherei16.vv v0,v24,v16
        vsetivli zero,4,e32,m8,ta,ma
        vs8r.v v0,0(a4)
        addi a4,sp,208
        vl4re16.v v0,0(a4)
        vs8r.v v8,0(a0)
        vle32.v v16,0(a1)
        vsetvli a5,zero,e32,m8,ta,ma
        vrgatherei16.vv v8,v16,v0
        vs8r.v v8,0(a4)
        addi a4,sp,140
        vl4re16.v v4,0(a4)
        addi a5,sp,12
        vrgatherei16.vv v8,v16,v4
        vl8re32.v v0,0(a5)
        vsetivli zero,4,e32,m8,ta,ma
        addi a5,sp,208
        vmv8r.v v16,v8
        vl8re32.v v8,0(a5)
        vmul.vv v24,v24,v16
        vmul.vv v8,v0,v8
        vsub.vv v16,v24,v8
        vadd.vv v8,v8,v24
        vsetvli a5,zero,e8,m2,ta,ma
        vlm.v v0,0(t2)
        vsetivli zero,4,e32,m8,ta,ma
        vmerge.vvm v8,v8,v16,v0
        vse32.v v8,0(a7)
        addi sp,sp,592
        jr ra

This patch makes loops with known NITERS aware of the liveness estimation.
After this patch, LMUL = 4 is chosen:

        lui a5,%hi(f)
        addi a5,a5,%lo(f)
        addi a3,a5,400
        addi a4,a5,800
        vsetivli zero,8,e32,m2,ta,ma
        vlseg4e32.v v16,(a3)
        vlseg4e32.v v8,(a4)
        vmul.vv v2,v8,v16
        addi a3,a5,528
        vmv.v.v v24,v10
        vnmsub.vv v24,v18,v2
        addi a4,a5,928
        vmul.vv v2,v12,v22
        vmul.vv v6,v8,v18
        vmv.v.v v30,v2
        vmacc.vv v30,v14,v20
        vmv.v.v v26,v6
        vmacc.vv v26,v10,v16
        vmul.vv v4,v12,v20
        vmv.v.v v28,v14
        vnmsub.vv v28,v22,v4
        vsseg4e32.v v24,(a5)
        vlseg4e32.v v16,(a3)
        vlseg4e32.v v8,(a4)
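To illustrate why liveness matters when choosing LMUL, here is a minimal sketch. It is not the GCC cost model: it only assumes that RVV has 32 vector registers and that one value at LMUL = m occupies m of them, so too many live values at a large LMUL force spills like the stack traffic in the LMUL = 8 code. The helper name pick_lmul and the idea of a single max_live count are hypothetical.

```c
#include <assert.h>

/* RVV provides 32 architectural vector registers (v0..v31).  */
enum { NUM_VECTOR_REGS = 32 };

/* Hypothetical helper, not a GCC function: return the largest LMUL in
   {8, 4, 2, 1} for which `max_live` simultaneously live vector values
   fit in the register file, given that one value at LMUL = m occupies
   m registers.  Falls back to 1 when nothing larger fits.  */
static int
pick_lmul (int max_live)
{
  assert (max_live >= 0);
  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (max_live * lmul <= NUM_VECTOR_REGS)
      return lmul;
  return 1;
}
```

With this simplified model, five live values rule out LMUL = 8 (5 * 8 = 40 registers) but allow LMUL = 4 (20 registers).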
RE: [gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for x86_64 backend
> -Original Message- > From: Haochen Jiang > Sent: Thursday, December 21, 2023 4:26 PM > To: gcc-patches@gcc.gnu.org > Cc: ubiz...@gmail.com; Liu, Hongtao ; > ger...@pfeifer.com > Subject: [gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for > x86_64 backend > > Hi all, > > This is the v2 patch for the wwwdocs change regarding to review. > > If there is no objection, I will push this change next Tuesday. I will commit the doc change patch. Thx, Haochen > > Changes is v2: > > - Remove RAO-INT from Grand Ridge > - Remove the mask register restriction for -mno-evex512 > - Arrange the options alphabetically > - Other minor text change > > Thx, > Haochen > > Messages in v1: > > This patch will mention the following changes in wwwdocs for x86_64 > backend: > > - AVX10.1 support > - APX EGPR, PUSH2POP2, PPX and NDD support > - Xeon Phi ISAs deprecated > > Also I adjust the words in x86_64 part for GCC 13. > > --- > Mention AVX10.1 support, APX support and Xeon Phi deprecate in GCC 14. > Also adjust documentation in GCC 13. > --- > htdocs/gcc-13/changes.html | 38 -- > htdocs/gcc-14/changes.html | 27 ++- > 2 files changed, 42 insertions(+), 23 deletions(-) > > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index > d3bacc16..b4b1a39a 100644 > --- a/htdocs/gcc-13/changes.html > +++ b/htdocs/gcc-13/changes.html > @@ -543,24 +543,28 @@ You may also want to check out our >__bf16 type to x86 psABI. Users need to adjust their >AVX512BF16-related source code when upgrading GCC12 to GCC13. > > - New ISA extension support for Intel AVX-IFMA was added. > - AVX-IFMA intrinsics are available via the -mavxifma > + New ISA extension support for Intel AMX-COMPLEX was added. > + AMX-COMPLEX intrinsics are available via the > + -mamx-complex >compiler switch. > > - New ISA extension support for Intel AVX-VNNI-INT8 was added. > - AVX-VNNI-INT8 intrinsics are available via the - > mavxvnniint8 > + New ISA extension support for Intel AMX-FP16 was added. 
> + AMX-FP16 intrinsics are available via the -mamx-fp16 > + compiler switch. > + > + New ISA extension support for Intel AVX-IFMA was added. > + AVX-IFMA intrinsics are available via the -mavxifma >compiler switch. > >New ISA extension support for Intel AVX-NE-CONVERT was added. >AVX-NE-CONVERT intrinsics are available via the >-mavxneconvert compiler switch. > > - New ISA extension support for Intel CMPccXADD was added. > - CMPccXADD intrinsics are available via the -mcmpccxadd > + New ISA extension support for Intel AVX-VNNI-INT8 was added. > + AVX-VNNI-INT8 intrinsics are available via the > + -mavxvnniint8 >compiler switch. > > - New ISA extension support for Intel AMX-FP16 was added. > - AMX-FP16 intrinsics are available via the -mamx-fp16 > + New ISA extension support for Intel CMPccXADD was added. > + CMPccXADD intrinsics are available via the > + -mcmpccxadd >compiler switch. > >New ISA extension support for Intel PREFETCHI was added. > @@ -571,10 +575,6 @@ You may also want to check out our >RAO-INT intrinsics are available via the -mraoint >compiler switch. > > - New ISA extension support for Intel AMX-COMPLEX was added. > - AMX-COMPLEX intrinsics are available via the -mamx- > complex > - compiler switch. > - >GCC now supports the Intel CPU named Raptor Lake through > -march=raptorlake. > Raptor Lake is based on Alder Lake. > @@ -585,13 +585,13 @@ You may also want to check out our > >GCC now supports the Intel CPU named Sierra Forest through > -march=sierraforest. > -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, > CMPccXADD, > -ENQCMD and UINTR ISA extensions. > +Based on ISA extensions enabled on Alder Lake, the switch further enables > +the AVX-IFMA, AVX-NE-CONVERT, AVX-VNNI-INT8, CMPccXADD, > ENQCMD and UINTR > +ISA extensions. > >GCC now supports the Intel CPU named Grand Ridge through > -march=grandridge. > -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, > CMPccXADD, > -ENQCMD, UINTR and RAO-INT ISA extensions. 
> +Grand Ridge is based on Sierra Forest. > >GCC now supports the Intel CPU named Emerald Rapids through > -march=emeraldrapids. > @@ -599,11 +599,13 @@ You may also want to check out our > >GCC now supports the Intel CPU named Granite Rapids through > -march=graniterapids. > -The switch enables the AMX-FP16 and PREFETCHI ISA extensions. > +Based on Sapphire Rapids, the switch further enables the AMX-FP16 and > +PREFETCHI ISA extensions. > >GCC now supports the Intel CPU named Granite Rapids D through > -march=graniterapids-d. > -The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA > ext
[PATCH 0/2] When cmodel=extreme, add macro support and only
When cmodel=extreme, since the symbol address is obtained through four
instructions, errors may occur in some cases during linking. Therefore, in
order to ensure that the instructions for obtaining the symbol address are
together, macro instructions are used to obtain the symbol address when
cmodel=extreme.

https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc#extreme-code-model

Lulu Cheng (2):
  LoongArch: Add the macro implementation of mcmodel=extreme.
  LoongArch: When the code model is extreme, the symbol address is
    obtained through macro instructions regardless of the value of
    -mexplicit-relocs.

 gcc/config/loongarch/loongarch.cc           | 25 +-
 gcc/config/loongarch/loongarch.md           | 47 ++-
 gcc/config/loongarch/predicates.md          | 14 ++
 .../gcc.target/loongarch/attr-model-1.c     |  2 +-
 .../gcc.target/loongarch/attr-model-2.c     |  2 +-
 .../gcc.target/loongarch/attr-model-3.c     |  2 +-
 .../gcc.target/loongarch/attr-model-4.c     |  2 +-
 .../loongarch/func-call-extreme-1.c         |  6 +--
 .../loongarch/func-call-extreme-2.c         |  6 +--
 .../loongarch/func-call-extreme-3.c         |  6 +--
 .../loongarch/func-call-extreme-4.c         |  6 +--
 .../loongarch/func-call-extreme-5.c         |  7 +++
 .../loongarch/func-call-extreme-6.c         |  7 +++
 13 files changed, 102 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-6.c

-- 
2.39.3
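The adjacency requirement can be sketched as follows. This is a simplified model, not the linker's implementation: it assumes 4-byte instructions in the order pcalau12i, addi.d, lu32i.d, lu52i.d, and that the linker takes pcalau12i to sit two instructions before lu32i.d. The function names are hypothetical; see the linked ABI document for the actual relocation formulas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LoongArch instructions are 4 bytes wide.  */
enum { INSN_BYTES = 4 };

/* Start address of the 4KiB page containing ADDR.  */
static uint64_t
page_of (uint64_t addr)
{
  return addr & ~UINT64_C (0xfff);
}

/* Assumed model: lu32i.d is the third instruction of the sequence, so
   the linker guesses that pcalau12i sits two instructions earlier.  */
static uint64_t
inferred_pcalau12i_pc (uint64_t lu32i_pc)
{
  assert (lu32i_pc >= 2 * INSN_BYTES);
  return lu32i_pc - 2 * INSN_BYTES;
}

/* True when the page inferred from lu32i.d's position matches the page
   that pcalau12i really executes in.  */
static bool
inference_is_correct (uint64_t pcalau12i_pc, uint64_t lu32i_pc)
{
  return page_of (inferred_pcalau12i_pc (lu32i_pc)) == page_of (pcalau12i_pc);
}
```

Under these assumptions an adjacent sequence is handled correctly even when it straddles a page boundary (pcalau12i at 0x1ffc, lu32i.d at 0x2004), while a lu32i.d that was scheduled into another page (say 0x3010) makes the linker compute against the wrong page.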
[PATCH 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.
Instructions pcalau12i, addi.d, lu32i.d and lu52i.d must be adjacent so that
the linker can infer the PC of pcalau12i to apply relocations to lu32i.d and
lu52i.d. Otherwise, the results would be incorrect if these four instructions
are not in the same 4KiB page.

See the link for details:
https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc#extreme-code-model

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_symbol_extreme_p): Add
	function declaration.
	(loongarch_explicit_relocs_p): Use the macro instruction to get
	the symbol address when loongarch_symbol_extreme_p returns true.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/attr-model-1.c: Modify the content of the
	search string in the test case.
	* gcc.target/loongarch/attr-model-2.c: Likewise.
	* gcc.target/loongarch/attr-model-3.c: Likewise.
	* gcc.target/loongarch/attr-model-4.c: Likewise.
	* gcc.target/loongarch/func-call-extreme-1.c: Likewise.
	* gcc.target/loongarch/func-call-extreme-2.c: Likewise.
	* gcc.target/loongarch/func-call-extreme-3.c: Likewise.
	* gcc.target/loongarch/func-call-extreme-4.c: Likewise.
--- gcc/config/loongarch/loongarch.cc | 11 +++ gcc/testsuite/gcc.target/loongarch/attr-model-1.c | 2 +- gcc/testsuite/gcc.target/loongarch/attr-model-2.c | 2 +- gcc/testsuite/gcc.target/loongarch/attr-model-3.c | 2 +- gcc/testsuite/gcc.target/loongarch/attr-model-4.c | 2 +- .../gcc.target/loongarch/func-call-extreme-1.c| 6 +++--- .../gcc.target/loongarch/func-call-extreme-2.c| 6 +++--- .../gcc.target/loongarch/func-call-extreme-3.c| 6 +++--- .../gcc.target/loongarch/func-call-extreme-4.c| 6 +++--- 9 files changed, 27 insertions(+), 16 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index e33b9db5981..aa9a9598000 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -264,6 +264,9 @@ const char *const loongarch_fp_conditions[16]= {LARCH_FP_CONDITIONS (STRINGIFY)}; #undef STRINGIFY +static bool +loongarch_symbol_extreme_p (enum loongarch_symbol_type type); + /* Size of guard page. */ #define STACK_CLASH_PROTECTION_GUARD_SIZE \ (1 << param_stack_clash_protection_guard_size) @@ -1963,6 +1966,14 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type) bool loongarch_explicit_relocs_p (enum loongarch_symbol_type type) { + /* Instructions pcalau12i, addi.d, lu32i.d and lu52i.d must be adjancent + so that the linker can infer the PC of pcalau12i to apply relocations + to lu32i.d and lu52i.d. Otherwise, the results would be incorrect if + these four instructions are not in the same 4KiB page. + Therefore, macro instructions are used when cmodel=extreme. 
*/ + if (loongarch_symbol_extreme_p (type)) +return false; + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO) return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS; diff --git a/gcc/testsuite/gcc.target/loongarch/attr-model-1.c b/gcc/testsuite/gcc.target/loongarch/attr-model-1.c index 916d715b98b..3963b8957b0 100644 --- a/gcc/testsuite/gcc.target/loongarch/attr-model-1.c +++ b/gcc/testsuite/gcc.target/loongarch/attr-model-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mexplicit-relocs -mcmodel=normal -O2" } */ -/* { dg-final { scan-assembler-times "%pc64_hi12" 2 } } */ +/* { dg-final { scan-assembler-times "la\.local.*,\\\$r15," 2 } } */ #define ATTR_MODEL_TEST #include "attr-model-test.c" diff --git a/gcc/testsuite/gcc.target/loongarch/attr-model-2.c b/gcc/testsuite/gcc.target/loongarch/attr-model-2.c index a74c795ac3e..6f154a92499 100644 --- a/gcc/testsuite/gcc.target/loongarch/attr-model-2.c +++ b/gcc/testsuite/gcc.target/loongarch/attr-model-2.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mexplicit-relocs -mcmodel=extreme -O2" } */ -/* { dg-final { scan-assembler-times "%pc64_hi12" 3 } } */ +/* { dg-final { scan-assembler-times "la\.local.*,\\\$r15," 3 } } */ #define ATTR_MODEL_TEST #include "attr-model-test.c" diff --git a/gcc/testsuite/gcc.target/loongarch/attr-model-3.c b/gcc/testsuite/gcc.target/loongarch/attr-model-3.c index 5622d508678..eb177905d34 100644 --- a/gcc/testsuite/gcc.target/loongarch/attr-model-3.c +++ b/gcc/testsuite/gcc.target/loongarch/attr-model-3.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mexplicit-relocs=auto -mcmodel=normal -O2" } */ -/* { dg-final { scan-assembler-times "%pc64_hi12" 2 } } */ +/* { dg-final { scan-assembler-times "la\.local.*,\\\$r15," 2 } } */ #define ATTR_MODEL_TEST #include "attr-model-test.c" diff --git a/gcc/testsuite/gcc.target/loongarch/attr-model-4.c b/gcc/testsuite/gcc.target/loongarch/attr-model-4.c index 482724bb974..570a0bd6690 100644 --- 
a/gcc/testsuite/gcc.target/loongarch/attr-model-4.c +++ b/gcc/tes
[PATCH 1/2] LoongArch: Add the macro implementation of mcmodel=extreme.
gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
	Remove the sym+addend form from the SYMBOL_PCREL64 type symbol.
	(loongarch_option_override_internal): Support option combinations
	of -mcmodel=extreme and -mexplicit-relocs=none.
	(loongarch_handle_model_attribute): Remove detection code.
	* config/loongarch/loongarch.md (movdi_pcrel64): New template.
	(movdi_got_disp): Likewise.
	* config/loongarch/predicates.md (symbolic_got_operand): Determine
	whether the symbol type is SYMBOL_GOT_DISP.
	(symbolic_pcrel64_operand): Determine whether the symbol type is
	SYMBOL_PCREL64.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/func-call-extreme-5.c: New test.
	* gcc.target/loongarch/func-call-extreme-6.c: New test.
---
 gcc/config/loongarch/loongarch.cc           | 14 +-
 gcc/config/loongarch/loongarch.md           | 47 ++-
 gcc/config/loongarch/predicates.md          | 14 ++
 .../loongarch/func-call-extreme-5.c         |  7 +++
 .../loongarch/func-call-extreme-6.c         |  7 +++
 5 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-6.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 578b9bc3f09..e33b9db5981 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -1944,10 +1944,10 @@ loongarch_symbolic_constant_p (rtx x, enum loongarch_symbol_type *symbol_type) case SYMBOL_TLSGD: case SYMBOL_TLSLDM: case SYMBOL_PCREL: -case SYMBOL_PCREL64: /* GAS rejects offsets outside the range [-2^31, 2^31-1]. 
*/ return sext_hwi (INTVAL (offset), 32) == INTVAL (offset); +case SYMBOL_PCREL64: case SYMBOL_GOT_DISP: case SYMBOL_TLS: return false; @@ -7526,10 +7526,6 @@ loongarch_option_override_internal (struct gcc_options *opts, switch (la_target.cmodel) { case CMODEL_EXTREME: - if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE) - error ("code model %qs is not compatible with %s", -"extreme", "-mexplicit-relocs=none"); - if (opts->x_flag_plt) { if (global_options_set.x_flag_plt) @@ -7894,14 +7890,6 @@ loongarch_handle_model_attribute (tree *node, tree name, tree arg, int, *no_add_attrs = true; return NULL_TREE; } - if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE) - { - error_at (DECL_SOURCE_LOCATION (decl), - "%qE attribute is not compatible with %s", name, - "-mexplicit-relocs=none"); - *no_add_attrs = true; - return NULL_TREE; - } arg = TREE_VALUE (arg); if (TREE_CODE (arg) != STRING_CST) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 2b0609f2f31..72abf180b1b 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -84,6 +84,9 @@ (define_c_enum "unspec" [ UNSPEC_SIBCALL_VALUE_MULTIPLE_INTERNAL_1 UNSPEC_CALL_VALUE_MULTIPLE_INTERNAL_1 + + UNSPEC_MOV_PCREL64 + UNSPEC_MOV_GOT_DISP ]) (define_c_enum "unspecv" [ @@ -123,6 +126,7 @@ (define_constants (TP_REGNUM 2) (T0_REGNUM 12) (T1_REGNUM 13) + (T3_REGNUM 15) (S0_REGNUM 23) ;; Return path styles @@ -2056,8 +2060,22 @@ (define_expand "movdi" { if (loongarch_legitimize_move (DImode, operands[0], operands[1])) DONE; -}) + enum loongarch_symbol_type symbol_type; + if (loongarch_symbolic_constant_p (operands[1], &symbol_type)) +{ + if (symbol_type == SYMBOL_PCREL64) + { + emit_insn (gen_movdi_pcrel64 (operands[0], operands[1])); + DONE; + } + else if (TARGET_CMODEL_EXTREME && symbol_type == SYMBOL_GOT_DISP) + { + emit_insn (gen_movdi_got_disp (operands[0], operands[1])); + DONE; + } +} +}) (define_insn_and_split "*movdi_32bit" [(set (match_operand:DI 0 
"nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m") (match_operand:DI 1 "move_operand" "r,i,w,r,*J*r,*m,*f,*f"))] @@ -2096,6 +2114,33 @@ (define_insn_and_split "*movdi_64bit" [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore") (set_attr "mode" "DI")]) +;; $t0 and $t1 are used in loongarch_output_mi_thunk. If $t0 or $t1 is used +;; here, then when cmodel is extreme, C++ THUNK will error. So $t3 is selected +;; here. +(define_insn "movdi_pcrel64" + [(set (match_operand:DI 0 "register_operand" "=&r") + (match_operand:DI 1 "symbolic_pcrel64_operand")) + (unspec:DI [(const_int 0)] +UNSPEC_MOV_PCREL64) + (use (reg:DI T3_REGNUM)) + (clobber (reg:DI T3_REGNUM))] + "TARGET_64BIT" +
Re: [PATCH 0/2] When cmodel=extreme, add macro support and only
On 2023/12/27 at 4:46 PM, Lulu Cheng wrote:
> When cmodel=extreme, since the symbol address is obtained through four
> instructions, errors may occur in some cases during linking. Therefore, in
> order to ensure that the instructions for obtaining the symbol address are
> together, macro instructions are used to obtain the symbol address when
> cmodel=extreme.
>
> https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc#extreme-code-model

There are some problems with the test case changes; I will fix them in the
v2 version.

> Lulu Cheng (2):
>   LoongArch: Add the macro implementation of mcmodel=extreme.
>   LoongArch: When the code model is extreme, the symbol address is
>     obtained through macro instructions regardless of the value of
>     -mexplicit-relocs.
>
>  gcc/config/loongarch/loongarch.cc           | 25 +-
>  gcc/config/loongarch/loongarch.md           | 47 ++-
>  gcc/config/loongarch/predicates.md          | 14 ++
>  .../gcc.target/loongarch/attr-model-1.c     |  2 +-
>  .../gcc.target/loongarch/attr-model-2.c     |  2 +-
>  .../gcc.target/loongarch/attr-model-3.c     |  2 +-
>  .../gcc.target/loongarch/attr-model-4.c     |  2 +-
>  .../loongarch/func-call-extreme-1.c         |  6 +--
>  .../loongarch/func-call-extreme-2.c         |  6 +--
>  .../loongarch/func-call-extreme-3.c         |  6 +--
>  .../loongarch/func-call-extreme-4.c         |  6 +--
>  .../loongarch/func-call-extreme-5.c         |  7 +++
>  .../loongarch/func-call-extreme-6.c         |  7 +++
>  13 files changed, 102 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-5.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-6.c
RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width
Committed at 6cec7b06b3c8187b36fc05cfd4dd38b42313d727 Thanks, Di > -Original Message- > From: Richard Biener > Sent: Friday, December 22, 2023 11:40 PM > To: Di Zhao OS > Cc: Thomas Schwinge ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in > get_reassociation_width > > > > > Am 22.12.2023 um 16:05 schrieb Di Zhao OS : > > > > Updated the fix in attachment. > > > > Is it OK for trunk? > > Ok > > > Tested on aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu. > > > > Thanks, > > Di Zhao > > > >> -Original Message- > >> From: Di Zhao OS > >> Sent: Sunday, December 17, 2023 8:31 PM > >> To: Thomas Schwinge ; gcc-patches@gcc.gnu.org > >> Cc: Richard Biener > >> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in > >> get_reassociation_width > >> > >> Hello Thomas, > >> > >>> -Original Message- > >>> From: Thomas Schwinge > >>> Sent: Friday, December 15, 2023 5:46 PM > >>> To: Di Zhao OS ; gcc-patches@gcc.gnu.org > >>> Cc: Richard Biener > >>> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in > >>> get_reassociation_width > >>> > >>> Hi! > >>> > >>> On 2023-12-13T08:14:28+, Di Zhao OS > >> wrote: > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr110279-2.c > @@ -0,0 +1,41 @@ > +/* PR tree-optimization/110279 */ > +/* { dg-do compile } */ > +/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully- > >>> pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */ > +/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } > */ > + > +#define LOOP_COUNT 8 > +typedef double data_e; > + > +#include > + > +__attribute_noinline__ data_e > +foo (data_e in) > >>> > >>> Pushed to master branch commit 91e9e8faea4086b3b8aef2355fc12c1559d425f6 > >>> "Fix 'gcc.dg/pr110279-2.c' syntax error due to '__attribute_noinline__'", > >>> see attached. 
> >>> > >>> However: > >>> > +{ > + data_e a1, a2, a3, a4; > + data_e tmp, result = 0; > + a1 = in + 0.1; > + a2 = in * 0.1; > + a3 = in + 0.01; > + a4 = in * 0.59; > + > + data_e result2 = 0; > + > + for (int ic = 0; ic < LOOP_COUNT; ic++) > +{ > + /* Test that a complete FMA chain with length=4 is not broken. */ > + tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4 ; > + result += tmp - ic; > + result2 = result2 / 2 - tmp; > + > + a1 += 0.91; > + a2 += 0.1; > + a3 -= 0.01; > + a4 -= 0.89; > + > +} > + > + return result + result2; > +} > + > +/* { dg-final { scan-tree-dump-not "was chosen for reassociation" > >>> "reassoc2"} } */ > +/* { dg-final { scan-tree-dump-times {\.FMA } 3 "optimized"} } */ > >> > >> Thank you for the fix. > >> > >>> ..., I still see these latter two tree dump scans FAIL, for GCN: > >>> > >>>$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2 > >>> 2 *: a3_40 > >>> 2 *: a2_39 > >>>Width = 4 was chosen for reassociation > >>>Transforming _15 = powmult_1 + powmult_3; > >>> into _63 = powmult_1 + a1_38; > >>>$ grep -F .FMA pr110279-2.c.265t.optimized > >>> _63 = .FMA (a2_39, a2_39, a1_38); > >>> _64 = .FMA (a3_40, a3_40, powmult_5); > >>> > >>> ..., nvptx: > >>> > >>>$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2 > >>> 2 *: a3_40 > >>> 2 *: a2_39 > >>>Width = 4 was chosen for reassociation > >>>Transforming _15 = powmult_1 + powmult_3; > >>> into _63 = powmult_1 + a1_38; > >>>$ grep -F .FMA pr110279-2.c.265t.optimized > >>> _63 = .FMA (a2_39, a2_39, a1_38); > >>> _64 = .FMA (a3_40, a3_40, powmult_5); > >> > >> For these 2 targets, the reassoc_width for FMUL is 1 (default value), > >> While the testcase assumes that to be 4. The bug was introduced when I > >> updated the patch but forgot to update the testcase. 
> >> > >>> ..., but also x86_64-pc-linux-gnu: > >>> > >>>$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2 > >>> 2 *: a3_40 > >>> 2 *: a2_39 > >>>Width = 2 was chosen for reassociation > >>>Transforming _15 = powmult_1 + powmult_3; > >>> into _63 = powmult_1 + powmult_3; > >>>$ grep -cF .FMA pr110279-2.c.265t.optimized > >>>0 > >> > >> For x86_64 this needs "-mfma". Sorry the compile options missed that. > >> Can the change below fix these issues? I moved them into > >> testsuite/gcc.target/aarch64, since they rely on tunings. > >> > >> Tested on aarch64-unknown-linux-gnu. > >> > >>> > >>> Grüße > >>> Thomas > >>> > >>> > >>> - > >>> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, > >> 80634 > >>> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas > >>> Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
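As a rough illustration of what the test in this thread checks, the sketch below contrasts two ways of evaluating a1 + a2 * a2 + a3 * a3 + a4 * a4. It is plain C, not GCC internals, and the function names are hypothetical. The serial form is one dependent chain in which every addition consumes a product, so each step can become an FMA (three in total). The split form mimics a reassociation width of 2: it shortens the dependency chain, but one product is then added to another product, so fewer steps can be fused.

```c
#include <assert.h>

/* One dependent chain: ((a1 + a2*a2) + a3*a3) + a4*a4.
   Each addition takes a product as an operand, so a compiler that
   supports FMA can fuse all three steps.  */
static double
serial_chain (double a1, double a2, double a3, double a4)
{
  double t = a1 + a2 * a2;
  t = t + a3 * a3;
  return t + a4 * a4;
}

/* Width-2 style split: two independent partial sums, then a final add.
   The final addition combines two already computed sums, so it cannot
   be fused with a multiply.  */
static double
split_width2 (double a1, double a2, double a3, double a4)
{
  double left = a1 + a2 * a2;
  double right = a3 * a3 + a4 * a4;
  return left + right;
}
```

Both forms give the same value when the arithmetic is exact (for example with small integer inputs); with general floating-point inputs the rounding may differ, which is why such rewrites need -Ofast or similar.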
[PATCH] aarch64: add 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA'
This patch adds a new tuning option, 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA',
so that reassociation takes fully pipelined FMAs into account. It also sets
this option by default for Ampere CPUs.

Tested on aarch64-unknown-linux-gnu. Is this OK for trunk?

Thanks,
Di Zhao

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNING_OPTION): New tuning option
	AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
	* config/aarch64/aarch64.cc
	(aarch64_override_options_internal): Set
	param_fully_pipelined_fma according to tuning option.
	* config/aarch64/tuning_models/ampere1.h: Add
	AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA to tune_flags.
	* config/aarch64/tuning_models/ampere1a.h: Likewise.
	* config/aarch64/tuning_models/ampere1b.h: Likewise.
---
 gcc/config/aarch64/aarch64-tuning-flags.def | 2 ++
 gcc/config/aarch64/aarch64.cc               | 6 ++
 gcc/config/aarch64/tuning_models/ampere1.h  | 3 ++-
 gcc/config/aarch64/tuning_models/ampere1a.h | 3 ++-
 gcc/config/aarch64/tuning_models/ampere1b.h | 3 ++-
 5 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index f28a73839a6..256f17bad60 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -49,4 +49,6 @@ AARCH64_EXTRA_TUNING_OPTION ("matched_vector_throughput", MATCHED_VECTOR_THROUGH AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA) +AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_FMA", FULLY_PIPELINED_FMA) + #undef AARCH64_EXTRA_TUNING_OPTION diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index f9850320f61..1b3b288cdf9 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -18289,6 +18289,12 @@ aarch64_override_options_internal (struct gcc_options *opts) SET_OPTION_IF_UNSET (opts, &global_options_set, param_avoid_fma_max_bits, 512); + /* Consider fully pipelined FMA in reassociation. 
 */ + if (aarch64_tune_params.extra_tuning_flags + & AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA) +SET_OPTION_IF_UNSET (opts, &global_options_set, param_fully_pipelined_fma, +1); + aarch64_override_options_after_change_1 (opts); } diff --git a/gcc/config/aarch64/tuning_models/ampere1.h b/gcc/config/aarch64/tuning_models/ampere1.h index a144e8f94b3..d63788528a7 100644 --- a/gcc/config/aarch64/tuning_models/ampere1.h +++ b/gcc/config/aarch64/tuning_models/ampere1.h @@ -104,7 +104,8 @@ static const struct tune_params ampere1_tunings = 2, /* min_div_recip_mul_df. */ 0, /* max_case_values. */ tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */ + (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA | + AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),/* tune_flags. */ &ampere1_prefetch_tune, AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model. */ diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h b/gcc/config/aarch64/tuning_models/ampere1a.h index f688ed08a79..63506e1d1c6 100644 --- a/gcc/config/aarch64/tuning_models/ampere1a.h +++ b/gcc/config/aarch64/tuning_models/ampere1a.h @@ -56,7 +56,8 @@ static const struct tune_params ampere1a_tunings = 2, /* min_div_recip_mul_df. */ 0, /* max_case_values. */ tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */ + (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA | + AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),/* tune_flags. */ &ampere1_prefetch_tune, AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model. 
 */ diff --git a/gcc/config/aarch64/tuning_models/ampere1b.h b/gcc/config/aarch64/tuning_models/ampere1b.h index a98b6a980f7..7894e730174 100644 --- a/gcc/config/aarch64/tuning_models/ampere1b.h +++ b/gcc/config/aarch64/tuning_models/ampere1b.h @@ -106,7 +106,8 @@ static const struct tune_params ampere1b_tunings = 0, /* max_case_values. */ tune_params::AUTOPREFETCHER_STRONG, /* autoprefetcher_model. */ (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND | - AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA), /* tune_flags. */ + AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA | + AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),/* tune_flags. */ &ampere1b_prefetch_tune, AARCH64_LDP_STP_POLICY_ALIGNED, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model. */ -- 2.25.1
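The pattern used in aarch64_override_options_internal can be sketched in isolation: a per-CPU bit flag supplies a default for a parameter, but only when the user has not set that parameter explicitly. The types and names below (struct options, apply_tuning, resolved_param, the TUNE_* constants) are hypothetical stand-ins for illustration, not GCC's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-CPU tuning flag bits.  */
enum
{
  TUNE_AVOID_CROSS_LOOP_FMA = 1 << 0,
  TUNE_FULLY_PIPELINED_FMA = 1 << 1
};

/* Hypothetical stand-in for the option state.  */
struct options
{
  int fully_pipelined_fma;      /* Current value of the parameter.  */
  bool fully_pipelined_fma_set; /* True if the user set it explicitly.  */
};

/* Apply the tuning default only when the flag is present and the user
   has not chosen a value, mirroring the intent of SET_OPTION_IF_UNSET.  */
static void
apply_tuning (struct options *opts, unsigned tune_flags)
{
  assert (opts != 0);
  if ((tune_flags & TUNE_FULLY_PIPELINED_FMA)
      && !opts->fully_pipelined_fma_set)
    opts->fully_pipelined_fma = 1;
}

/* Convenience wrapper: return the parameter value that results from a
   given flag set and an optional explicit user value.  */
static int
resolved_param (unsigned tune_flags, bool user_set, int user_value)
{
  struct options opts = { user_set ? user_value : 0, user_set };
  apply_tuning (&opts, tune_flags);
  return opts.fully_pipelined_fma;
}
```

The point of the "if unset" check is that an explicit --param on the command line keeps priority over the CPU's tuning default.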
Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine
On Wed, 2023-12-27 at 11:59 +0800, chenglulu wrote:
> +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6
>
> In r14-6818 the issue persists. I kind of chased the code and found that the
> problem is like this:
> volatile unsigned char u8;
>
> void test (void)
> {
>   u8 = u8 + u8;
>   u8 = u8 - u8;
> }
>
> $./gcc/cc1 test.c -o test.s -fdump-rtl-all-all -fdiagnostics-plain-output
> -Os -fdump-rtl-final -ffat-lto-objects
>
> test.c.301r.outof_cfglayout
>
> (insn 7 6 9 2 (set (reg:DI 80 [ u8.0_1 ])
> (zero_extend:DI (mem/v/c:QI (symbol_ref:DI ("*.LANCHOR0") [flags
> 0x182]) [0 u8D.2193+0 S1 A8]))) "volatile.c":5:11 459 {simple_load_uextdiqidi}
> (nil))
>
> test.c.302r.split1
>
> (insn 27 6 28 2 (set (reg:DI 98)
> (unspec:DI [
> (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> ] UNSPEC_PCALAU12I_GR)) "volatile.c":5:11 -1
> (nil))
> (insn 28 27 9 2 (set (reg:DI 80 [ u8.0_1 ])
> (zero_extend:DI (mem:QI (lo_sum:DI (reg:DI 98)
> (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [0 S1
> A8]))) "volatile.c":5:11 -1
> (nil))
>
> The volatile property of the mem here is gone, so the test fails.

Phew. I guess I couldn't reproduce it because I have Jeff's ext-dce patch
in my local repo, which removed the zero_extend...

I'll rework this patch.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[C PATCH] C: Fix type compatibility for structs with variable sized fields.
This patch hopefully fixes the test failure we see with gnu23-tag-4.c.
It does for me locally with -march=native (which otherwise reproduces
the problem).

Bootstrapped and regression tested on x86_64.


C: Fix type compatibility for structs with variable sized fields.

This fixes the test gcc.dg/gnu23-tag-4.c introduced by commit 23fee88f,
which fails for -march=... because DECL_FIELD_BIT_OFFSET is set
inconsistently for types with and without a variable-sized field. This
is fixed by testing for DECL_ALIGN instead. The code is further
simplified by removing some unnecessary conditions, i.e. anon_field is
set unconditionally and all fields are assumed to be FIELD_DECLs.

gcc/c:
	* c-typeck.cc (tagged_types_tu_compatible_p): Revise.

gcc/testsuite:
	* gcc.dg/c23-tag-9.c: New test.
---
 gcc/c/c-typeck.cc                | 19 ---
 gcc/testsuite/gcc.dg/c23-tag-9.c |  8 
 2 files changed, 16 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-9.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index 2d9139d09d2..84ddda1ebab 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -1511,8 +1511,6 @@ tagged_types_tu_compatible_p (const_tree t1, const_tree t2, if (!data->anon_field && TYPE_STUB_DECL (t1) != TYPE_STUB_DECL (t2)) data->different_types_p = true; - data->anon_field = false; - /* Incomplete types are incompatible inside a TU. 
*/ if (TYPE_SIZE (t1) == NULL || TYPE_SIZE (t2) == NULL) return false; @@ -1592,22 +1590,21 @@ tagged_types_tu_compatible_p (const_tree t1, const_tree t2, s1 && s2; s1 = DECL_CHAIN (s1), s2 = DECL_CHAIN (s2)) { - if (TREE_CODE (s1) != TREE_CODE (s2) - || DECL_NAME (s1) != DECL_NAME (s2)) + gcc_assert (TREE_CODE (s1) == FIELD_DECL); + gcc_assert (TREE_CODE (s2) == FIELD_DECL); + + if (DECL_NAME (s1) != DECL_NAME (s2)) + return false; + + if (DECL_ALIGN (s1) != DECL_ALIGN (s2)) return false; - if (!DECL_NAME (s1) && RECORD_OR_UNION_TYPE_P (TREE_TYPE (s1))) - data->anon_field = true; + data->anon_field = !DECL_NAME (s1); data->cache = &entry; if (!comptypes_internal (TREE_TYPE (s1), TREE_TYPE (s2), data)) return false; - if (TREE_CODE (s1) == FIELD_DECL - && simple_cst_equal (DECL_FIELD_BIT_OFFSET (s1), -DECL_FIELD_BIT_OFFSET (s2)) != 1) - return false; - tree st1 = TYPE_SIZE (TREE_TYPE (s1)); tree st2 = TYPE_SIZE (TREE_TYPE (s2)); diff --git a/gcc/testsuite/gcc.dg/c23-tag-9.c b/gcc/testsuite/gcc.dg/c23-tag-9.c new file mode 100644 index 000..1d32560ec23 --- /dev/null +++ b/gcc/testsuite/gcc.dg/c23-tag-9.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c23" } */ + +struct foo { int x; } x; +struct foo { alignas(128) int x; } y; /* { dg-error "redefinition" } */ +static_assert(alignof(y) == 128); + + -- 2.39.2
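For readers following along, here is a minimal sketch (not part of the patch) of the situation the new test exercises, seen from the source level: two structs whose only member sits at the same offset but with a different alignment. The tags foo_plain and foo_aligned and the helper same_layout are hypothetical names used only for illustration; the 128-byte alignment is taken from c23-tag-9.c, and the sketch assumes a target that accepts that alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two 'struct foo' definitions in the test.
   They use distinct tags so that both can live in one translation unit.  */
struct foo_plain   { int x; };
struct foo_aligned { alignas (128) int x; };

/* Illustrative helper (not a GCC function): two members are laid out
   the same way only if both offset and alignment agree.  */
static inline int
same_layout (size_t off1, size_t align1, size_t off2, size_t align2)
{
  return off1 == off2 && align1 == align2;
}

/* Compare the member 'x' of the two structs above.  */
static inline int
foo_layouts_match (void)
{
  return same_layout (offsetof (struct foo_plain, x),
		      alignof (struct foo_plain),
		      offsetof (struct foo_aligned, x),
		      alignof (struct foo_aligned));
}
```

In this sketch the offset of x is zero in both structs, so only the alignment tells them apart.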
Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]
Hi Rimvydas!

On 24.12.23 at 02:33, Rimvydas Jasinskas wrote:
> Documentation part. The makeinfo gcc/fortran/gfortran.texi does not
> seem to have any new warnings.

The patch is almost fine, except for a strange wording here:

+@smallexample
+gfortran -save-temps -c foo.F90
+@end smallexample
+
+preprocesses to in @file{foo.fii}, compiles to an intermediate
+@file{foo.s}, and then assembles to the (implied) output file
+@file{foo.o}, whereas:

I understand the formulation is copied from gcc/doc/invoke.texi, where it does not fully make sense to me either.

How about:

"preprocesses input file @file{foo.F90} to @file{foo.fii}, ..."

Furthermore,

+@smallexample
+gfortran -save-temps -S foo.F
+@end smallexample
+
+saves the (no longer) temporary preprocessed file in @file{foo.fi}, and
+then compiles to the (implied) output file @file{foo.s}.

Even if this is copied from the gcc texinfo file, how about:

"saves the preprocessor output in @file{foo.fi}, ..."

which I find easier to read.

Can you also add a reference to the PR number in the commit message?

> Is there a specific reason why -fc-prototypes (Interoperability
> Options section) is excluded from manpage?

Can you be more specific? I get here (since gcc-9):

% man /opt/gcc/14/share/man/man1/gfortran.1 |grep -A 1 "Interoperability Options"
   Interoperability Options
       -fc-prototypes -fc-prototypes-external

although no detailed explanation (-> gfortran.info).

> Regards,
> Rimvydas

Thanks,
Harald
[Committed] RISC-V: Make dynamic LMUL cost model more accurate for conversion codes
The current dynamic LMUL cost model is not accurate for conversion codes. This patch refines it; with the change, the case below goes from choosing LMUL = 4 to choosing LMUL = 8.

Tested with no regressions, committed.

Before this patch (LMUL = 4):     After this patch (LMUL = 8):
lw a7,56(sp)                      lw a7,56(sp)
ld t5,0(sp)                       ld t5,0(sp)
ld t1,8(sp)                       ld t1,8(sp)
ld t6,16(sp)                      ld t6,16(sp)
ld t0,24(sp)                      ld t0,24(sp)
ld t3,32(sp)                      ld t3,32(sp)
ld t4,40(sp)                      ld t4,40(sp)
ble a7,zero,.L5                   ble a7,zero,.L5
.L3:                              .L3:
vsetvli a4,a7,e32,m2,ta,ma        vsetvli a4,a7,e32,m4,ta
vle8.v v1,0(a2)                   vle8.v v3,0(a2)
vle8.v v4,0(a1)                   vle8.v v16,0(t0)
vsext.vf4 v8,v1                   vle8.v v7,0(a1)
vsext.vf4 v2,v4                   vle8.v v12,0(t6)
vsetvli zero,zero,e8,mf2,ta,ma    vle8.v v2,0(a5)
vadd.vv v4,v4,v1                  vle8.v v1,0(t5)
vsetvli zero,zero,e32,m2,ta,ma    vsext.vf4 v20,v3
vle8.v v5,0(t0)                   vsext.vf4 v8,v7
vle8.v v6,0(t6)                   vadd.vv v8,v8,v20
vadd.vv v2,v2,v8                  vadd.vv v8,v8,v8
vadd.vv v2,v2,v2                  vadd.vv v8,v8,v20
vadd.vv v2,v2,v8                  vsetvli zero,zero,e8,m1
vsetvli zero,zero,e8,mf2,ta,ma    vadd.vv v15,v12,v16
vadd.vv v6,v6,v5                  vsetvli zero,zero,e32,m4
vsetvli zero,zero,e32,m2,ta,ma    vsext.vf4 v12,v15
vle8.v v8,0(t5)                   vadd.vv v8,v8,v12
vle8.v v9,0(a5)                   vsetvli zero,zero,e8,m1
vsext.vf4 v10,v4                  vadd.vv v7,v7,v3
vsext.vf4 v12,v6                  vsetvli zero,zero,e32,m4
vadd.vv v2,v2,v12                 vsext.vf4 v4,v7
vadd.vv v2,v2,v10                 vadd.vv v8,v8,v4
vsetvli zero,zero,e16,m1,ta,ma    vsetvli zero,zero,e16,m2
vncvt.x.x.w v4,v2                 vncvt.x.x.w v4,v8
vsetvli zero,zero,e32,m2,ta,ma    vsetvli zero,zero,e8,m1
vadd.vv v6,v2,v2                  vncvt.x.x.w v4,v4
vsetvli zero,zero,e8,mf2,ta,ma    vadd.vv v15,v3,v4
vncvt.x.x.w v4,v4                 vadd.vv v2,v2,v4
vadd.vv v5,v5,v4                  vse8.v v15,0(t4)
vadd.vv v9,v9,v4                  vadd.vv v3,v16,v4
vadd.vv v1,v1,v4                  vse8.v v2,0(a3)
vadd.vv v4,v8,v4                  vadd.vv v1,v1,v4
vse8.v v1,0(t4)                   vse8.v v1,0(a6)
vse8.v v9,0(a3)                   vse8.v v3,0(t1)
vsetvli zero,zero,e32,m2,ta,ma    vsetvli zero,zero,e32,m4
vse8.v v4,0(a6)                   vsext.vf4 v4,v3
vsext.vf4 v8,v5                   vadd.vv v4,v4,v8
vse8.v v5,0(t1)                   vsetvli zero,zero,e64,m8
vadd.vv v2,v8,v2                  vsext.vf2 v16,v4
vsetvli zero,zero,e64,m4,ta,ma    vse64.v v16,0(t3)
vsext.vf2 v8,v2                   vsetvli zero,zero,e32,m4
vsetvli zero,zero,e32,m2,ta,ma    vadd.vv v8,v8,v8
slli t2,a4,3                      vsext.vf4 v4,v15
vse64.v v8,0(t3)                  slli t2,a4,3
vsext.vf4 v2,v1                   vadd.vv v4,v8,v4
sub a7,a7,a4                      sub a7,a7,a4
vadd.vv v2,v6,v2                  vsetvli zero,zero,e64,m8
vsetvli zero,zero,e64,m4,ta,ma    vsext.vf2 v8,v4
vsext.vf2 v4,v2                   vse64.v v8,0(a0)
vse64.v v4,0(a0)                  add a1,a1,a4
add a2,a2,a4                      add a2,a2,a4
add a1,a1,a4                      add a5,a5,a4
add t6,t6,a4                      add t5,t5,a4
add t0,t0,a4                      add t6,t6,a4
add a5,a5,a4                      add
[PATCH][V4] RISC-V: Nan-box the result of movhf on soft-fp16
According to the spec, fmv.h checks whether the input operands are correctly NaN-boxed. If not, the input value is treated as an n-bit canonical NaN. This patch fixes the issue that operands returned by soft-fp16 libgcc (i.e., __truncdfhf2) were not correctly NaN-boxed.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_legitimize_move): Expand movhf
	with NaN-boxing value.
	* config/riscv/riscv.md (*movhf_softfloat_unspec): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/_Float16-nanboxing.c: New test.

0001-RISC-V-Nan-box-the-result-of-movhf-on-soft-fp16.patch
Description: Binary data
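As a rough illustration of the NaN-boxing rule described above (a sketch only, not the code from the patch), the following C models a 16-bit half-precision bit pattern held in a wider container. The helper names and the 32-bit container width are assumptions made for illustration; 0x7e00 is the binary16 canonical quiet NaN.

```c
#include <assert.h>
#include <stdint.h>

/* Canonical quiet NaN bit pattern for IEEE 754 binary16.  */
#define HF_CANONICAL_NAN ((uint16_t) 0x7e00)

/* NaN-box a half-precision bit pattern into a 32-bit container
   (assumed width): every bit above the low 16 is set to 1.  */
static inline uint32_t
nan_box_hf (uint16_t bits)
{
  return UINT32_C (0xffff0000) | bits;
}

/* Nonzero if the upper 16 bits of the container are all ones.  */
static inline int
is_nan_boxed_hf (uint32_t reg)
{
  return (reg >> 16) == 0xffff;
}

/* Model of a consumer that checks the boxing: a correctly boxed value
   yields its low 16 bits, anything else reads as the canonical NaN.  */
static inline uint16_t
unbox_hf (uint32_t reg)
{
  return is_nan_boxed_hf (reg) ? (uint16_t) reg : HF_CANONICAL_NAN;
}
```

For example, the half-precision value 1.0 (0x3c00) survives a box/unbox round trip, while the same bits with a zeroed upper half read back as the canonical NaN, which is the failure mode the patch description refers to.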
Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]
On Wed, Dec 27, 2023 at 10:34 PM Harald Anlauf wrote:
> The patch is almost fine, except for a strange wording here:
>
> +@smallexample
> +gfortran -save-temps -c foo.F90
> +@end smallexample
> +
> +preprocesses to in @file{foo.fii}, compiles to an intermediate
> +@file{foo.s}, and then assembles to the (implied) output file
> +@file{foo.o}, whereas:
>
> I understand the formulation is copied from gcc/doc/invoke.texi,
> where it does not fully make sense to me either.
>
> How about:
>
> "preprocesses input file @file{foo.F90} to @file{foo.fii}, ..."
>
> Furthermore,
>
> +@smallexample
> +gfortran -save-temps -S foo.F
> +@end smallexample
> +
> +saves the (no longer) temporary preprocessed file in @file{foo.fi}, and
> +then compiles to the (implied) output file @file{foo.s}.
>
> Even if this is copied from the gcc texinfo file, how about:
>
> "saves the preprocessor output in @file{foo.fi}, ..."
>
> which I find easier to read.
>
> Can you also add a reference to the PR number in the commit message?

I agree, the wording sounds a lot better. It is included in v2 together with the PR number.

> > Is there a specific reason why -fc-prototypes (Interoperability
> > Options section) is excluded from manpage?
>
> Can you be more specific? I get here (since gcc-9):
>
> % man /opt/gcc/14/share/man/man1/gfortran.1 |grep -A 1 "Interoperability Options"
>    Interoperability Options
>        -fc-prototypes -fc-prototypes-external
>
> although no detailed explanation (-> gfortran.info).

The page https://gcc.gnu.org/onlinedocs/gfortran/Invoking-GNU-Fortran.html does contain a working link to https://gcc.gnu.org/onlinedocs/gfortran/Interoperability-Options.html
However, the manpage has the Interoperability section explicitly disabled with "@c man end" ... "@c man begin ENVIRONMENT".
After digging into the git log, it seems that the Interoperability section was unintentionally added after this comment mark in https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e655a6cc43

Best regards,
Rimvydas

From f8663a022a8b9c4f1c4a76d8e4823e24f691623c Mon Sep 17 00:00:00 2001
From: Rimvydas Jasinskas
Date: Sat, 23 Dec 2023 18:59:09 +
Subject: Fortran: Add Developer Options mini-section to documentation

Separate out -fdump-* options to the new section. Sort by option name.
While there, document -save-temps intermediates.

gcc/fortran/ChangeLog:

	PR fortran/81615
	* invoke.texi: Add Developer Options section. Move '-fdump-*' to it.
	Add small examples about changed '-save-temps' behavior.

Signed-off-by: Rimvydas Jasinskas
---
 gcc/fortran/invoke.texi | 117 ++--
 1 file changed, 77 insertions(+), 40 deletions(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index c7fd019a7c5..5d526e23e5c 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -94,12 +94,13 @@ one is not the default.
                                 compiled.
 * Preprocessing Options::  Enable and customize preprocessing.
 * Error and Warning Options::  How picky should the compiler be?
-* Debugging Options::  Symbol tables, measurements, and debugging dumps.
+* Debugging Options::  Symbol tables, measurements.
 * Directory Options::  Where to find module files
 * Link Options ::  Influencing the linking step
 * Runtime Options::  Influencing runtime behavior
 * Code Gen Options::  Specifying conventions for function calls, data layout
                       and register usage.
+* Developer Options::  Printing GNU Fortran specific info, debugging dumps.
 * Interoperability Options::  Options for interoperability with other
                               languages.
 * Environment Variables::  Environment variables that affect @command{gfortran}.
@@ -159,9 +160,8 @@ and warnings}.
 }
 
 @item Debugging Options
-@xref{Debugging Options,,Options for debugging your program or GNU Fortran}.
-@gccoptlist{-fbacktrace -fdump-fortran-optimized -fdump-fortran-original
--fdebug-aux-vars -fdump-fortran-global -fdump-parse-tree -ffpe-trap=@var{list}
+@xref{Debugging Options,,Options for debugging your program}.
+@gccoptlist{-fbacktrace -fdebug-aux-vars -ffpe-trap=@var{list}
 -ffpe-summary=@var{list}
 }
 
@@ -201,6 +201,12 @@ and warnings}.
 -fpack-derived -frealloc-lhs -frecursive -frepack-arrays
 -fshort-enums -fstack-arrays
 }
+
+@item Developer Options
+@xref{Developer Options,,GNU Fortran Developer Options}.
+@gccoptlist{-fdump-fortran-global -fdump-fortran-optimized
+-fdump-fortran-original -fdump-parse-tree -save-temps
+}
 @end table
 
 @node Fortran Dialect Options
@@ -1280,40 +1286,14 @@ and other GNU compilers.
 Some of these have no effect when compiling programs written in Fortran.
 
 @node Debugging Options
-@section Options for debugging your program or GNU Fortran
+@section Options for debugging your program
 @cindex options, debugging
 @cindex debugging information options
 
 GNU Fortran has various special options that are used for debugging
-either your program or the GNU Fortran compiler.
+yo