date:20231227

[Committed] RISC-V: Make known NITERS loop be aware of dynamic lmul cost model liveness information

2023-12-27 Thread Juzhe-Zhong

Consider the following case:

int f[12][100];

void bad1(int v1, int v2)
{
  for (int r = 0; r < 100; r += 4)
    {
      int i = r + 1;
      f[0][r] = f[1][r] * (f[2][r]) - f[1][i] * (f[2][i]);
      f[0][i] = f[1][r] * (f[2][i]) + f[1][i] * (f[2][r]);
      f[0][r+2] = f[1][r+2] * (f[2][r+2]) - f[1][i+2] * (f[2][i+2]);
      f[0][i+2] = f[1][r+2] * (f[2][i+2]) + f[1][i+2] * (f[2][r+2]);
    }
}

Before this patch, LMUL = 8 VLS is picked blindly:

lui a4,%hi(f)
addi a4,a4,%lo(f)
addi sp,sp,-592
addi a3,a4,800
lui a5,%hi(.LANCHOR0)
vl8re32.v   v24,0(a3)
addi a5,a5,%lo(.LANCHOR0)
addi a1,a4,400
addi a3,sp,140
vl8re32.v   v16,0(a1)
vl4re16.v   v4,0(a5)
addi a7,a5,192
vs4r.v  v4,0(a3)
addi t0,a5,64
addi a3,sp,336
li  t2,32
addi a2,a5,128
vsetvli a5,zero,e32,m8,ta,ma
vrgatherei16.vv v8,v16,v4
vmul.vv v8,v8,v24
vl8re32.v   v0,0(a7)
vs8r.v  v8,0(a3)
vmsltu.vx   v8,v0,t2
addi a3,sp,12
addi t2,sp,204
vsm.v   v8,0(t2)
vl4re16.v   v4,0(t0)
vl4re16.v   v0,0(a2)
vs4r.v  v4,0(a3)
addi t0,sp,336
vrgatherei16.vv v8,v24,v4
addi a3,sp,208
vrgatherei16.vv v24,v16,v0
vs4r.v  v0,0(a3)
vmul.vv v8,v8,v24
vlm.v   v0,0(t2)
vl8re32.v   v24,0(t0)
addi a3,sp,208
vsub.vv v16,v24,v8
addi t6,a4,528
vadd.vv v8,v24,v8
addi t5,a4,928
vmerge.vvm  v8,v8,v16,v0
addi t3,a4,128
vs8r.v  v8,0(a4)
addi t4,a4,1056
addi t1,a4,656
addi a0,a4,256
addi a6,a4,1184
addi a1,a4,784
addi a7,a4,384
addi a4,sp,140
vl4re16.v   v0,0(a3)
vl8re32.v   v24,0(t6)
vl4re16.v   v4,0(a4)
vrgatherei16.vv v16,v24,v0
addi a3,sp,12
vs8r.v  v16,0(t0)
vl8re32.v   v8,0(t5)
vrgatherei16.vv v16,v24,v4
vl4re16.v   v4,0(a3)
vrgatherei16.vv v24,v8,v4
vmul.vv v16,v16,v8
vl8re32.v   v8,0(t0)
vmul.vv v8,v8,v24
vsub.vv v24,v16,v8
vlm.v   v0,0(t2)
addi a3,sp,208
vadd.vv v8,v8,v16
vl8re32.v   v16,0(t4)
vmerge.vvm  v8,v8,v24,v0
vrgatherei16.vv v24,v16,v4
vs8r.v  v24,0(t0)
vl4re16.v   v28,0(a3)
addi a3,sp,464
vs8r.v  v8,0(t3)
vl8re32.v   v8,0(t1)
vrgatherei16.vv v0,v8,v28
vs8r.v  v0,0(a3)
addi a3,sp,140
vl4re16.v   v24,0(a3)
addi a3,sp,464
vrgatherei16.vv v0,v8,v24
vl8re32.v   v24,0(t0)
vmv8r.v v8,v0
vl8re32.v   v0,0(a3)
vmul.vv v8,v8,v16
vmul.vv v24,v24,v0
vsub.vv v16,v8,v24
vadd.vv v8,v8,v24
vsetivli zero,4,e32,m8,ta,ma
vle32.v v24,0(a6)
vsetvli a4,zero,e32,m8,ta,ma
addi a4,sp,12
vlm.v   v0,0(t2)
vmerge.vvm  v8,v8,v16,v0
vl4re16.v   v16,0(a4)
vrgatherei16.vv v0,v24,v16
vsetivli zero,4,e32,m8,ta,ma
vs8r.v  v0,0(a4)
addi a4,sp,208
vl4re16.v   v0,0(a4)
vs8r.v  v8,0(a0)
vle32.v v16,0(a1)
vsetvli a5,zero,e32,m8,ta,ma
vrgatherei16.vv v8,v16,v0
vs8r.v  v8,0(a4)
addi a4,sp,140
vl4re16.v   v4,0(a4)
addi a5,sp,12
vrgatherei16.vv v8,v16,v4
vl8re32.v   v0,0(a5)
vsetivli zero,4,e32,m8,ta,ma
addi a5,sp,208
vmv8r.v v16,v8
vl8re32.v   v8,0(a5)
vmul.vv v24,v24,v16
vmul.vv v8,v0,v8
vsub.vv v16,v24,v8
vadd.vv v8,v8,v24
vsetvli a5,zero,e8,m2,ta,ma
vlm.v   v0,0(t2)
vsetivli zero,4,e32,m8,ta,ma
vmerge.vvm  v8,v8,v16,v0
vse32.v v8,0(a7)
addi sp,sp,592
jr  ra

This patch makes loops with known NITERS aware of the liveness estimation.
After this patch, LMUL = 4 is chosen:

lui a5,%hi(f)
addi a5,a5,%lo(f)
addi a3,a5,400
addi a4,a5,800
vsetivli zero,8,e32,m2,ta,ma
vlseg4e32.v v16,(a3)
vlseg4e32.v v8,(a4)
vmul.vv v2,v8,v16
addi a3,a5,528
vmv.v.v v24,v10
vnmsub.vv   v24,v18,v2
addi a4,a5,928
vmul.vv v2,v12,v22
vmul.vv v6,v8,v18
vmv.v.v v30,v2
vmacc.vv v30,v14,v20
vmv.v.v v26,v6
vmacc.vv v26,v10,v16
vmul.vv v4,v12,v20
vmv.v.v v28,v14
vnmsub.vv   v28,v22,v4
vsseg4e32.v v24,(a5)
vlseg4e32.v v16,(a3)
vlseg4e32.v v8,(a4)
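
As a rough illustration of why liveness matters for the LMUL choice (a simplified sketch, not the cost model GCC actually implements): RVV has 32 vector registers, so at LMUL = 8 only four register groups are available. When more vector values are live at the same time than there are groups, values have to be spilled to the stack, which is what the vs8r.v/vl8re32.v traffic in the LMUL = 8 output looks like. The function name `pick_lmul` and its single `max_live` input are hypothetical.

```c
#include <assert.h>

enum { NUM_VECTOR_REGS = 32 };  /* RVV architectural vector registers */

/* Return the largest LMUL in {8, 4, 2, 1} for which `max_live`
   simultaneously live vector values still fit in the register file.
   Each live value is assumed to occupy one group of LMUL registers
   (an assumption of this sketch).  */
static int pick_lmul(int max_live)
{
  assert(max_live >= 0);
  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (max_live * lmul <= NUM_VECTOR_REGS)
      return lmul;
  return 1;  /* fall back to the smallest grouping */
}
```

Under these assumptions, a loop body with up to four live values could use LMUL = 8, while five to eight live values would already call for LMUL = 4.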

RE: [gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for x86_64 backend

2023-12-27 Thread Jiang, Haochen

> -Original Message-
> From: Haochen Jiang 
> Sent: Thursday, December 21, 2023 4:26 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ubiz...@gmail.com; Liu, Hongtao ;
> ger...@pfeifer.com
> Subject: [gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for
> x86_64 backend
> 
> Hi all,
> 
> This is the v2 patch for the wwwdocs change, updated according to review.
> 
> If there is no objection, I will push this change next Tuesday.

I will commit the doc change patch.

Thx,
Haochen

> 
> Changes in v2:
> 
>   - Remove RAO-INT from Grand Ridge
>   - Remove the mask register restriction for -mno-evex512
>   - Arrange the options alphabetically
>   - Other minor text change
> 
> Thx,
> Haochen
> 
> Messages in v1:
> 
> This patch will mention the following changes in wwwdocs for x86_64
> backend:
> 
>   - AVX10.1 support
>   - APX EGPR, PUSH2POP2, PPX and NDD support
>   - Xeon Phi ISAs deprecated
> 
> I also adjusted the wording in the x86_64 part for GCC 13.
> 
> ---
> Mention AVX10.1 support, APX support and Xeon Phi deprecation in GCC 14.
> Also adjust documentation in GCC 13.
> ---
>  htdocs/gcc-13/changes.html | 38 --
>  htdocs/gcc-14/changes.html | 27 ++-
>  2 files changed, 42 insertions(+), 23 deletions(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index d3bacc16..b4b1a39a 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -543,24 +543,28 @@ You may also want to check out our
>__bf16 type to x86 psABI. Users need to adjust their
>AVX512BF16-related source code when upgrading GCC12 to GCC13.
>
> -  New ISA extension support for Intel AVX-IFMA was added.
> -  AVX-IFMA intrinsics are available via the -mavxifma
> +  New ISA extension support for Intel AMX-COMPLEX was added.
> +  AMX-COMPLEX intrinsics are available via the
> + -mamx-complex
>compiler switch.
>
> -  New ISA extension support for Intel AVX-VNNI-INT8 was added.
> -  AVX-VNNI-INT8 intrinsics are available via the -mavxvnniint8
> +  New ISA extension support for Intel AMX-FP16 was added.
> +  AMX-FP16 intrinsics are available via the -mamx-fp16
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AVX-IFMA was added.
> +  AVX-IFMA intrinsics are available via the -mavxifma
>compiler switch.
>
>New ISA extension support for Intel AVX-NE-CONVERT was added.
>AVX-NE-CONVERT intrinsics are available via the
>-mavxneconvert compiler switch.
>
> -  New ISA extension support for Intel CMPccXADD was added.
> -  CMPccXADD intrinsics are available via the -mcmpccxadd
> +  New ISA extension support for Intel AVX-VNNI-INT8 was added.
> +  AVX-VNNI-INT8 intrinsics are available via the
> + -mavxvnniint8
>compiler switch.
>
> -  New ISA extension support for Intel AMX-FP16 was added.
> -  AMX-FP16 intrinsics are available via the -mamx-fp16
> +  New ISA extension support for Intel CMPccXADD was added.
> +  CMPccXADD intrinsics are available via the
> + -mcmpccxadd
>compiler switch.
>
>New ISA extension support for Intel PREFETCHI was added.
> @@ -571,10 +575,6 @@ You may also want to check out our
>RAO-INT intrinsics are available via the -mraoint
>compiler switch.
>
> -  New ISA extension support for Intel AMX-COMPLEX was added.
> -  AMX-COMPLEX intrinsics are available via the -mamx-complex
> -  compiler switch.
> -  
>GCC now supports the Intel CPU named Raptor Lake through
>  -march=raptorlake.
>  Raptor Lake is based on Alder Lake.
> @@ -585,13 +585,13 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Sierra Forest through
>  -march=sierraforest.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD,
> -ENQCMD and UINTR ISA extensions.
> +Based on ISA extensions enabled on Alder Lake, the switch further enables
> +the AVX-IFMA, AVX-NE-CONVERT, AVX-VNNI-INT8, CMPccXADD, ENQCMD and UINTR
> +ISA extensions.
>
>GCC now supports the Intel CPU named Grand Ridge through
>  -march=grandridge.
> -The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD,
> -ENQCMD, UINTR and RAO-INT ISA extensions.
> +Grand Ridge is based on Sierra Forest.
>
>GCC now supports the Intel CPU named Emerald Rapids through
>  -march=emeraldrapids.
> @@ -599,11 +599,13 @@ You may also want to check out our
>
>GCC now supports the Intel CPU named Granite Rapids through
>  -march=graniterapids.
> -The switch enables the AMX-FP16 and PREFETCHI ISA extensions.
> +Based on Sapphire Rapids, the switch further enables the AMX-FP16 and
> +PREFETCHI ISA extensions.
>
>GCC now supports the Intel CPU named Granite Rapids D through
>  -march=graniterapids-d.
> -The switch enables the AMX-FP16, PREFETCHI and AMX-COMPLEX ISA
> ext

[PATCH 0/2] When cmodel=extreme, add macro support and only

2023-12-27 Thread Lulu Cheng

When cmodel=extreme, the symbol address is obtained through four
instructions, and errors may occur during linking in some cases.  To ensure
that these instructions stay together, macro instructions are used to obtain
the symbol address when cmodel=extreme.

https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc#extreme-code-model


Lulu Cheng (2):
  LoongArch: Add the macro implementation of mcmodel=extreme.
  LoongArch: When the code model is extreme, the symbol address is
obtained through macro instructions regardless of the value of
-mexplicit-relocs.

 gcc/config/loongarch/loongarch.cc | 25 +-
 gcc/config/loongarch/loongarch.md | 47 ++-
 gcc/config/loongarch/predicates.md| 14 ++
 .../gcc.target/loongarch/attr-model-1.c   |  2 +-
 .../gcc.target/loongarch/attr-model-2.c   |  2 +-
 .../gcc.target/loongarch/attr-model-3.c   |  2 +-
 .../gcc.target/loongarch/attr-model-4.c   |  2 +-
 .../loongarch/func-call-extreme-1.c   |  6 +--
 .../loongarch/func-call-extreme-2.c   |  6 +--
 .../loongarch/func-call-extreme-3.c   |  6 +--
 .../loongarch/func-call-extreme-4.c   |  6 +--
 .../loongarch/func-call-extreme-5.c   |  7 +++
 .../loongarch/func-call-extreme-6.c   |  7 +++
 13 files changed, 102 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-6.c

-- 
2.39.3

[PATCH 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2023-12-27 Thread Lulu Cheng

The instructions pcalau12i, addi.d, lu32i.d and lu52i.d must be adjacent so
that the linker can infer the PC of pcalau12i to apply relocations to lu32i.d
and lu52i.d.  Otherwise, the results would be incorrect if these four
instructions are not in the same 4KiB page.

See the link for details:
https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc#extreme-code-model
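
The page condition mentioned above can be illustrated with a small sketch. It only checks whether two instruction addresses fall into the same 4KiB page; it does not model the relocation arithmetic itself. The helper names are hypothetical, and the 4-byte alignment check reflects LoongArch's fixed 32-bit instruction size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES UINT64_C(4096)

/* Base address of the 4KiB page containing PC.  */
static uint64_t page_of(uint64_t pc)
{
  return pc & ~(PAGE_BYTES - 1);
}

/* True if the instruction at PC_A (e.g. pcalau12i) and the one at PC_B
   (e.g. a lu32i.d that was scheduled away from it) share a 4KiB page.  */
static bool same_4k_page(uint64_t pc_a, uint64_t pc_b)
{
  assert(pc_a % 4 == 0 && pc_b % 4 == 0);  /* instructions are 4-byte aligned */
  return page_of(pc_a) == page_of(pc_b);
}
```

For example, with pcalau12i at 0x1ff4, a lu32i.d placed at 0x2000 or later is already in a different page, which is the situation the patch avoids by keeping the sequence inside one macro instruction.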

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_symbol_extreme_p): Add
function declaration.
(loongarch_explicit_relocs_p): Use the macro instruction to get
the symbol address when loongarch_symbol_extreme_p returns true.
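
The decision order described in this entry can be modelled as a standalone sketch. The enumerator names follow the patch, but the function below is a simplified stand-in: `symbol_is_extreme` replaces the call to loongarch_symbol_extreme_p, and `auto_choice` is a hypothetical parameter standing in for the per-symbol-type logic of the auto mode, which is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

enum explicit_relocs
{
  EXPLICIT_RELOCS_AUTO,
  EXPLICIT_RELOCS_NONE,
  EXPLICIT_RELOCS_ALWAYS
};

/* Return true if explicit relocs should be used for a symbol.  */
static bool explicit_relocs_p(bool symbol_is_extreme,
                              enum explicit_relocs opt,
                              bool auto_choice)
{
  /* Extreme-model symbols always use macro instructions, regardless of
     the -mexplicit-relocs setting.  */
  if (symbol_is_extreme)
    return false;

  if (opt != EXPLICIT_RELOCS_AUTO)
    return opt == EXPLICIT_RELOCS_ALWAYS;

  assert(opt == EXPLICIT_RELOCS_AUTO);
  return auto_choice;  /* stand-in for the auto heuristics */
}
```

The point of placing the extreme check first is that it overrides even an explicit "always" request.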

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/attr-model-1.c: Modify the content of the search
string in the test case.
* gcc.target/loongarch/attr-model-2.c: Likewise.
* gcc.target/loongarch/attr-model-3.c: Likewise.
* gcc.target/loongarch/attr-model-4.c: Likewise.
* gcc.target/loongarch/func-call-extreme-1.c: Likewise.
* gcc.target/loongarch/func-call-extreme-2.c: Likewise.
* gcc.target/loongarch/func-call-extreme-3.c: Likewise.
* gcc.target/loongarch/func-call-extreme-4.c: Likewise.
---
 gcc/config/loongarch/loongarch.cc | 11 +++
 gcc/testsuite/gcc.target/loongarch/attr-model-1.c |  2 +-
 gcc/testsuite/gcc.target/loongarch/attr-model-2.c |  2 +-
 gcc/testsuite/gcc.target/loongarch/attr-model-3.c |  2 +-
 gcc/testsuite/gcc.target/loongarch/attr-model-4.c |  2 +-
 .../gcc.target/loongarch/func-call-extreme-1.c|  6 +++---
 .../gcc.target/loongarch/func-call-extreme-2.c|  6 +++---
 .../gcc.target/loongarch/func-call-extreme-3.c|  6 +++---
 .../gcc.target/loongarch/func-call-extreme-4.c|  6 +++---
 9 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e33b9db5981..aa9a9598000 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -264,6 +264,9 @@ const char *const
 loongarch_fp_conditions[16]= {LARCH_FP_CONDITIONS (STRINGIFY)};
 #undef STRINGIFY
 
+static bool
+loongarch_symbol_extreme_p (enum loongarch_symbol_type type);
+
 /* Size of guard page.  */
 #define STACK_CLASH_PROTECTION_GUARD_SIZE \
   (1 << param_stack_clash_protection_guard_size)
@@ -1963,6 +1966,14 @@ loongarch_symbolic_constant_p (rtx x, enum 
loongarch_symbol_type *symbol_type)
 bool
 loongarch_explicit_relocs_p (enum loongarch_symbol_type type)
 {
+  /* Instructions pcalau12i, addi.d, lu32i.d and lu52i.d must be adjacent
+ so that the linker can infer the PC of pcalau12i to apply relocations
+ to lu32i.d and lu52i.d.  Otherwise, the results would be incorrect if
+ these four instructions are not in the same 4KiB page.
+ Therefore, macro instructions are used when cmodel=extreme.  */
+  if (loongarch_symbol_extreme_p (type))
+return false;
+
   if (la_opt_explicit_relocs != EXPLICIT_RELOCS_AUTO)
 return la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS;
 
diff --git a/gcc/testsuite/gcc.target/loongarch/attr-model-1.c 
b/gcc/testsuite/gcc.target/loongarch/attr-model-1.c
index 916d715b98b..3963b8957b0 100644
--- a/gcc/testsuite/gcc.target/loongarch/attr-model-1.c
+++ b/gcc/testsuite/gcc.target/loongarch/attr-model-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mexplicit-relocs -mcmodel=normal -O2" } */
-/* { dg-final { scan-assembler-times "%pc64_hi12" 2 } } */
+/* { dg-final { scan-assembler-times "la\.local.*,\\\$r15," 2 } } */
 
 #define ATTR_MODEL_TEST
 #include "attr-model-test.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/attr-model-2.c 
b/gcc/testsuite/gcc.target/loongarch/attr-model-2.c
index a74c795ac3e..6f154a92499 100644
--- a/gcc/testsuite/gcc.target/loongarch/attr-model-2.c
+++ b/gcc/testsuite/gcc.target/loongarch/attr-model-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mexplicit-relocs -mcmodel=extreme -O2" } */
-/* { dg-final { scan-assembler-times "%pc64_hi12" 3 } } */
+/* { dg-final { scan-assembler-times "la\.local.*,\\\$r15," 3 } } */
 
 #define ATTR_MODEL_TEST
 #include "attr-model-test.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/attr-model-3.c 
b/gcc/testsuite/gcc.target/loongarch/attr-model-3.c
index 5622d508678..eb177905d34 100644
--- a/gcc/testsuite/gcc.target/loongarch/attr-model-3.c
+++ b/gcc/testsuite/gcc.target/loongarch/attr-model-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mexplicit-relocs=auto -mcmodel=normal -O2" } */
-/* { dg-final { scan-assembler-times "%pc64_hi12" 2 } } */
+/* { dg-final { scan-assembler-times "la\.local.*,\\\$r15," 2 } } */
 
 #define ATTR_MODEL_TEST
 #include "attr-model-test.c"
diff --git a/gcc/testsuite/gcc.target/loongarch/attr-model-4.c 
b/gcc/testsuite/gcc.target/loongarch/attr-model-4.c
index 482724bb974..570a0bd6690 100644
--- a/gcc/testsuite/gcc.target/loongarch/attr-model-4.c
+++ b/gcc/tes

[PATCH 1/2] LoongArch: Add the macro implementation of mcmodel=extreme.

2023-12-27 Thread Lulu Cheng

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
Remove the sym+addend form from the SYMBOL_PCREL64 type symbol.
(loongarch_option_override_internal): Support the combination of
-mcmodel=extreme and -mexplicit-relocs=none.
(loongarch_handle_model_attribute): Remove detection code.
* config/loongarch/loongarch.md (movdi_pcrel64): New template.
(movdi_got_disp): Likewise.
* config/loongarch/predicates.md (symbolic_got_operand): Determine
whether the symbol type is SYMBOL_GOT_DISP.
(symbolic_pcrel64_operand): Determine whether the symbol type is
SYMBOL_PCREL64.
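
For symbol types that still accept a sym+addend form, the patch keeps the existing offset check, which tests whether the offset survives sign extension from 32 bits. A portable sketch of that test follows; `sext` is a hypothetical stand-in written for this example, not GCC's sext_hwi, and it avoids implementation-defined shifts and casts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low PREC bits of V to 64 bits.  */
static int64_t sext(int64_t v, unsigned prec)
{
  assert(prec >= 1 && prec <= 62);
  uint64_t low = (uint64_t) v & ((UINT64_C(1) << prec) - 1);
  if (low & (UINT64_C(1) << (prec - 1)))
    return (int64_t) low - (INT64_C(1) << prec);  /* sign bit set */
  return (int64_t) low;
}

/* True if OFFSET lies in [-2^31, 2^31 - 1], i.e. it is unchanged by
   sign extension from 32 bits.  */
static bool offset_fits_sext32(int64_t offset)
{
  return sext(offset, 32) == offset;
}
```

With this patch, SYMBOL_PCREL64 no longer reaches such a check; it is grouped with the symbol types for which an offset is rejected outright.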

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/func-call-extreme-5.c: New test.
* gcc.target/loongarch/func-call-extreme-6.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 14 +-
 gcc/config/loongarch/loongarch.md | 47 ++-
 gcc/config/loongarch/predicates.md| 14 ++
 .../loongarch/func-call-extreme-5.c   |  7 +++
 .../loongarch/func-call-extreme-6.c   |  7 +++
 5 files changed, 75 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-5.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-6.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 578b9bc3f09..e33b9db5981 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1944,10 +1944,10 @@ loongarch_symbolic_constant_p (rtx x, enum 
loongarch_symbol_type *symbol_type)
 case SYMBOL_TLSGD:
 case SYMBOL_TLSLDM:
 case SYMBOL_PCREL:
-case SYMBOL_PCREL64:
   /* GAS rejects offsets outside the range [-2^31, 2^31-1].  */
   return sext_hwi (INTVAL (offset), 32) == INTVAL (offset);
 
+case SYMBOL_PCREL64:
 case SYMBOL_GOT_DISP:
 case SYMBOL_TLS:
   return false;
@@ -7526,10 +7526,6 @@ loongarch_option_override_internal (struct gcc_options 
*opts,
   switch (la_target.cmodel)
 {
   case CMODEL_EXTREME:
-   if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)
- error ("code model %qs is not compatible with %s",
-"extreme", "-mexplicit-relocs=none");
-
if (opts->x_flag_plt)
  {
if (global_options_set.x_flag_plt)
@@ -7894,14 +7890,6 @@ loongarch_handle_model_attribute (tree *node, tree name, 
tree arg, int,
  *no_add_attrs = true;
  return NULL_TREE;
}
-  if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)
-   {
- error_at (DECL_SOURCE_LOCATION (decl),
-   "%qE attribute is not compatible with %s", name,
-   "-mexplicit-relocs=none");
- *no_add_attrs = true;
- return NULL_TREE;
-   }
 
   arg = TREE_VALUE (arg);
   if (TREE_CODE (arg) != STRING_CST)
diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 2b0609f2f31..72abf180b1b 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -84,6 +84,9 @@ (define_c_enum "unspec" [
 
   UNSPEC_SIBCALL_VALUE_MULTIPLE_INTERNAL_1
   UNSPEC_CALL_VALUE_MULTIPLE_INTERNAL_1
+
+  UNSPEC_MOV_PCREL64
+  UNSPEC_MOV_GOT_DISP
 ])
 
 (define_c_enum "unspecv" [
@@ -123,6 +126,7 @@ (define_constants
(TP_REGNUM  2)
(T0_REGNUM  12)
(T1_REGNUM  13)
+   (T3_REGNUM  15)
(S0_REGNUM  23)
 
;; Return path styles
@@ -2056,8 +2060,22 @@ (define_expand "movdi"
 {
   if (loongarch_legitimize_move (DImode, operands[0], operands[1]))
 DONE;
-})
 
+  enum loongarch_symbol_type symbol_type;
+  if (loongarch_symbolic_constant_p (operands[1], &symbol_type))
+{
+  if (symbol_type == SYMBOL_PCREL64)
+   {
+ emit_insn (gen_movdi_pcrel64 (operands[0], operands[1]));
+ DONE;
+   }
+  else if (TARGET_CMODEL_EXTREME && symbol_type == SYMBOL_GOT_DISP)
+   {
+ emit_insn (gen_movdi_got_disp (operands[0], operands[1]));
+ DONE;
+   }
+}
+})
 (define_insn_and_split "*movdi_32bit"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m")
(match_operand:DI 1 "move_operand" "r,i,w,r,*J*r,*m,*f,*f"))]
@@ -2096,6 +2114,33 @@ (define_insn_and_split "*movdi_64bit"
   [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore")
(set_attr "mode" "DI")])
 
+;; $t0 and $t1 are used in loongarch_output_mi_thunk.  If $t0 or $t1 is used
+;; here, then when cmodel is extreme, C++ THUNK will error.  So $t3 is selected
+;; here.
+(define_insn "movdi_pcrel64"
+ [(set (match_operand:DI 0 "register_operand" "=&r")
+   (match_operand:DI 1 "symbolic_pcrel64_operand"))
+  (unspec:DI [(const_int 0)]
+UNSPEC_MOV_PCREL64)
+  (use (reg:DI T3_REGNUM))
+  (clobber (reg:DI T3_REGNUM))]
+ "TARGET_64BIT"
+

Re: [PATCH 0/2] When cmodel=extreme, add macro support and only

2023-12-27 Thread chenglulu




On 2023/12/27 at 4:46 PM, Lulu Cheng wrote:

> When cmodel=extreme, since the symbol address is obtained through four
> instructions, errors may occur in some cases during linking. Therefore, in
> order to ensure that the instructions for obtaining the symbol address are
> together, macro instructions are used to obtain the symbol address when
> cmodel=extreme.
>
> https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc#extreme-code-model

There are some problems with the test case changes; I will fix them in
v2.




RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-27 Thread Di Zhao OS

Committed at 6cec7b06b3c8187b36fc05cfd4dd38b42313d727

Thanks,
Di

> -Original Message-
> From: Richard Biener 
> Sent: Friday, December 22, 2023 11:40 PM
> To: Di Zhao OS 
> Cc: Thomas Schwinge ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> get_reassociation_width
> 
> 
> 
> > On 22.12.2023 at 16:05, Di Zhao OS wrote:
> >
> > Updated the fix in attachment.
> >
> > Is it OK for trunk?
> 
> Ok
> 
> > Tested on aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu.
> >
> > Thanks,
> > Di Zhao
> >
> >> -Original Message-
> >> From: Di Zhao OS 
> >> Sent: Sunday, December 17, 2023 8:31 PM
> >> To: Thomas Schwinge ; gcc-patches@gcc.gnu.org
> >> Cc: Richard Biener 
> >> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in
> >> get_reassociation_width
> >>
> >> Hello Thomas,
> >>
> >>> -Original Message-
> >>> From: Thomas Schwinge 
> >>> Sent: Friday, December 15, 2023 5:46 PM
> >>> To: Di Zhao OS ; gcc-patches@gcc.gnu.org
> >>> Cc: Richard Biener 
> >>> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in
> >>> get_reassociation_width
> >>>
> >>> Hi!
> >>>
> >>> On 2023-12-13T08:14:28+, Di Zhao OS 
> >> wrote:
>  --- /dev/null
>  +++ b/gcc/testsuite/gcc.dg/pr110279-2.c
>  @@ -0,0 +1,41 @@
>  +/* PR tree-optimization/110279 */
>  +/* { dg-do compile } */
>  +/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully-
> >>> pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
>  +/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } }
> */
>  +
>  +#define LOOP_COUNT 8
>  +typedef double data_e;
>  +
>  +#include 
>  +
>  +__attribute_noinline__ data_e
>  +foo (data_e in)
> >>>
> >>> Pushed to master branch commit 91e9e8faea4086b3b8aef2355fc12c1559d425f6
> >>> "Fix 'gcc.dg/pr110279-2.c' syntax error due to '__attribute_noinline__'",
> >>> see attached.
> >>>
> >>> However:
> >>>
>  +{
>  +  data_e a1, a2, a3, a4;
>  +  data_e tmp, result = 0;
>  +  a1 = in + 0.1;
>  +  a2 = in * 0.1;
>  +  a3 = in + 0.01;
>  +  a4 = in * 0.59;
>  +
>  +  data_e result2 = 0;
>  +
>  +  for (int ic = 0; ic < LOOP_COUNT; ic++)
>  +{
>  +  /* Test that a complete FMA chain with length=4 is not broken.  */
>  +  tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4 ;
>  +  result += tmp - ic;
>  +  result2 = result2 / 2 - tmp;
>  +
>  +  a1 += 0.91;
>  +  a2 += 0.1;
>  +  a3 -= 0.01;
>  +  a4 -= 0.89;
>  +
>  +}
>  +
>  +  return result + result2;
>  +}
>  +
>  +/* { dg-final { scan-tree-dump-not "was chosen for reassociation"
> >>> "reassoc2"} } */
>  +/* { dg-final { scan-tree-dump-times {\.FMA } 3 "optimized"} } */
> >>
> >> Thank you for the fix.
> >>
> >>> ..., I still see these latter two tree dump scans FAIL, for GCN:
> >>>
> >>>$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >>>  2 *: a3_40
> >>>  2 *: a2_39
> >>>Width = 4 was chosen for reassociation
> >>>Transforming _15 = powmult_1 + powmult_3;
> >>> into _63 = powmult_1 + a1_38;
> >>>$ grep -F .FMA pr110279-2.c.265t.optimized
> >>>  _63 = .FMA (a2_39, a2_39, a1_38);
> >>>  _64 = .FMA (a3_40, a3_40, powmult_5);
> >>>
> >>> ..., nvptx:
> >>>
> >>>$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >>>  2 *: a3_40
> >>>  2 *: a2_39
> >>>Width = 4 was chosen for reassociation
> >>>Transforming _15 = powmult_1 + powmult_3;
> >>> into _63 = powmult_1 + a1_38;
> >>>$ grep -F .FMA pr110279-2.c.265t.optimized
> >>>  _63 = .FMA (a2_39, a2_39, a1_38);
> >>>  _64 = .FMA (a3_40, a3_40, powmult_5);
> >>
> >> For these 2 targets, the reassoc_width for FMUL is 1 (default value),
> >> while the testcase assumes that to be 4. The bug was introduced when I
> >> updated the patch but forgot to update the testcase.
> >>
> >>> ..., but also x86_64-pc-linux-gnu:
> >>>
> >>>$  grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
> >>>  2 *: a3_40
> >>>  2 *: a2_39
> >>>Width = 2 was chosen for reassociation
> >>>Transforming _15 = powmult_1 + powmult_3;
> >>> into _63 = powmult_1 + powmult_3;
> >>>$ grep -cF .FMA pr110279-2.c.265t.optimized
> >>>0
> >>
> >> For x86_64 this needs "-mfma". Sorry the compile options missed that.
> >> Can the change below fix these issues? I moved them into
> >> testsuite/gcc.target/aarch64, since they rely on tunings.
> >>
> >> Tested on aarch64-unknown-linux-gnu.
> >>
> >>>
> >>> Regards
> >>> Thomas
> >>>
> >>>

[PATCH] aarch64: add 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA'

2023-12-27 Thread Di Zhao OS

This patch adds a new tuning option 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA',
to consider fully pipelined FMAs in reassociation. Also, set this option
by default for Ampere CPUs.
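
To make the trade-off concrete, here is a small sketch of the two evaluation orders that reassociation chooses between for a sum of products. It is only an illustration: the function names are hypothetical, and each `x * x + acc` stands in for one FMA (whether the compiler really emits an FMA depends on the target and flags).

```c
#include <assert.h>
#include <stddef.h>

/* One dependent chain: every multiply-add waits for the previous one.  */
static double sum_squares_chain(double acc, const double *x, size_t n)
{
  assert(x != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    acc = x[i] * x[i] + acc;
  return acc;
}

/* Width 2: two independent accumulators plus one final add.  */
static double sum_squares_width2(double acc, const double *x, size_t n)
{
  assert(x != NULL || n == 0);
  double even = acc, odd = 0.0;
  for (size_t i = 0; i < n; i++)
    {
      if (i % 2 == 0)
        even = x[i] * x[i] + even;
      else
        odd = x[i] * x[i] + odd;
    }
  return even + odd;
}
```

Both forms compute the same sum when the intermediate values are exact. Which one is faster depends on the target, and the new tuning flag lets reassociation take FMA pipelining into account when deciding.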

Tested on aarch64-unknown-linux-gnu. Is this OK for trunk?

Thanks,
Di Zhao

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
New tuning option AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
* config/aarch64/aarch64.cc (aarch64_override_options_internal): Set
param_fully_pipelined_fma according to tuning option.
* config/aarch64/tuning_models/ampere1.h: Add
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA to tune_flags.
* config/aarch64/tuning_models/ampere1a.h: Likewise.
* config/aarch64/tuning_models/ampere1b.h: Likewise.
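
The mechanism in these entries (a per-CPU flag bit that provides a default for a parameter unless the user set it explicitly) can be sketched in isolation. The flag values, `struct param` and both helper names below are hypothetical. The sketch only mirrors the idea behind SET_OPTION_IF_UNSET and is not the GCC macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the extra tuning flags.  */
enum
{
  TUNE_AVOID_CROSS_LOOP_FMA = 1u << 0,
  TUNE_FULLY_PIPELINED_FMA  = 1u << 1
};

struct param
{
  int value;
  bool set_by_user;  /* true if given explicitly on the command line */
};

/* Apply a tuning default only when the user did not choose a value.  */
static void set_param_if_unset(struct param *p, int value)
{
  assert(p != NULL);
  if (!p->set_by_user)
    p->value = value;
}

static void apply_tuning(unsigned flags, struct param *fully_pipelined_fma)
{
  if (flags & TUNE_FULLY_PIPELINED_FMA)
    set_param_if_unset(fully_pipelined_fma, 1);
}
```

This keeps an explicit `--param fully-pipelined-fma=0` effective even when the selected CPU tuning would otherwise enable it.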

---
 gcc/config/aarch64/aarch64-tuning-flags.def | 2 ++
 gcc/config/aarch64/aarch64.cc   | 6 ++
 gcc/config/aarch64/tuning_models/ampere1.h  | 3 ++-
 gcc/config/aarch64/tuning_models/ampere1a.h | 3 ++-
 gcc/config/aarch64/tuning_models/ampere1b.h | 3 ++-
 5 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index f28a73839a6..256f17bad60 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -49,4 +49,6 @@ AARCH64_EXTRA_TUNING_OPTION ("matched_vector_throughput", 
MATCHED_VECTOR_THROUGH
 
 AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
 
+AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_FMA", FULLY_PIPELINED_FMA)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f9850320f61..1b3b288cdf9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18289,6 +18289,12 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
 SET_OPTION_IF_UNSET (opts, &global_options_set, param_avoid_fma_max_bits,
 512);
 
+  /* Consider fully pipelined FMA in reassociation.  */
+  if (aarch64_tune_params.extra_tuning_flags
+  & AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
+SET_OPTION_IF_UNSET (opts, &global_options_set, param_fully_pipelined_fma,
+1);
+
   aarch64_override_options_after_change_1 (opts);
 }
 
diff --git a/gcc/config/aarch64/tuning_models/ampere1.h 
b/gcc/config/aarch64/tuning_models/ampere1.h
index a144e8f94b3..d63788528a7 100644
--- a/gcc/config/aarch64/tuning_models/ampere1.h
+++ b/gcc/config/aarch64/tuning_models/ampere1.h
@@ -104,7 +104,8 @@ static const struct tune_params ampere1_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA |
+   AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),/* tune_flags.  */
   &ampere1_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h 
b/gcc/config/aarch64/tuning_models/ampere1a.h
index f688ed08a79..63506e1d1c6 100644
--- a/gcc/config/aarch64/tuning_models/ampere1a.h
+++ b/gcc/config/aarch64/tuning_models/ampere1a.h
@@ -56,7 +56,8 @@ static const struct tune_params ampere1a_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA |
+   AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),/* tune_flags.  */
   &ampere1_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model.  */
diff --git a/gcc/config/aarch64/tuning_models/ampere1b.h 
b/gcc/config/aarch64/tuning_models/ampere1b.h
index a98b6a980f7..7894e730174 100644
--- a/gcc/config/aarch64/tuning_models/ampere1b.h
+++ b/gcc/config/aarch64/tuning_models/ampere1b.h
@@ -106,7 +106,8 @@ static const struct tune_params ampere1b_tunings =
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_STRONG,  /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND |
-   AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA),   /* tune_flags.  */
+   AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA |
+   AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),/* tune_flags.  */
   &ampere1b_prefetch_tune,
   AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
   AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model.  */
-- 
2.25.1

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-27 Thread Xi Ruoyao

On Wed, 2023-12-27 at 11:59 +0800, chenglulu wrote:

> +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6
> 
> In r14-6818 the issue persists. I kind of chased the code and found that the 
> problem is like this:
>   volatile unsigned char u8;
> 
>   void test (void)
>   {
> u8 = u8 + u8;
> u8 = u8 - u8;
>   }
> 
> $./gcc/cc1 test.c -o test.s -fdump-rtl-all-all -fdiagnostics-plain-output  
> -Os -fdump-rtl-final -ffat-lto-objects
> 
> test.c.301r.outof_cfglayout
> 
>  (insn 7 6 9 2 (set (reg:DI 80 [ u8.0_1 ])
> (zero_extend:DI (mem/v/c:QI (symbol_ref:DI ("*.LANCHOR0") [flags 
> 0x182]) [0 u8D.2193+0 S1 A8]))) "volatile.c":5:11 459 {simple_load_uextdiqidi}
>  (nil))
> 
> test.c.302r.split1
> 
> (insn 27 6 28 2 (set (reg:DI 98)
> (unspec:DI [
> (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> ] UNSPEC_PCALAU12I_GR)) "volatile.c":5:11 -1
>  (nil))
> (insn 28 27 9 2 (set (reg:DI 80 [ u8.0_1 ])
> (zero_extend:DI (mem:QI (lo_sum:DI (reg:DI 98)
> (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [0  S1 
> A8]))) "volatile.c":5:11 -1
>  (nil))
> 
> The volatile property of the mem here is gone, so the test fails.

Phew.  I guess I couldn't reproduce it because I have Jeff's ext-dce
patch in my local repo, which removed the zero_extend...

I'll rework this patch.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[C PATCH] C: Fix type compatibility for structs with variable sized fields.

2023-12-27 Thread Martin Uecker



This patch hopefully fixes the test failure we see with gnu23-tag-4.c.
It does for me locally with -march=native (which otherwise reproduces
the problem).

Bootstrapped and regression tested on x86_64.


C: Fix type compatibility for structs with variable sized fields.

This fixes the test gcc.dg/gnu23-tag-4.c introduced by commit 23fee88f
which fails for -march=... because DECL_FIELD_BIT_OFFSET is set
inconsistently for types with and without a variable-sized field.  This
is fixed by testing for DECL_ALIGN instead.  The code is further
simplified by removing some unnecessary conditions, i.e. anon_field is
set unconditionally and all fields are assumed to be FIELD_DECLs.

gcc/c:
* c-typeck.cc (tagged_types_tu_compatible_p): Revise.

gcc/testsuite:
* gcc.dg/c23-tag-9.c: New test.
---
 gcc/c/c-typeck.cc| 19 ---
 gcc/testsuite/gcc.dg/c23-tag-9.c |  8 
 2 files changed, 16 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-9.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 2d9139d09d2..84ddda1ebab 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -1511,8 +1511,6 @@ tagged_types_tu_compatible_p (const_tree t1, const_tree 
t2,
   if (!data->anon_field && TYPE_STUB_DECL (t1) != TYPE_STUB_DECL (t2))
 data->different_types_p = true;
 
-  data->anon_field = false;
-
   /* Incomplete types are incompatible inside a TU.  */
   if (TYPE_SIZE (t1) == NULL || TYPE_SIZE (t2) == NULL)
 return false;
@@ -1592,22 +1590,21 @@ tagged_types_tu_compatible_p (const_tree t1, const_tree 
t2,
 s1 && s2;
 s1 = DECL_CHAIN (s1), s2 = DECL_CHAIN (s2))
  {
-   if (TREE_CODE (s1) != TREE_CODE (s2)
-   || DECL_NAME (s1) != DECL_NAME (s2))
+   gcc_assert (TREE_CODE (s1) == FIELD_DECL);
+   gcc_assert (TREE_CODE (s2) == FIELD_DECL);
+
+   if (DECL_NAME (s1) != DECL_NAME (s2))
+ return false;
+
+   if (DECL_ALIGN (s1) != DECL_ALIGN (s2))
  return false;
 
-   if (!DECL_NAME (s1) && RECORD_OR_UNION_TYPE_P (TREE_TYPE (s1)))
- data->anon_field = true;
+   data->anon_field = !DECL_NAME (s1);
 
data->cache = &entry;
if (!comptypes_internal (TREE_TYPE (s1), TREE_TYPE (s2), data))
  return false;
 
-   if (TREE_CODE (s1) == FIELD_DECL
-   && simple_cst_equal (DECL_FIELD_BIT_OFFSET (s1),
-DECL_FIELD_BIT_OFFSET (s2)) != 1)
- return false;
-
tree st1 = TYPE_SIZE (TREE_TYPE (s1));
tree st2 = TYPE_SIZE (TREE_TYPE (s2));
 
diff --git a/gcc/testsuite/gcc.dg/c23-tag-9.c b/gcc/testsuite/gcc.dg/c23-tag-9.c
new file mode 100644
index 000..1d32560ec23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-tag-9.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c23" } */
+
+struct foo { int x; } x;
+struct foo { alignas(128) int x; } y;  /* { dg-error "redefinition" } */
+static_assert(alignof(y) == 128);
+
+
-- 
2.39.2
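
For readers following along, here is a small standalone sketch of what the
new test relies on: two struct definitions that agree in field name, type
and offset but differ in field alignment.  It uses two distinct hypothetical
tags (foo_plain, foo_aligned) so that it compiles without the redefinition
error, and it assumes the target supports 128-byte extended alignment.  It
only illustrates the property being compared; it is not part of the patch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Stand-ins for the two conflicting definitions of 'struct foo' in
   c23-tag-9.c; the tags are hypothetical so both can coexist here.  */
struct foo_plain   { int x; };
struct foo_aligned { alignas(128) int x; };

/* The alignment of the whole struct follows the alignment of its member.  */
size_t plain_alignment(void)   { return alignof(struct foo_plain); }
size_t aligned_alignment(void) { return alignof(struct foo_aligned); }

/* The single member sits at offset 0 in both structs, so comparing
   offsets alone cannot tell the two definitions apart.  */
size_t plain_offset(void)   { return offsetof(struct foo_plain, x); }
size_t aligned_offset(void) { return offsetof(struct foo_aligned, x); }
```

The two structs agree on the member offset but not on alignment, which is
the kind of difference the DECL_ALIGN comparison is meant to catch.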

Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]

2023-12-27 Thread Harald Anlauf


Hi Rimvydas!

On 24.12.23 at 02:33, Rimvydas Jasinskas wrote:

> Documentation part.
> The makeinfo gcc/fortran/gfortran.texi does not seem to have any new warnings.


The patch is almost fine, except for a strange wording here:

+@smallexample
+gfortran -save-temps -c foo.F90
+@end smallexample
+
+preprocesses to in @file{foo.fii}, compiles to an intermediate
+@file{foo.s}, and then assembles to the (implied) output file
+@file{foo.o}, whereas:

I understand the formulation is copied from gcc/doc/invoke.texi,
where it does not fully make sense to me either.

How about:

"preprocesses input file @file{foo.F90} to @file{foo.fii}, ..."

Furthermore,

+@smallexample
+gfortran -save-temps -S foo.F
+@end smallexample
+
+saves the (no longer) temporary preprocessed file in @file{foo.fi}, and
+then compiles to the (implied) output file @file{foo.s}.

Even if this is copied from the gcc texinfo file, how about:

"saves the preprocessor output in @file{foo.fi}, ..."

which I find easier to read.

Can you also add a reference to the PR number in the commit message?


> Is there a specific reason why -fc-prototypes (Interoperability
> Options section) is excluded from manpage?


Can you be more specific?  I get here (since gcc-9):

% man /opt/gcc/14/share/man/man1/gfortran.1 |grep -A 1 "Interoperability Options"

   Interoperability Options
   -fc-prototypes -fc-prototypes-external

although no detailed explanation (-> gfortran.info).


> Regards,
> Rimvydas


Thanks,
Harald

[Committed] RISC-V: Make dynamic LMUL cost model more accurate for conversion codes

2023-12-27 Thread Juzhe-Zhong

The current dynamic LMUL cost model is not accurate for conversion codes.
This patch refines it; as a result, the case below changes from choosing
LMUL = 4 to LMUL = 8.

Tested with no regressions; committed.

Before this patch (LMUL = 4):           After this patch (LMUL = 8):
lw          a7,56(sp)                   lw          a7,56(sp)
ld          t5,0(sp)                    ld          t5,0(sp)
ld          t1,8(sp)                    ld          t1,8(sp)
ld          t6,16(sp)                   ld          t6,16(sp)
ld          t0,24(sp)                   ld          t0,24(sp)
ld          t3,32(sp)                   ld          t3,32(sp)
ld          t4,40(sp)                   ld          t4,40(sp)
ble         a7,zero,.L5                 ble         a7,zero,.L5
.L3:                                    .L3:
vsetvli     a4,a7,e32,m2,ta,ma          vsetvli     a4,a7,e32,m4,ta
vle8.v      v1,0(a2)                    vle8.v      v3,0(a2)
vle8.v      v4,0(a1)                    vle8.v      v16,0(t0)
vsext.vf4   v8,v1                       vle8.v      v7,0(a1)
vsext.vf4   v2,v4                       vle8.v      v12,0(t6)
vsetvli     zero,zero,e8,mf2,ta,ma      vle8.v      v2,0(a5)
vadd.vv     v4,v4,v1                    vle8.v      v1,0(t5)
vsetvli     zero,zero,e32,m2,ta,ma      vsext.vf4   v20,v3
vle8.v      v5,0(t0)                    vsext.vf4   v8,v7
vle8.v      v6,0(t6)                    vadd.vv     v8,v8,v20
vadd.vv     v2,v2,v8                    vadd.vv     v8,v8,v8
vadd.vv     v2,v2,v2                    vadd.vv     v8,v8,v20
vadd.vv     v2,v2,v8                    vsetvli     zero,zero,e8,m1
vsetvli     zero,zero,e8,mf2,ta,ma      vadd.vv     v15,v12,v16
vadd.vv     v6,v6,v5                    vsetvli     zero,zero,e32,m4
vsetvli     zero,zero,e32,m2,ta,ma      vsext.vf4   v12,v15
vle8.v      v8,0(t5)                    vadd.vv     v8,v8,v12
vle8.v      v9,0(a5)                    vsetvli     zero,zero,e8,m1
vsext.vf4   v10,v4                      vadd.vv     v7,v7,v3
vsext.vf4   v12,v6                      vsetvli     zero,zero,e32,m4
vadd.vv     v2,v2,v12                   vsext.vf4   v4,v7
vadd.vv     v2,v2,v10                   vadd.vv     v8,v8,v4
vsetvli     zero,zero,e16,m1,ta,ma      vsetvli     zero,zero,e16,m2
vncvt.x.x.w v4,v2                       vncvt.x.x.w v4,v8
vsetvli     zero,zero,e32,m2,ta,ma      vsetvli     zero,zero,e8,m1
vadd.vv     v6,v2,v2                    vncvt.x.x.w v4,v4
vsetvli     zero,zero,e8,mf2,ta,ma      vadd.vv     v15,v3,v4
vncvt.x.x.w v4,v4                       vadd.vv     v2,v2,v4
vadd.vv     v5,v5,v4                    vse8.v      v15,0(t4)
vadd.vv     v9,v9,v4                    vadd.vv     v3,v16,v4
vadd.vv     v1,v1,v4                    vse8.v      v2,0(a3)
vadd.vv     v4,v8,v4                    vadd.vv     v1,v1,v4
vse8.v      v1,0(t4)                    vse8.v      v1,0(a6)
vse8.v      v9,0(a3)                    vse8.v      v3,0(t1)
vsetvli     zero,zero,e32,m2,ta,ma      vsetvli     zero,zero,e32,m4
vse8.v      v4,0(a6)                    vsext.vf4   v4,v3
vsext.vf4   v8,v5                       vadd.vv     v4,v4,v8
vse8.v      v5,0(t1)                    vsetvli     zero,zero,e64,m8
vadd.vv     v2,v8,v2                    vsext.vf2   v16,v4
vsetvli     zero,zero,e64,m4,ta,ma      vse64.v     v16,0(t3)
vsext.vf2   v8,v2                       vsetvli     zero,zero,e32,m4
vsetvli     zero,zero,e32,m2,ta,ma      vadd.vv     v8,v8,v8
slli        t2,a4,3                     vsext.vf4   v4,v15
vse64.v     v8,0(t3)                    slli        t2,a4,3
vsext.vf4   v2,v1                       vadd.vv     v4,v8,v4
sub         a7,a7,a4                    sub         a7,a7,a4
vadd.vv     v2,v6,v2                    vsetvli     zero,zero,e64,m8
vsetvli     zero,zero,e64,m4,ta,ma      vsext.vf2   v8,v4
vsext.vf2   v4,v2                       vse64.v     v8,0(a0)
vse64.v     v4,0(a0)                    add         a1,a1,a4
add         a2,a2,a4                    add         a2,a2,a4
add         a1,a1,a4                    add         a5,a5,a4
add         t6,t6,a4                    add         t5,t5,a4
add         t0,t0,a4                    add         t6,t6,a4
add         a5,a5,a4                    add

[PATCH][V4] RISC-V: Nan-box the result of movhf on soft-fp16

2023-12-27 Thread KuanLin Chen

According to the spec, fmv.h checks if the input operands are correctly
NaN-boxed. If not, the input value is treated as an n-bit canonical NaN.
This patch fixes the issue that operands returned by soft-fp16 libgcc
(i.e., __truncdfhf2) were not correctly NaN-boxed.
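
As a rough illustration of the rule described above (not the code from the
patch), the sketch below models NaN-boxing of a 16-bit value in a 32-bit FP
register image.  It assumes FLEN = 32, so the upper 16 bits must be all
ones, and it uses 0x7e00 as the half-precision canonical NaN.  The function
names nan_box_half, is_nan_boxed_half and read_half_operand are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Half-precision canonical NaN bit pattern.  */
#define HALF_CANONICAL_NAN 0x7e00u

/* Box a 16-bit half pattern into a 32-bit register image (FLEN = 32 is
   an assumption of this sketch): set all upper bits to 1.  */
uint32_t nan_box_half(uint16_t h)
{
  return 0xffff0000u | h;
}

/* A register image holds a valid boxed half only if its upper 16 bits
   are all ones.  */
int is_nan_boxed_half(uint32_t reg)
{
  return (reg & 0xffff0000u) == 0xffff0000u;
}

/* Model how an operation that checks NaN-boxing sees its input operand:
   a correctly boxed value is used as is, anything else is treated as
   the canonical NaN.  */
uint16_t read_half_operand(uint32_t reg)
{
  if (is_nan_boxed_half(reg))
    return (uint16_t)(reg & 0xffffu);
  return (uint16_t)HALF_CANONICAL_NAN;
}
```

For example, 1.0 in half precision is 0x3c00.  Boxed it becomes 0xffff3c00
and reads back as 0x3c00, whereas an unboxed 0x00003c00 (as a soft-fp16
routine might return it in an integer register) reads back as 0x7e00.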

*gcc/ChangeLog:*

* config/riscv/riscv.cc (riscv_legitimize_move): Expand movhf
with Nan-boxing value.

* config/riscv/riscv.md (*movhf_softfloat_unspec): New pattern.


*gcc/testsuite/ChangeLog:*

* gcc.target/riscv/_Float16-nanboxing.c: New test.


0001-RISC-V-Nan-box-the-result-of-movhf-on-soft-fp16.patch
Description: Binary data

Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]

2023-12-27 Thread Rimvydas Jasinskas

On Wed, Dec 27, 2023 at 10:34 PM Harald Anlauf  wrote:
> The patch is almost fine, except for a strange wording here:
>
> +@smallexample
> +gfortran -save-temps -c foo.F90
> +@end smallexample
> +
> +preprocesses to in @file{foo.fii}, compiles to an intermediate
> +@file{foo.s}, and then assembles to the (implied) output file
> +@file{foo.o}, whereas:
>
> I understand the formulation is copied from gcc/doc/invoke.texi,
> where it does not fully make sense to me either.
>
> How about:
>
> "preprocesses input file @file{foo.F90} to @file{foo.fii}, ..."
>
> Furthermore,
>
> +@smallexample
> +gfortran -save-temps -S foo.F
> +@end smallexample
> +
> +saves the (no longer) temporary preprocessed file in @file{foo.fi}, and
> +then compiles to the (implied) output file @file{foo.s}.
>
> Even if this is copied from the gcc texinfo file, how about:
>
> "saves the preprocessor output in @file{foo.fi}, ..."
>
> which I find easier to read.
>
> Can you also add a reference to the PR number in the commit message?
I agree, the wording sounds a lot better; it is included in v2 together
with the PR number.


> > Is there a specific reason why -fc-prototypes (Interoperability
> > Options section) is excluded from manpage?
>
> Can you be more specific?  I get here (since gcc-9):
>
> % man /opt/gcc/14/share/man/man1/gfortran.1 |grep -A 1 "Interoperability Options"
> Interoperability Options
> -fc-prototypes -fc-prototypes-external
>
> although no detailed explanation (-> gfortran.info).
The page https://gcc.gnu.org/onlinedocs/gfortran/Invoking-GNU-Fortran.html
does contain a working link to
https://gcc.gnu.org/onlinedocs/gfortran/Interoperability-Options.html
However, the manpage has the Interoperability section explicitly disabled
with "@c man end" ... "@c man begin ENVIRONMENT".
After digging into the git log, it seems that the Interoperability section
was unintentionally added after this comment mark in
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e655a6cc43

Best regards,
Rimvydas
From f8663a022a8b9c4f1c4a76d8e4823e24f691623c Mon Sep 17 00:00:00 2001
From: Rimvydas Jasinskas 
Date: Sat, 23 Dec 2023 18:59:09 +
Subject: Fortran: Add Developer Options mini-section to documentation

Separate out -fdump-* options to the new section.  Sort by option name.

While there, document -save-temps intermediates.

gcc/fortran/ChangeLog:

	PR fortran/81615
	* invoke.texi: Add Developer Options section.  Move '-fdump-*'
	to it.  Add small examples about changed '-save-temps' behavior.

Signed-off-by: Rimvydas Jasinskas 
---
 gcc/fortran/invoke.texi | 117 ++--
 1 file changed, 77 insertions(+), 40 deletions(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index c7fd019a7c5..5d526e23e5c 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -94,12 +94,13 @@ one is not the default.
  compiled.
 * Preprocessing Options::  Enable and customize preprocessing.
 * Error and Warning Options:: How picky should the compiler be?
-* Debugging Options::   Symbol tables, measurements, and debugging dumps.
+* Debugging Options::   Symbol tables, measurements.
 * Directory Options::   Where to find module files
 * Link Options ::   Influencing the linking step
 * Runtime Options:: Influencing runtime behavior
 * Code Gen Options::Specifying conventions for function calls, data layout
 and register usage.
+* Developer Options::   Printing GNU Fortran specific info, debugging dumps.
 * Interoperability Options::  Options for interoperability with other
   languages.
 * Environment Variables:: Environment variables that affect @command{gfortran}.
@@ -159,9 +160,8 @@ and warnings}.
 }
 
 @item Debugging Options
-@xref{Debugging Options,,Options for debugging your program or GNU Fortran}.
-@gccoptlist{-fbacktrace -fdump-fortran-optimized -fdump-fortran-original
--fdebug-aux-vars -fdump-fortran-global -fdump-parse-tree -ffpe-trap=@var{list}
+@xref{Debugging Options,,Options for debugging your program}.
+@gccoptlist{-fbacktrace -fdebug-aux-vars -ffpe-trap=@var{list}
 -ffpe-summary=@var{list}
 }
 
@@ -201,6 +201,12 @@ and warnings}.
 -fpack-derived -frealloc-lhs -frecursive -frepack-arrays
 -fshort-enums -fstack-arrays
 }
+
+@item Developer Options
+@xref{Developer Options,,GNU Fortran Developer Options}.
+@gccoptlist{-fdump-fortran-global -fdump-fortran-optimized
+-fdump-fortran-original -fdump-parse-tree -save-temps
+}
 @end table
 
 @node Fortran Dialect Options
@@ -1280,40 +1286,14 @@ and other GNU compilers.
 Some of these have no effect when compiling programs written in Fortran.
 
 @node Debugging Options
-@section Options for debugging your program or GNU Fortran
+@section Options for debugging your program
 @cindex options, debugging
 @cindex debugging information options
 
 GNU Fortran has various special options that are used for debugging
-either your program or the GNU Fortran compiler.
+yo
