[RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-28 Thread Vineet Gupta
RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
   5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|   | # of unexpected case / # of unique unexpected 
case
|   |  gcc |  g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /87 |5 / 2 |   72 /12 |
|lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
sext.w  a6,a1 <-- this goes away
beq a1,zero,.L4
li  a5,0
li  a0,0
.L3:
addwa4,a2,a5
addwa5,a3,a5
addwa0,a4,a0
bltua5,a6,.L3
ret
.L4:
    li  a0,0
ret
```

Signed-off-by: Vineet Gupta 
Co-developed-by: Robin Dapp 
---
 gcc/expr.cc   |  7 ---
 gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++
 2 files changed, 15 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 308ddc09e631..d259c6e53385 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9332,13 +9332,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode 
tmode,
  op0 = expand_expr (treeop0, target, VOIDmode,
 modifier);
 
- /* If the signedness of the conversion differs and OP0 is
-a promoted SUBREG, clear that indication since we now
-have to do the proper extension.  */
- if (TYPE_UNSIGNED (TREE_TYPE (treeop0)) != unsignedp
- && GET_CODE (op0) == SUBREG)
-   SUBREG_PROMOTED_VAR_P (op0) = 0;
-
  return REDUCE_BIT_FIELD (op0);
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c 
b/gcc/testsuite/gcc.target/riscv/pr111466.c
new file mode 100644
index ..007792466a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -0,0 +1,15 @@
+/* Simplified varaint of gcc.target/riscv/zba-adduw.c.  */
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int foo2(int unused, int n, unsigned y, unsigned delta){
+  int s = 0;
+  unsigned int x = 0;
+  for (;x

Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-28 Thread Vineet Gupta



On 9/28/23 20:17, Jeff Law wrote:
I can bootstrap & regression test alpha using QEMU user mode 
emulation. So we might be able to trigger something that way. It'll 
take some time, but might prove fruitful. 


That would be awesome. It's not like this this is burning or anything 
and one of the things in the long tail of things we need to do in this area.


Thx,
-Vineet


Re: [PATCH] RISC-V: Enable RVV scalable vectorization by default[PR111311]

2023-09-30 Thread Vineet Gupta




On 9/11/23 06:12, Jeff Law via Gcc-patches wrote:



On 9/10/23 21:42, juzhe.zh...@rivai.ai wrote:

Ping this patch.

I think it's time to enable scalable vectorization by default and do 
the whole regression every time (except vect.exp that we didn't 
enable yet)


Update current FAILs status:

Real FAILS (ICE and execution FAIL):

FAIL: gcc.dg/pr70252.c (internal compiler error: in 
gimple_expand_vec_cond_expr, at gimple-isel.cc:284)

FAIL: gcc.dg/pr70252.c (test for excess errors)
FAIL: gcc.dg/pr92301.c execution test

Robin is working on these 3 issues and will be solved soon.

FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error: in as_a, at 
machmode.h:381)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error: in as_a, at 
machmode.h:381)
FAIL: g++.dg/torture/vshuf-v4df.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)


This is a long time known issue I have mentioned many times, we need 
help for LTO since it's caused by mode bits extension.


The rest bogus FAILs:
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "Not unrolling 
loop, doesn't roll"
FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "likely upper 
bound: 6"

FAIL: gcc.dg/unroll-8.c scan-rtl-dump loop2_unroll "realistic bound: -1"
FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding 
Accumulator"
FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump cunroll "optimized: 
loop with [0-9]+ iterations completely unrolled"

FAIL: gcc.dg/tree-ssa/cunroll-16.c scan-tree-dump-not optimized "foo"
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized 
"BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized 
"BIT_INSERT_EXPR" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized 
"BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized 
"BIT_INSERT_EXPR" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11b.c scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11c.c scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect 
"Alignment of access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect 
"Alignment of access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/loop-bound-1.c scan-tree-dump ivopts "bounded 
by 254"
FAIL: gcc.dg/tree-ssa/loop-bound-2.c scan-tree-dump ivopts "bounded 
by 254"
FAIL: gcc.dg/tree-ssa/predcom-2.c scan-tree-dump-times pcom 
"Unrolling 2 times." 2
FAIL: gcc.dg/tree-ssa/predcom-4.c scan-tree-dump-times pcom 
"Combination" 1
FAIL: gcc.dg/tree-ssa/predcom-4.c scan-tree-dump-times pcom 
"Unrolling 3 times." 1
FAIL: gcc.dg/tree-ssa/predcom-5.c scan-tree-dump-times pcom 
"Combination" 2
FAIL: gcc.dg/tree-ssa/predcom-5.c scan-tree-dump-times pcom 
"Unrolling 3 times." 1
FAIL: gcc.dg/tree-ssa/predcom-9.c scan-tree-dump pcom "Executing 
predictive commoning without unrolling"
FAIL: gcc.dg/tree-ssa/reassoc-46.c scan-tree-dump-times optimized 
"(?:vect_)?sum_[\\d._]+ = (?:(?:vect_)?_[\\d._]+ \\+ 
(?:vect_)?sum_[\\d._]+|(?:v   ect_)?sum_[\\d._]+ \\+ 
(?:vect_)?_[\\d._]+)" 1
FAIL: gcc.dg/tree-ssa/scev-10.c scan-tree-dump-times ivopts " 
  Type:\\tREFERENCE ADDRESS\n" 1
FAIL: gcc.dg/tree-ssa/scev-11.c scan-tree-dump-times ivopts " 
  Type:\\tREFERENCE ADDRESS\n" 2
FAIL: gcc.dg/tree-ssa/scev-14.c scan-tree-dump ivopts "Overflowness 
wrto loop niter:\tNo-overflow"
FAIL: gcc.dg/tree-ssa/scev-9.c scan-tree-dump-times ivopts " 
  Type:\\tREFERENCE ADDRESS\n" 1
FAIL: gcc.dg/tree-ssa/split-path-11.c scan-tree-dump-times 
split-paths "join point for if-convertable half-diamond" 1


These are bogus dump FAILs and I have 100% confirm each of them, we 
are having same behavior as SVE.


So is this patch ok for trunk ?

Yes, this is OK for the trunk.


Our internal SPEC regressions as of yesterday's tip are still failing 
due to this commit.
    2023-09-28 8552dcd8e444 Revert "[RA]: Improve cost calculation of 
pseudos with equivalences"


Is there a plan for addressing the blocker issues above anytime soon ?

Thx,
-Vineet



mvconst_internal splitter gated with !@ira_in_progess (was Re: Yet Another IRA question)

2023-10-02 Thread Vineet Gupta




On 9/28/23 12:52, Vineet Gupta wrote:


On 9/28/23 05:53, Jeff Law wrote:
Vineet -- assuming Vlad's patch goes in, the other obvious candidate 
for this would be the mvconst_internal define_insn_and_split where 
we'd probably want to reject the insn as a whole once IRA has started. 


Good point, although currently we've kind of papered over it with 
-fsched-pressure, but I'm sure there are way more cases that this will 
improve still.
I will spin up a full multilib test with that, hopefully with no 
fallout :-)


I have the results finally. This is testsuite neutral. Same results 
before/after


   = Summary of gcc testsuite =
    | # of unexpected case / # of unique 
unexpected case

    |  gcc |  g++ | gfortran |
   rv32imac/  ilp32/ medlow |  168 /    70 |   13 / 6 |   67 /    12 |
 rv32imafdc/ ilp32d/ medlow |  168 /    70 |   13 / 6 |   24 / 4 |
   rv64imac/   lp64/ medlow |  161 /    70 |    9 / 3 |   67 /    12 |
 rv64imafdc/  lp64d/ medlow |  160 /    69 |    5 / 2 |    6 / 1 |

But the SPEC runs are not affected at all, if anything it seems to be 
way under noise for 5 workloads.

Not sure if we still want to add the gate, your call

511.povray_r
< 3028749801452
---
> 3028749851039

521.wrf_r
< 3897783090826
---
> 3897783089025

523.xalancbmk_r
< 1062577057227
---
> 1062577053403

527.cam4_r
< 2141572063843
---
> 2141572063800

Thx,
-Vineet


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-02 Thread Vineet Gupta




On 9/29/23 05:14, Jeff Law wrote:



On 9/28/23 21:49, Vineet Gupta wrote:


On 9/28/23 20:17, Jeff Law wrote:
I can bootstrap & regression test alpha using QEMU user mode 
emulation. So we might be able to trigger something that way. It'll 
take some time, but might prove fruitful. 


That would be awesome. It's not like this this is burning or anything 
and one of the things in the long tail of things we need to do in 
this area.
Absolutely true.  In fact, I doubt this particular quirk is all that 
important in the extension removal space.  But we've got enough data 
to do a bit of an experiment.  While it takes a long time to run, it 
doesn't require any kind of regular human intervention.  Better to 
fire it off now while it's still fresh in our minds.  If we wait a 
week or two I'll have forgotten everything.


Just as a data-point, the SPEC QEMU icount on RISC-V with this patch 
seems promising (+ve is better, lesser icount)


Baseline    subreg promoted
    note preserved   %
benchmark   workload #
500.perlbench_r 0   1222524143251   1222717055541 -0.02%
    1   741422677286    740573609468 0.11%
    2   694157786227    693787219643 0.05%
502.gcc_r   0   189519277112    188986824061 0.28%  <--
    1   224763075520    224225499491 0.24%
    2   216802546093    216426186662 0.17%
    3   179634122120    179165923074 0.26%  <--
    4   222757886491    222343753217 0.19%
503.bwaves_r    0   309660270217    309640234863 0.01%
    1   488871452301    488838894844 0.01%
    2   381243468081    381218065515 0.01%
    3   463786236849    463756755469 0.01%
505.mcf_r   0   675010417323    675014630665 0.00%
507.cactuBSSN_r 0   2856874668987   2856874679135 0.00%
508.namd_r  0   1877527849689   1877508676900 0.00%
510.parest_r    0   3479719575579   3479104891751 0.02%
511.povray_r    0   3028749801452   3030037805612 -0.04%
519.lbm_r   0   1172421269180   1172421185445 0.00%
520.omnetpp_r   0   1014838628822   1007680353306 0.71%  <--
521.wrf_r   0   3897783090826   3897162379983 0.02%
523.xalancbmk_r 0   1062577057227   1062508198843 0.01%
525.x264_r  0   451836043634    449289002218 0.56%  <--
    1   1761617424590   1758009904369 0.20%  <--
    2   1686037465791   1682815274048 0.19%  <--
526.blender_r   0   1660559350538   1660534452552 0.00%
527.cam4_r  0   2141572063843   2141254936488 0.01%
531.deepsjeng_r 0   1605692153702   1603021256993 0.17%
538.imagick_r   0   3401602661013   3401602660827 0.00%
541.leela_r 0   1989286081019   1987365526160 0.10%
544.nab_r   0   1537038165879   1528865609530 0.53%  <--
548.exchange2_r 0   2050220774922   2048840906692 0.07%
549.fotonik3d_r 0   2231807908394   2231807600916 0.00%
554.roms_r  0   2612969430882   2611529873683 0.06%
557.xz_r    0   367967125022    367760820700 0.06%
    1   961163448926    961025548415 0.01%
    2   522156857947    521939109911 0.04%
997.specrand_fr 0   413618578   413604068 0.00%
999.specrand_ir 0   413618578   413604068 0.00%



Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-04 Thread Vineet Gupta

On 10/4/23 10:59, Jeff Law wrote:



On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit 
registers,

meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various 
ideas

floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
    5069803972 ("expand_expr, case CONVERT_EXPR .. clear the 
promotion flag")


This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|   | # of unexpected case / # of unique 
unexpected case
|   |  gcc |  g++ | 
gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/|  264 /    87 |    5 / 2 |   72 
/    12 |

|    lp64d/medlow

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
sext.w    a6,a1 <-- this goes away
beq    a1,zero,.L4
li    a5,0
li    a0,0
.L3:
addw    a4,a2,a5
addw    a5,a3,a5
addw    a0,a4,a0
bltu    a5,a6,.L3
ret
.L4:
    li    a0,0
ret
```

Signed-off-by: Vineet Gupta 
Co-developed-by: Robin Dapp 
---
  gcc/expr.cc   |  7 ---
  gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++
  2 files changed, 15 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
So mcore-elf is showing something interesting.  With that hunk of 
Kenner code removed, it actually has a few failing tests that flip to 
passes.



Tests that now work, but didn't before (11 tests):

mcore-sim: gcc.c-torture/execute/pr109986.c   -O1  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test

mcore-sim: gcc.c-torture/execute/pr109986.c   -O3 -g  execution test
mcore-sim: gcc.c-torture/execute/pr109986.c   -Os  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
mcore-sim: gcc.c-torture/execute/pr84524.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test

mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test
mcore-sim: gcc.dg/tree-ssa/pr84436-5.c execution test


So that's a really interesting result.   If further analysis doesn't 
point the finger at a simulator bug or something like that, then we'll 
have strong evidence that Kenner's change is actively harmful from a 
correctness standpoint.  That would change the calculus here 
significantly.


Sadly, mcore-elf doesn't have a working gdb IIRC (don't ask how I know 
that!), so I'm going to have to analyze this further with less 
efficient techniques.  BUt definitely interesting news/results.


Really interesting. Can't thank you enough for spending time in chasing 
this down.


-Vineet


[PATCH v2] RISC-V: const: hide mvconst splitter from IRA

2023-10-06 Thread Vineet Gupta
Vlad recently introduced a new gate @ira_in_progress, similar to
counterparts @{reload,lra}_in_progress.

Use this to hide the constant synthesis splitter from being recog* ()
by IRA register equivalence logic which is eager to undo the splits,
generating worse code for constants (and sometimes no code at all).

See PR/109279 (large constant), PR/110748 (const -0.0) ...

Granted the IRA logic is subsided with -fsched-pressure which is now
enabled for RISC-V backend, the gate makes this future-proof in
addition to helping with -O1 etc.

This fixes 1 addition test

   = Summary of gcc testsuite =
| # of unexpected case / # of unique unexpected case
|  gcc |  g++ | gfortran |

   rv32imac/  ilp32/ medlow |  416 /   103 |   13 / 6 |   67 /12 |
 rv32imafdc/ ilp32d/ medlow |  416 /   103 |   13 / 6 |   24 / 4 |
   rv64imac/   lp64/ medlow |  417 /   104 |9 / 3 |   67 /12 |
 rv64imafdc/  lp64d/ medlow |  416 /   103 |5 / 2 |6 / 1 |

Also similar to v1, this doesn't move RISC-V SPEC scores at all.

gcc/ChangeLog:
* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.

Suggested-by: Jeff Law 
Signed-off-by: Vineet Gupta 
---
changes since v1:
  - Fix bug: new condition to prevent recognition not splitting itself
---
 gcc/config/riscv/riscv.md | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e00b8ee3579d..9b990ec2566d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1997,13 +1997,16 @@
 
 ;; Pretend to have the ability to load complex const_int in order to get
 ;; better code generation around them.
-;;
 ;; But avoid constants that are special cased elsewhere.
+;;
+;; Hide it from IRA register equiv recog* () to elide potential undoing of 
split
+;;
 (define_insn_and_split "*mvconst_internal"
   [(set (match_operand:GPR 0 "register_operand" "=r")
 (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
-  "!(p2m1_shift_operand (operands[1], mode)
- || high_mask_shift_operand (operands[1], mode))"
+  "!ira_in_progress
+   && !(p2m1_shift_operand (operands[1], mode)
+|| high_mask_shift_operand (operands[1], mode))"
   "#"
   "&& 1"
   [(const_int 0)]
-- 
2.34.1



[COMMITTED] RISC-V: const: hide mvconst splitter from IRA

2023-10-06 Thread Vineet Gupta
Vlad recently introduced a new gate @ira_in_progress, similar to
counterparts @{reload,lra}_in_progress.

Use this to hide the constant synthesis splitter from being recog* ()
by IRA register equivalence logic which is eager to undo the splits,
generating worse code for constants (and sometimes no code at all).

See PR/109279 (large constant), PR/110748 (const -0.0) ...

Granted the IRA logic is subsided with -fsched-pressure which is now
enabled for RISC-V backend, the gate makes this future-proof in
addition to helping with -O1 etc.

This fixes 1 addition test

   = Summary of gcc testsuite =
| # of unexpected case / # of unique unexpected case
|  gcc |  g++ | gfortran |

   rv32imac/  ilp32/ medlow |  416 /   103 |   13 / 6 |   67 /12 |
 rv32imafdc/ ilp32d/ medlow |  416 /   103 |   13 / 6 |   24 / 4 |
   rv64imac/   lp64/ medlow |  417 /   104 |9 / 3 |   67 /12 |
 rv64imafdc/  lp64d/ medlow |  416 /   103 |5 / 2 |6 / 1 |

Also similar to v1, this doesn't move RISC-V SPEC scores at all.

gcc/ChangeLog:
* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.

Suggested-by: Jeff Law 
Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.md | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 1ebe8f92284d..da84b9357bd3 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1997,13 +1997,16 @@
 
 ;; Pretend to have the ability to load complex const_int in order to get
 ;; better code generation around them.
-;;
 ;; But avoid constants that are special cased elsewhere.
+;;
+;; Hide it from IRA register equiv recog* () to elide potential undoing of 
split
+;;
 (define_insn_and_split "*mvconst_internal"
   [(set (match_operand:GPR 0 "register_operand" "=r")
 (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
-  "!(p2m1_shift_operand (operands[1], mode)
- || high_mask_shift_operand (operands[1], mode))"
+  "!ira_in_progress
+   && !(p2m1_shift_operand (operands[1], mode)
+|| high_mask_shift_operand (operands[1], mode))"
   "#"
   "&& 1"
   [(const_int 0)]
-- 
2.34.1



xthead regression with [COMMITTED] RISC-V: const: hide mvconst splitter from IRA

2023-10-09 Thread Vineet Gupta

Hi Christoph,

On 10/9/23 12:06, Patrick O'Neill wrote:


Hi Vineet,

We're seeing a regression on all riscv targets after this patch:|

FAIL: gcc.target/riscv/xtheadcondmov-indirect.c -O2 
check-function-bodies ConNmv_imm_imm_reg||
FAIL: gcc.target/riscv/xtheadcondmov-indirect.c -O3 -g 
check-function-bodies ConNmv_imm_imm_reg


Debug log output:
body: \taddi    a[0-9]+,a[0-9]+,-1000+
\tli    a[0-9]+,9998336+
\taddi    a[0-9]+,a[0-9]+,1664+
\tth.mveqz    a[0-9]+,a[0-9]+,a[0-9]+
\tret

against:     li    a5,9998336
    addi    a4,a0,-1000
    addi    a0,a5,1664
    th.mveqz    a0,a1,a4
    ret|

https://github.com/patrick-rivos/gcc-postcommit-ci/issues/8
https://github.com/ewlu/riscv-gnu-toolchain/issues/286



It seems with my patch, exactly same instructions get out of order (for 
-O2/-O3) tripping up the test results and differ from say O1 for exact 
same build.


-O2 w/ patch
ConNmv_imm_imm_reg:
    li    a5,9998336
    addi    a4,a0,-1000
    addi    a0,a5,1664
    th.mveqz    a0,a1,a4
    ret

-O1 w/ patch
ConNmv_imm_imm_reg:
    addi    a4,a0,-1000
    li    a5,9998336
    addi    a0,a5,1664
    th.mveqz    a0,a1,a4
    ret

I'm not sure if there is an easy way to handle that.
Is there a real reason for testing the full sequences verbatim, or is 
testing number of occurrences of th.mv{eqz,nez} enough.
It seems Jeff recently added -fno-sched-pressure to avoid similar issues 
but that apparently is no longer sufficient.


Thx,
-Vineet


Thanks,
Patrick

On 10/6/23 11:22, Vineet Gupta wrote:

Vlad recently introduced a new gate @ira_in_progress, similar to
counterparts @{reload,lra}_in_progress.

Use this to hide the constant synthesis splitter from being recog* ()
by IRA register equivalence logic which is eager to undo the splits,
generating worse code for constants (and sometimes no code at all).

See PR/109279 (large constant), PR/110748 (const -0.0) ...

Granted the IRA logic is subsided with -fsched-pressure which is now
enabled for RISC-V backend, the gate makes this future-proof in
addition to helping with -O1 etc.

This fixes 1 addition test

= Summary of gcc testsuite =
 | # of unexpected case / # of unique unexpected 
case
 |  gcc |  g++ | gfortran |

rv32imac/  ilp32/ medlow |  416 /   103 |   13 / 6 |   67 /12 |
  rv32imafdc/ ilp32d/ medlow |  416 /   103 |   13 / 6 |   24 / 4 |
rv64imac/   lp64/ medlow |  417 /   104 |9 / 3 |   67 /12 |
  rv64imafdc/  lp64d/ medlow |  416 /   103 |5 / 2 |6 / 1 |

Also similar to v1, this doesn't move RISC-V SPEC scores at all.

gcc/ChangeLog:
* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.

Suggested-by: Jeff Law
Signed-off-by: Vineet Gupta
---
  gcc/config/riscv/riscv.md | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 1ebe8f92284d..da84b9357bd3 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1997,13 +1997,16 @@
  
  ;; Pretend to have the ability to load complex const_int in order to get

  ;; better code generation around them.
-;;
  ;; But avoid constants that are special cased elsewhere.
+;;
+;; Hide it from IRA register equiv recog* () to elide potential undoing of 
split
+;;
  (define_insn_and_split "*mvconst_internal"
[(set (match_operand:GPR 0 "register_operand" "=r")
  (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
-  "!(p2m1_shift_operand (operands[1], mode)
- || high_mask_shift_operand (operands[1], mode))"
+  "!ira_in_progress
+   && !(p2m1_shift_operand (operands[1], mode)
+|| high_mask_shift_operand (operands[1], mode))"
"#"
"&& 1"
[(const_int 0)]




Re: xthead regression with [COMMITTED] RISC-V: const: hide mvconst splitter from IRA

2023-10-09 Thread Vineet Gupta

On 10/9/23 13:46, Christoph Müllner wrote:
Given that this causes repeated issues, I think that a fall-back to 
counting occurrences is the right thing to do. I can do that if that's ok.


Thanks Christoph.

-Vineet


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-12 Thread Vineet Gupta




On 10/11/23 19:37, Hans-Peter Nilsson wrote:

```
foo2:
sext.w  a6,a1 <-- this goes away
beq a1,zero,.L4
li  a5,0
li  a0,0
.L3:
addwa4,a2,a5
addwa5,a3,a5
addwa0,a4,a0
bltua5,a6,.L3
ret
.L4:
li  a0,0
ret
```

...if your patch gets rid of that sign-extension above...


diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c 
b/gcc/testsuite/gcc.target/riscv/pr111466.c
new file mode 100644
index ..007792466a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -0,0 +1,15 @@
+/* Simplified varaint of gcc.target/riscv/zba-adduw.c.  */
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int foo2(int unused, int n, unsigned y, unsigned delta){
+  int s = 0;
+  unsigned int x = 0;
+  for (;x
...then why test for the presence of a sign-extension
instruction in the test-case?

IOW, shouldn't that be a scan-assember-not?


Yes indeed.


(What am I missing?)


Nothing deep really, just a snafu on my side. I'll fix it in v2.

Thx,
-Vineet


brgds, H-P
PS. sorry I missed the Cauldron this year.  Hope to see you all next year!


Looking fwd to.

Thx,
-Vineet


Continued (Non)mutlib and stub header issue (was Re: [PATCH v2] RISC-V: Use stdint-gcc.h in rvv testsuite)

2023-10-13 Thread Vineet Gupta

Hi Kito, Christoph, Patrick,

I've been trying to test a specific non multilib config 
(rv64-zicond_zfa) and seeing noisy output (compared to a similar multlib 
build) and it seems there's still a bunch of this header issue (for the 
rv32 abi headers) in the non-multlib config.


e.g.
FAIL: gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c -O3 
-ftree-vectorize --param riscv-autovec-lmul=dynamic (test for excess errors)

...
...

The errors are missing header :

 from 
gcc/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c:4:
INSTALL/tc-up-231010-zicond-zfa-non-multilib/sysroot/usr/include/gnu/stubs.h:8:11: 
fatal error: gnu/stubs-ilp32.h: No such file or directory



When looking around, I stumbled upon commit
    d0bbecb1c41 "RISC-V: Add riscv_vector.h wrapper in testsuite to 
prevent pull in stdint.h from C library"


Which seems like a step in a right direction, but how does one ensure 
that the wrapper riscv_vector.h (containing stdint-gcc.h) is included by 
test vs. the default riscv_vector.h (which is more user facing and thus 
needs to include the system stdint.h)


P.S. I couldn't find the posting for above commit on gcc-patches ?

Thx,
-Vineet


On 10/5/23 16:45, Patrick O'Neill wrote:

stdint.h can be replaced with stdint-gcc.h to resolve some missing
system headers in non-multilib installations.

Tested using glibc rv32gcv and rv64gcv on r14-4381-g7eb5ce7f58e.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h:
Replace stdint.h with stdint-gcc.h.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111232.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: Ditto.
* gcc.target/riscv/rvv/base/pr110119-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/pr111255.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c: Ditto.

Signed-off-by: Patrick O'Neill 
---
Changes from v1:
- Avoid changing riscv_vector.h

Failures looked like this:
In file included from 
/riscv-gnu-toolchain/build/sysroot/usr/include/features.h:515,
  from 
/riscv-gnu-toolchain/build/sysroot/usr/include/bits/libc-header-start.h:33,
  from /riscv-gnu-toolchain/build/sysroot/usr/include/stdint.h:26,
  from 
/riscv-gnu-toolchain/build/lib/gcc/riscv32-unknown-linux-gnu/14.0.0/include/stdint.h:9,
  from 
/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/include/stdint.h:9,
  from 
/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/include/riscv_vector.h:28,
  from 
/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c:4:
/riscv-gnu-toolchain/build/sysroot/usr/include/gnu/stubs.h:8:11: fatal error: 
gnu/stubs-ilp32.h: No such file or directory

Resolves these failures on rv32gcv (non-multilib):
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c (test 
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c (test 
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c (test 
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c (test 
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c (test 
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c (test 
for e

[PATCH] RISC-V/testsuite: add a default march (lacking zfa) to some fp tests

2023-10-15 Thread Vineet Gupta
A bunch of FP tests expecting specific FP asm output fail when built
with zfa because different insns are generated. And this happens
because those tests don't have an explicit -march and the default
used to configure gcc could end up with zfa causing the false fails.

Fix that by adding the -march explicitly which doesn't have zfa.

BTW it seems we have some duplication in tests for zfa and non-zfa and
it would have been better if they were consolidated, but oh well.

gcc/testsuite:
* gcc.target/riscv/fle-ieee.c: Updates dg-options with
explicit -march=rv64gc and -march=rv43gc.
* gcc.target/riscv/fle-snan.c: Ditto.
* gcc.target/riscv/fle.c: Ditto.
* gcc.target/riscv/flef-ieee.c: Ditto.
* gcc.target/riscv/flef.c: Ditto.
* gcc.target/riscv/flef-snan.c: Ditto.
* gcc.target/riscv/flt-ieee.c: Ditto.
* gcc.target/riscv/flt-snan.c: Ditto.
* gcc.target/riscv/fltf-ieee.c: Ditto.
* gcc.target/riscv/fltf-snan.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/fle-ieee.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fle-snan.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fle.c   | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef-ieee.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef-snan.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/flt-ieee.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/flt-snan.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fltf-ieee.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/fltf-snan.c | 3 ++-
 10 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/fle-ieee.c 
b/gcc/testsuite/gcc.target/riscv/fle-ieee.c
index e55331f925d6..12d04514ca29 100644
--- a/gcc/testsuite/gcc.target/riscv/fle-ieee.c
+++ b/gcc/testsuite/gcc.target/riscv/fle-ieee.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/fle-snan.c 
b/gcc/testsuite/gcc.target/riscv/fle-snan.c
index f40bb2cbf662..146b7866e888 100644
--- a/gcc/testsuite/gcc.target/riscv/fle-snan.c
+++ b/gcc/testsuite/gcc.target/riscv/fle-snan.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fsignaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-ftrapping-math -fsignaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/fle.c 
b/gcc/testsuite/gcc.target/riscv/fle.c
index 97c8ab9ad864..2379e22d5062 100644
--- a/gcc/testsuite/gcc.target/riscv/fle.c
+++ b/gcc/testsuite/gcc.target/riscv/fle.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" 
} */
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-fno-trapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-fno-trapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/flef-ieee.c 
b/gcc/testsuite/gcc.target/riscv/flef-ieee.c
index f3e7e7d75d6c..b6ee6ed08a4d 100644
--- a/gcc/testsuite/gcc.target/riscv/flef-ieee.c
+++ b/gcc/testsuite/gcc.target/riscv/flef-ieee.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32f -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 flef (float x, float y)
diff --git a/gcc/testsuite/gcc.target/riscv/flef-snan.c 
b/gcc/testsuite/gcc.target/riscv/flef-snan.c
index ef75b3523057..e8611e8c0215 100644
--- a/gcc/testsuite/gcc.target/riscv/flef-snan.c
+++ b/gcc/testsuite/gcc.target/riscv/flef-snan.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+/* { dg-options "-march=r

[COMMITTED] RISC-V/testsuite: add a default march (lacking zfa) to some fp tests

2023-10-16 Thread Vineet Gupta
A bunch of FP tests expecting specific FP asm output fail when built
with zfa because different insns are generated. And this happens
because those tests don't have an explicit -march and the default
used to configure gcc could end up with zfa causing the false fails.

Fix that by adding the -march explicitly which doesn't have zfa.

BTW it seems we have some duplication in tests for zfa and non-zfa and
it would have been better if they were consolidated, but oh well.

gcc/testsuite:
* gcc.target/riscv/fle-ieee.c: Updates dg-options with
explicit -march=rv64gc and -march=rv32gc.
* gcc.target/riscv/fle-snan.c: Ditto.
* gcc.target/riscv/fle.c: Ditto.
* gcc.target/riscv/flef-ieee.c: Ditto.
* gcc.target/riscv/flef.c: Ditto.
* gcc.target/riscv/flef-snan.c: Ditto.
* gcc.target/riscv/flt-ieee.c: Ditto.
* gcc.target/riscv/flt-snan.c: Ditto.
* gcc.target/riscv/fltf-ieee.c: Ditto.
* gcc.target/riscv/fltf-snan.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/fle-ieee.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fle-snan.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fle.c   | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef-ieee.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef-snan.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/flt-ieee.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/flt-snan.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fltf-ieee.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/fltf-snan.c | 3 ++-
 10 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/fle-ieee.c 
b/gcc/testsuite/gcc.target/riscv/fle-ieee.c
index e55331f925d6..12d04514ca29 100644
--- a/gcc/testsuite/gcc.target/riscv/fle-ieee.c
+++ b/gcc/testsuite/gcc.target/riscv/fle-ieee.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/fle-snan.c 
b/gcc/testsuite/gcc.target/riscv/fle-snan.c
index f40bb2cbf662..146b7866e888 100644
--- a/gcc/testsuite/gcc.target/riscv/fle-snan.c
+++ b/gcc/testsuite/gcc.target/riscv/fle-snan.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fsignaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-ftrapping-math -fsignaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/fle.c 
b/gcc/testsuite/gcc.target/riscv/fle.c
index 97c8ab9ad864..2379e22d5062 100644
--- a/gcc/testsuite/gcc.target/riscv/fle.c
+++ b/gcc/testsuite/gcc.target/riscv/fle.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" 
} */
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-fno-trapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-fno-trapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/flef-ieee.c 
b/gcc/testsuite/gcc.target/riscv/flef-ieee.c
index f3e7e7d75d6c..b6ee6ed08a4d 100644
--- a/gcc/testsuite/gcc.target/riscv/flef-ieee.c
+++ b/gcc/testsuite/gcc.target/riscv/flef-ieee.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32f -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 flef (float x, float y)
diff --git a/gcc/testsuite/gcc.target/riscv/flef-snan.c 
b/gcc/testsuite/gcc.target/riscv/flef-snan.c
index ef75b3523057..e8611e8c0215 100644
--- a/gcc/testsuite/gcc.target/riscv/flef-snan.c
+++ b/gcc/testsuite/gcc.target/riscv/flef-snan.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+/* { dg-options "-march=r

Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-17 Thread Vineet Gupta




On 10/16/23 21:07, Jeff Law wrote:



On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit 
registers,

meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

[...]
---
  gcc/expr.cc   |  7 ---
  gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++
  2 files changed, 15 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
I created a ChangeLog and pushed this after a final 
bootstrap/comparison run on x86_64.


Awesome.

As I've noted before, this has been running across the various targets 
in my tester for quite a while with no issues. Additionally Robin and 
myself have dug into various paths through expr_expr_real_2 and we're 
reasonably confident this is safe (about as much as we can be given 
the lack of information about the original patch).


My strong suspicion is that Michael M. made this code obsolete when he 
last revamped the gimple/ssa->RTL expansion path.


Thanks for your patience Vineet.  It's been a long road.


All the thanks to you for verifying this across targets and deep analysis.
There was a little snafu on my part in the test for which I'll post a fixup.


Jivan is diving into Joern's work.  It shows significant promise, 
though we are seeing some very weird behavior on perlbench.


That's great to hear. I was away from sign extension work much of last 
week. Back on it now. The prev example I was chasing 
(gcc.c-torture/compile/20040401-1.c) turned out to be a dead end as it 
has explicit casts and such so not an ideal case. I'm sifting through 
the logs and looking for better tests, there's a ton of them so I'm sure 
there's bunch more we can do at expand time to eliminate extensions 
early which as you mentioned is better in general, to have to "undo" 
less in later passes.


Thx,
-Vineet


[PATCH] RISC-V/testsuite/pr111466.c: fix expected output to not detect SEXT.W

2023-10-17 Thread Vineet Gupta
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c: Change to scan-assembler-not
to not detect sext.w.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/pr111466.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c 
b/gcc/testsuite/gcc.target/riscv/pr111466.c
index 007792466a51..01e20235f7fe 100644
--- a/gcc/testsuite/gcc.target/riscv/pr111466.c
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -12,4 +12,4 @@ int foo2(int unused, int n, unsigned y, unsigned delta){
   return s;
 }
 
-/* { dg-final { scan-assembler "\msext\M" } } */
+/* { dg-final { scan-assembler-not "\msext\M" } } */
-- 
2.34.1



[PATCH v2] RISC-V/testsuite/pr111466.c: update test and expected output

2023-10-17 Thread Vineet Gupta
Update the test to potentially generate two SEXT.W instructions: one for
incoming function arg, other for function return.

But after commit 8eb9cdd14218
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them so fix the expected
assembler output which was errorneously introduced by commit above.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
  - Changed function return to be unsigned int
---
 gcc/testsuite/gcc.target/riscv/pr111466.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c 
b/gcc/testsuite/gcc.target/riscv/pr111466.c
index 007792466a51..3348d593813d 100644
--- a/gcc/testsuite/gcc.target/riscv/pr111466.c
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -4,7 +4,7 @@
 /* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
-int foo2(int unused, int n, unsigned y, unsigned delta){
+unsigned int foo2(int unused, int n, unsigned y, unsigned delta){
   int s = 0;
   unsigned int x = 0;
   for (;x

[COMMITTED] RISC-V/testsuite/pr111466.c: update test and expected output

2023-10-17 Thread Vineet Gupta
Update the test to potentially generate two SEXT.W instructions: one for
incoming function arg, other for function return.

But after commit 8eb9cdd14218
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them so fix the expected
assembler output which was errorneously introduced by commit above.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/pr111466.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c 
b/gcc/testsuite/gcc.target/riscv/pr111466.c
index 007792466a51..3348d593813d 100644
--- a/gcc/testsuite/gcc.target/riscv/pr111466.c
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -4,7 +4,7 @@
 /* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
-int foo2(int unused, int n, unsigned y, unsigned delta){
+unsigned int foo2(int unused, int n, unsigned y, unsigned delta){
   int s = 0;
   unsigned int x = 0;
   for (;x

Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-26 Thread Vineet Gupta




On 5/25/23 13:26, Thomas Schwinge wrote:


I'm pasting a snippet of gcc.log. Issue is indeed triggered by rvv.exp
which needs some love.

I'd intentionally asked to "see a complete 'gcc.log' file where the
ERRORs are visible".


The full log files are humongous - even xz compressed is ~ 7 MB - how 
can I share that w/o the list dropping it.

I guess I can try emailing it you directly on work email - if that's OK.


The torture-{init,finish} needs to be in riscv.exp not rvv.exp
Running full tests now.

I still don't understand this.

My current theory would be that some other '*.exp' file runs
'torture-init' and then prematurely ends without 'torture-finish', and
thus the torture testing state bleeds into the next '*.exp' file(s).  I'd
hoped that I could pinpoint that via "a complete 'gcc.log' file where the
ERRORs are visible".


Seems likely. So back to good old printf style debugging: I added 
dumping of the dup options to see what exactly was leaking.


setup #1
 - riscv.exp: Added torture-init/finish
 - Deleted rvv.exp (to isolate the problem)

Leaking toggles are from torture-options.exp where the debug prints 
themselves are added.

A trimmed version of gcc.log is at [1]


--->8-
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/pru/pru.exp 
...
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/pru/pru.exp 
completed in 0 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp 
...
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp.

ERROR: tcl error code NONE
ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected = "{ 
-O2 -flto -fno-use-linker-plugin -flto-partition=none } { -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects }"

    while executing
"error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected =  
\"${LTO_TORTURE_OPTIONS}\"""

    invoked from within
"if [info exists LTO_TORTURE_OPTIONS] {
    error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected 
=  \"${LTO_TORTURE_OPTIONS}\""

    }"
    (procedure "torture-init" line 12)
    invoked from within
"torture-init"
    (file 
"/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp" 
line 44)

    invoked from within
"source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp"

    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp"

    invoked from within
"catch "uplevel #0 source $test_file_name" msg"
UNRESOLVED: testcase 
'/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp' 
aborted due to Tcl error
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp 
completed in 0 seconds

--->8---

Setup #2
 - riscv.exp: Added torture-init/finish
 - riscv.exp: commented away ADDITIONAL_TORTURE_OPTIONS line
 - rvv.exp remains, unchanged

This has more errors since I'm actually running all multilib variants.

--->8---
testcase 
/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp 
completed in 0 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
...
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp.

ERROR: tcl error code NONE
ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected = "{ 
-O2 -flto -fno-use-linker-plugin -flto-partition=none } { -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects }"

    while executing
"error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected =  
\"${LTO_TORTURE_OPTIONS}\"""

    invoked from within
"if [info exists LTO_TORTURE_OPTIONS] {
    error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected 
=  \"${LTO_TORTURE_OPTIONS}\""

    }"
    (procedure "torture-init" line 12)
    invoked from within
"torture-init"
    (file 
"/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp" 
line 42)

    invoked from within
"source 
/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source 
/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    invoked from within
"catch "uplevel #0 source $test_file_name" msg"
UNRESOLVED: testcase 
'/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp' 
aborted due to Tcl error



...

Running 
/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp 
...
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream2/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp.

ERROR: tcl error code NONE
ERROR: torture

Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-30 Thread Vineet Gupta

On 5/26/23 16:38, Vineet Gupta wrote:



On 5/25/23 13:26, Thomas Schwinge wrote:


I'm pasting a snippet of gcc.log. Issue is indeed triggered by rvv.exp
which needs some love.

I'd intentionally asked to "see a complete 'gcc.log' file where the
ERRORs are visible".


The full log files are humongous - even xz compressed is ~ 7 MB - how 
can I share that w/o the list dropping it.

I guess I can try emailing it you directly on work email - if that's OK.


The torture-{init,finish} needs to be in riscv.exp not rvv.exp
Running full tests now.

I still don't understand this.

My current theory would be that some other '*.exp' file runs
'torture-init' and then prematurely ends without 'torture-finish', and
thus the torture testing state bleeds into the next '*.exp' file(s).  
I'd
hoped that I could pinpoint that via "a complete 'gcc.log' file where 
the

ERRORs are visible".


Seems likely. So back to good old printf style debugging: I added 
dumping of the dup options to see what exactly was leaking.


setup #1
 - riscv.exp: Added torture-init/finish
 - Deleted rvv.exp (to isolate the problem)

...

Setup #2
 - riscv.exp: Added torture-init/finish
 - riscv.exp: commented away ADDITIONAL_TORTURE_OPTIONS line
 - rvv.exp remains, unchanged

...


In the 3rd setup, I've removed riscv.exp and rvv.exp and running the 
testsuite: errors still show.


So we are iterating over multilib combinations.
Things are fine for the first one. The initial flags comprise of 
DG_TORTURE_OPTIONS from gcc-dg.exp (-O0, -O1 )
However when the 2nd multilib runs, it seems the old ones are not 
getting cleared, hence the splat.


--->8---

    === gcc tests ===

Schedule of variations:
    riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
    riscv-sim/-march=rv32imafdc/-mabi=ilp32d/-mcmodel=medlow
    riscv-sim/-march=rv64imac/-mabi=lp64/-mcmodel=medlow
    riscv-sim/-march=rv64imafdc/-mabi=lp64d/-mcmodel=medlow

Running target riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/riscv-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/config/sim.exp 
as generic interface file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/basic-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/config/default.exp 
as tool-and-target-specific interface file.
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/compile/compile.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/ieee/ieee.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/unsorted/unsorted.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/analyzer/analyzer.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/analyzer/torture/analyzer-torture.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/asan/asan.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/atomic/atomic.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/autopar/autopar.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/charset/charset.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/compat/compat.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/compat/struct-layout-1.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/cpp/cpp.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/cpp/trad/trad.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/debug/btf/btf.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/debug/debug.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/dfp/dfp.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.dg/dg.exp 
...
XPASS: gcc.dg/attr-alloc_size-11.c 

[PATCH 0/3] Unbork testsuite for multlib setups

2023-05-31 Thread Vineet Gupta
Hi,

This fixes the current broken multilib testing on trunk.
1/3 is th actual fix
2/3 is preventive and
3/3 is debug aid in case this bites someone else in future.

Thx,
-Vineet

Kito Cheng (1):
  RISC-V: Add missing torture-init and torture-finish for rvv.exp

Vineet Gupta (2):
  testsuite: Unbork multilib testing on RISC-V (and any target really)
  testsuite: print any leaking torture options for debugging

 gcc/testsuite/gcc.misc-tests/i386-prefetch.exp | 14 +++---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp |  3 +++
 gcc/testsuite/lib/torture-options.exp  |  6 +++---
 3 files changed, 13 insertions(+), 10 deletions(-)

-- 
2.34.1



[PATCH 1/3] testsuite: Unbork multilib testing on RISC-V (and any target really)

2023-05-31 Thread Vineet Gupta
Multilib testing on trunk is currently busted (and surprisingly this
affects any/all targets but it seems nobody cares). We currently get the
following splat:

| ERROR: tcl error code NONE
| ERROR: torture-init: torture_without_loops is not empty as expected

And this takes down pretty much all of testsuite.

|   = Summary of gcc testsuite =
|| # of unexpected case / # of unique unexpected 
case
||  gcc |  g++ | gfortran |
| rv64imafdc/  lp64d/ medlow | 5421 / 4 |1 / 1 |   72 /12 |
| rv32imafdc/ ilp32d/ medlow | 5422 / 5 |3 / 2 |   72 /12 |
|   rv32imac/  ilp32/ medlow |  391 / 5 |3 / 2 |  109 /19 |
|   rv64imac/   lp64/ medlow | 5422 / 5 |1 / 1 |  109 /19 |

There have been recent improvements in test harness around pairing of
torture-{init,finish} and checking for leaking torture options. This
however triggers a latent bug introduced way back in 2009: commit 3dd1415dc88
"i386-prefetch.exp: Skip tests when multilib flags contain -march" which
missed a pairing torture-finish. It was benign so far but in the new
regime it causes extra state "torture-init-done" confusing the 2nd round of
tests (in multilib).

This fix moves the early exit outside of torture-{init,finish} bracket
and brings RISC-V testing back to sanity.

| rv64imafdc/  lp64d/ medlow |3 / 2 |1 / 1 |   72 /12 |
| rv32imafdc/ ilp32d/ medlow |4 / 3 |3 / 2 |   72 /12 |
|   rv32imac/  ilp32/ medlow |3 / 2 |3 / 2 |  109 /19 |
|   rv64imac/   lp64/ medlow |5 / 4 |1 / 1 |  109 /19 |

gcc/testsuite:
* gcc.misc-tests/i386-prefetch.exp: Move early return outside
  the torture-{init,finish}

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.misc-tests/i386-prefetch.exp | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp 
b/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
index ad9e56a54bcf..7101b1e30576 100644
--- a/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
+++ b/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
@@ -82,6 +82,13 @@ if $tracelevel then {
 strace $tracelevel
 }
 
+if { [board_info target exists multilib_flags]
+ && [string match "* -march=*" " [board_info target multilib_flags] "] } {
+# Multilib flags come after the -march flags we pass and override
+# them, so skip these tests when such flags are passed.
+return
+}
+
 # Load support procs.
 load_lib gcc-dg.exp
 load_lib torture-options.exp
@@ -90,13 +97,6 @@ load_lib torture-options.exp
 dg-init
 torture-init
 
-if { [board_info target exists multilib_flags]
- && [string match "* -march=*" " [board_info target multilib_flags] "] } {
-# Multilib flags come after the -march flags we pass and override
-# them, so skip these tests when such flags are passed.
-return
-}
-
 set-torture-options $PREFETCH_NONE
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/i386-pf-none-*.c]] "" 
""
 
-- 
2.34.1



[PATCH 2/3] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-31 Thread Vineet Gupta
From: Kito Cheng 

This is in line with recent test harness expectations and is a
preventive change as it doesn't actually fix any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add torture-init and
torture-finish.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 5e69235a268c..7ab7456d1d15 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -39,6 +39,7 @@ if [istarget riscv32-*-*] then {
 
 # Initialize `dg'.
 dg-init
+torture-init
 
 # Main loop.
 set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
@@ -90,5 +91,7 @@ foreach op $AUTOVEC_TEST_OPTS {
 dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/autovec/vls-vlmax/*.\[cS\]]] \
"-std=c99 -O3 -ftree-vectorize --param 
riscv-autovec-preference=fixed-vlmax" $CFLAGS
 
+torture-finish
+
 # All done.
 dg-finish
-- 
2.34.1



[PATCH 3/3] testsuite: print any leaking torture options for debugging

2023-05-31 Thread Vineet Gupta
This was helpful when debugging the recent multilib testsuite failure.

gcc/testsuite:
* lib/torture-options.exp: print the value of non-empty options:
torture_without_loops, torture_with_loops, LTO_TORTURE_OPTIONS.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/lib/torture-options.exp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/torture-options.exp 
b/gcc/testsuite/lib/torture-options.exp
index d00d07e9378d..dfb536d1d96c 100644
--- a/gcc/testsuite/lib/torture-options.exp
+++ b/gcc/testsuite/lib/torture-options.exp
@@ -23,15 +23,15 @@ proc torture-init { args } {
 global torture_without_loops global_with_loops
 
 if [info exists torture_without_loops] {
-   error "torture-init: torture_without_loops is not empty as expected"
+   error "torture-init: torture_without_loops is not empty as expected = 
\"${torture_without_loops}\""
 }
 if [info exists torture_with_loops] {
-   error "torture-init: torture_with_loops is not empty as expected"
+   error "torture-init: torture_with_loops is not empty as expected = 
\"${torture_with_loops}\""
 }
 
 global LTO_TORTURE_OPTIONS
 if [info exists LTO_TORTURE_OPTIONS] {
-   error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected"
+   error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected =  
\"${LTO_TORTURE_OPTIONS}\""
 }
 set LTO_TORTURE_OPTIONS ""
 if [check_effective_target_lto] {
-- 
2.34.1



Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-31 Thread Vineet Gupta



On 5/30/23 11:43, Vineet Gupta wrote:

On 5/26/23 16:38, Vineet Gupta wrote:



On 5/25/23 13:26, Thomas Schwinge wrote:


I'm pasting a snippet of gcc.log. Issue is indeed triggered by rvv.exp
which needs some love.

I'd intentionally asked to "see a complete 'gcc.log' file where the
ERRORs are visible".


The full log files are humongous - even xz compressed is ~ 7 MB - how 
can I share that w/o the list dropping it.

I guess I can try emailing it you directly on work email - if that's OK.


The torture-{init,finish} needs to be in riscv.exp not rvv.exp
Running full tests now.

I still don't understand this.

My current theory would be that some other '*.exp' file runs
'torture-init' and then prematurely ends without 'torture-finish', and
thus the torture testing state bleeds into the next '*.exp' 
file(s).  I'd
hoped that I could pinpoint that via "a complete 'gcc.log' file 
where the

ERRORs are visible".


Seems likely. So back to good old printf style debugging: I added 
dumping of the dup options to see what exactly was leaking.


setup #1
 - riscv.exp: Added torture-init/finish
 - Deleted rvv.exp (to isolate the problem)

...

Setup #2
 - riscv.exp: Added torture-init/finish
 - riscv.exp: commented away ADDITIONAL_TORTURE_OPTIONS line
 - rvv.exp remains, unchanged

...


In the 3rd setup, I've removed riscv.exp and rvv.exp and running the 
testsuite: errors still show.


So we are iterating over multilib combinations.
Things are fine for the first one. The initial flags comprise of 
DG_TORTURE_OPTIONS from gcc-dg.exp (-O0, -O1 )
However when the 2nd multilib runs, it seems the old ones are not 
getting cleared, hence the splat. 


I've posted the fix [1] . printf/send_user() to the rescue !

@Thomas I fat fingered the send and missed CC'ing you on the patches.

Thx,
-Vineet

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620263.html


Re: [PATCH 1/3] testsuite: Unbork multilib testing on RISC-V (and any target really)

2023-05-31 Thread Vineet Gupta




On 5/31/23 10:57, Jeff Law wrote:



On 5/31/23 10:25, Vineet Gupta wrote:

Multilib testing on trunk is currently busted (and surprisingly this
affects any/all targets but it seems nobody cares). We currently get the
following splat:
I wouldn't say that nobody cares, it just hasn't bubbled up on 
anyone's priority list yet (most developers aren't working on targets 
that make heavy use of multilibs).


Pardon my theatrics :-)

But probably more importantly, this problem seems to not be triggering 
on all multilib targets.  For example, I just examined my tester's 
build logs and couldn't see this on the H8/300 or V850 ports.  Which 
begs the question, why?


Are just in case, this is not running a subset using some stray 
RUNTESTFLAGS.


Yes I'm curious to see why others are not seeing it. Could you rerun 
upstream with following debug (and avoid -j when running the testsuite 
just to serialize the logs - the problem does happen for -j runs too 
though). Then in the logs we could see if init/finish get out of sync 
around the culprit file (or my case at least)



--->
diff --git a/gcc/testsuite/lib/torture-options.exp 
b/gcc/testsuite/lib/torture-options.exp

index dfb536d1d96c..95a6f818fded 100644
--- a/gcc/testsuite/lib/torture-options.exp
+++ b/gcc/testsuite/lib/torture-options.exp
@@ -22,6 +22,8 @@
 proc torture-init { args } {
 global torture_without_loops global_with_loops

+    send_user "\n\n===  torture-init\n"
+
 if [info exists torture_without_loops] {
    error "torture-init: torture_without_loops is not empty as 
expected = \"${torture_without_loops}\""

 }
@@ -116,6 +118,8 @@ proc set-torture-options { args } {
 proc torture-finish { args } {
 global torture_without_loops torture_with_loops

+    send_user "\n\n===  torture-finish\n"
+
 if [info exists torture_without_loops] {
    unset torture_without_loops
 } else {

--->8---

FWIW I'd like to be able to test stuff cross-arch too (at least x86, 
aarch64 and a few others).


Thx,
-Vineet


Re: [PATCH 1/3] testsuite: Unbork multilib testing on RISC-V (and any target really)

2023-05-31 Thread Vineet Gupta



On 5/31/23 11:13, Iain Sandoe wrote:

I do have a multilib problem [with libgomp] on Darwin (which has been noticed 
:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109951) but it is not obvious how 
the fix proposed would solve this - unless it’s some subtle change in global 
content for the multilib options.


For the issue I was seeing, the actual multilib content didn't matter. 
Its just that the test needed to be run for more than 1 rounds so that 
the imbalanced torture-init from 1st multlib/round created the state 
which triggered errors for 2nd round...



Schedule of variations:
    riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
    riscv-sim/-march=rv32imafdc/-mabi=ilp32d/-mcmodel=medlow
    riscv-sim/-march=rv64imac/-mabi=lp64/-mcmodel=medlow
    riscv-sim/-march=rv64imafdc/-mabi=lp64d/-mcmodel=medlow

Running target riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/riscv-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/config/sim.exp 
as generic interface file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/basic-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/config/default.exp 
as tool-and-target-specific interface file.
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/compile/compile.exp 
...

===  torture-init
===  torture-finish

Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp 
...

===  torture-init
===  torture-finish

Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp 
...

===  torture-init
===  torture-finish

Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp 
...

===  torture-init
    ^^
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.misc-tests/linkage.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.misc-tests/matrix1.exp 
...

...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.target/xtensa/xtensa.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.test-framework/test-framework.exp 
...

skipping test framework tests, CHECK_TEST_FRAMEWORK is not defined

    === gcc Summary for 
riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow ===


# of expected passes        136964
# of unexpected failures    4
# of unexpected successes    3
# of expected failures        1072
# of unsupported tests        3052

Running target riscv-sim/-march=rv32imafdc/-mabi=ilp32d/-mcmodel=medlow 
<--- 2nd round
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/riscv-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/config/sim.exp 
as generic interface file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/basic-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/config/default.exp 
as tool-and-target-specific interface file.
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/compile/compile.exp 
...


--- gcc-dg-runtest : NOT calling torture-init
--- gcc-dg-runtest : NOT calling torture-finish

Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp 
...


===  torture-init
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp.

ERROR: tcl error code NONE
ERROR: torture-init: torture_without_loops is not empty as expected = "{ 
-O0 } { -O1 } { -O2 } { -O3 -g } { -Os } { -O2 -flto 
-fno-use-linker-plugin -flto-partition=none } { -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects }"

    while executing




Re: [PATCH 2/3] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-06-01 Thread Vineet Gupta

On 6/1/23 07:54, Jeff Law wrote:



On 5/31/23 10:25, Vineet Gupta wrote:

From: Kito Cheng 

This is in line with recent test harness expectations and is a
preventive change as it doesn't actually fix any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add torture-init and
torture-finish.
Thomas's recommendation was to drop this as it doesn't change any 
observed behavior.  Do you agree with that recommendation?


Yep dropped now.

Thx,
-Vineet


[Committed] testsuite: Unbork multilib setups using -march flags (RISC-V)

2023-06-01 Thread Vineet Gupta
RISC-V multilib testing is currently busted with follow splat all over:

|Schedule of variations:
|riscv-sim/-march=rv64imafdc/-mabi=lp64d/-mcmodel=medlow
|riscv-sim/-march=rv32imafdc/-mabi=ilp32d/-mcmodel=medlow
|riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
|riscv-sim/-march=rv64imac/-mabi=lp64/-mcmodel=medlow
...
...
| ERROR: tcl error code NONE
| ERROR: torture-init: torture_without_loops is not empty as expected

causing insane amount of false failures.

|   = Summary of gcc testsuite =
|| # of unexpected case / # of unique unexpected 
case
||  gcc |  g++ | gfortran |
| rv64imafdc/  lp64d/ medlow | 5421 / 4 |1 / 1 |6 / 1 |
| rv32imafdc/ ilp32d/ medlow | 5422 / 5 |3 / 2 |6 / 1 |
|   rv32imac/  ilp32/ medlow |  391 / 5 |3 / 2 |   43 / 8 |
|   rv64imac/   lp64/ medlow | 5422 / 5 |1 / 1 |   43 / 8 |

The error splat itself is from recent test harness improvements for stricter
checks for torture-{init,finish} pairing. But the real issue is a latent bug
from 2009: commit 3dd1415dc88, ("i386-prefetch.exp: Skip tests when multilib
flags contain -march") which added an "early exit" condition to 
i386-prefetch.exp
which could potentially cause an unpaired torture-{init,finish}.

The early exit only happens in a multlib setup using -march in flags
which is what RISC-V happens to use, hence the reason this was only seen
on RISC-V multilib testing.

Moving the early exit outside of torture-{init,finish} bracket
reinstates RISC-V testing.

| rv64imafdc/  lp64d/ medlow |3 / 2 |1 / 1 |6 / 1 |
| rv32imafdc/ ilp32d/ medlow |4 / 3 |3 / 2 |6 / 1 |
|   rv32imac/  ilp32/ medlow |3 / 2 |3 / 2 |   43 / 8 |
|   rv64imac/   lp64/ medlow |5 / 4 |1 / 1 |   43 / 8 |

gcc/testsuite:
* gcc.misc-tests/i386-prefetch.exp: Move early return outside
the torture-{init,finish}

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.misc-tests/i386-prefetch.exp | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp 
b/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
index ad9e56a54bcf..98aab506cba0 100644
--- a/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
+++ b/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
@@ -14,6 +14,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
+if { [board_info target exists multilib_flags]
+ && [string match "* -march=*" " [board_info target multilib_flags] "] } {
+# Multilib flags come after the -march flags we pass and override
+# them, so skip these tests when such flags are passed.
+return
+}
+
 # Test that the correct data prefetch instructions (SSE or 3DNow! variant,
 # or none) are used for various i386 cpu-type and instruction set
 # extension options for __builtin_prefetch.  When using -mtune, specify
@@ -90,13 +97,6 @@ load_lib torture-options.exp
 dg-init
 torture-init
 
-if { [board_info target exists multilib_flags]
- && [string match "* -march=*" " [board_info target multilib_flags] "] } {
-# Multilib flags come after the -march flags we pass and override
-# them, so skip these tests when such flags are passed.
-return
-}
-
 set-torture-options $PREFETCH_NONE
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/i386-pf-none-*.c]] "" 
""
 
-- 
2.34.1



[Committed] testsuite: print any leaking torture options for debugging

2023-06-01 Thread Vineet Gupta
This was helpful when debugging the recent multilib testsuite failure.

gcc/testsuite:
* lib/torture-options.exp: print the value of non-empty options:
torture_without_loops, torture_with_loops, LTO_TORTURE_OPTIONS.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/lib/torture-options.exp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/torture-options.exp 
b/gcc/testsuite/lib/torture-options.exp
index d00d07e9378d..dfb536d1d96c 100644
--- a/gcc/testsuite/lib/torture-options.exp
+++ b/gcc/testsuite/lib/torture-options.exp
@@ -23,15 +23,15 @@ proc torture-init { args } {
 global torture_without_loops global_with_loops
 
 if [info exists torture_without_loops] {
-   error "torture-init: torture_without_loops is not empty as expected"
+   error "torture-init: torture_without_loops is not empty as expected = 
\"${torture_without_loops}\""
 }
 if [info exists torture_with_loops] {
-   error "torture-init: torture_with_loops is not empty as expected"
+   error "torture-init: torture_with_loops is not empty as expected = 
\"${torture_with_loops}\""
 }
 
 global LTO_TORTURE_OPTIONS
 if [info exists LTO_TORTURE_OPTIONS] {
-   error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected"
+   error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected =  
\"${LTO_TORTURE_OPTIONS}\""
 }
 set LTO_TORTURE_OPTIONS ""
 if [check_effective_target_lto] {
-- 
2.34.1



Followup on PR/109279: large constants on RISCV

2023-06-01 Thread Vineet Gupta

Hi Jeff,

I finally got around to collecting various observations on PR/109279 - 
more importantly the state of large constants in RV backend, apologies 
in advance for the long email.


It seems the various commits in area have improved the original test 
case of 0x1010101_01010101


  Before 2e886eef7f2b  |   With 2e886eef7f2b   | With 
0530254413f8 | With c104ef4b5eb1
    (const pool)   | define_insn_and_split | "splitter relaxed 
new |

   | "*mvconst_internal"   | pseudos" |
lui  a5,%hi(.LANCHOR0) | li a0,0x0101  | li a5,0x0101    
| li   a5,0x0101
ld   a0,%lo(.LANCHOR0)(a5) | addi   a0,0x0101  | addi 
a5,a5,0x0101 | addi a5,a5,0x0101
ret    | slli   a0,a0,16   | mv a0,a5    
| slli a0,a5,32
   | addi   a0,a0,0x0101   | slli 
a5,a5,32 | add  a0,a0,a5

   | slli   a0,a0,16   | add a0,a5,a0 |
   | addi   a0,a0,0x0101   | 
ret   |

   | ret   |

But same commits seem to have regressed Andrew's test from same PR 
(which is the theme of this email).

The seemingly contrived test turned out to be much more than I'd hoped for.

   long long f(void)
   {
 unsigned t = 0x101_0101;
 long long t1 = t;
 long long t2 = ((unsigned long long )t) << 32;
 asm("":"+r"(t1));
 return t1 | t2;
   }

  Before 2e886eef7f2b  |   With 2e886eef7f2b    | With 0530254413f8
    (ideal code)   | define_insn_and_split  | "splitter relaxed new
   |    |  pseudos"
   li   a0,0x101   |    li   a5,0x101   |    li a0,0x101_
   addi a0,a0,0x101    |    addi a5,a5,0x101    |    addi a0,a0,0x101
   slli a5,a0,32   |    mv   a0,a5  |    li a5,0x101_
   or   a0,a0,a5   |    slli a5,a5,32   |    slli a0,a0,32
   ret |    or   a0,a0,a5   |    addi a5,a5,0x101
   |    ret |    or   a0,a5,a0
    |    ret

As a baseline, RTL just before cse1 (in 260r.dfinit) in all of above is:

   # lower word

   (insn 6 2 7 2 (set (reg:DI 138)
    (const_int [0x101]))  {*movdi_64bit}

   (insn 7 6 8 2 (set (reg:DI 137)
    (plus:DI (reg:DI 138)
    (const_int [0x101]))) {adddi3}
 (expr_list:REG_EQUAL (const_int [0x1010101]) )

   (insn 5 8 9 2 (set (reg/v:DI 134 [ t1 ])
    (reg:DI 136 [ t1 ])) {*movdi_64bit}

   # upper word (created independently)

   (insn 9 5 10 2 (set (reg:DI 141)
    (const_int [0x101]))  {*movdi_64bit}

   (insn 10 9 11 2 (set (reg:DI 142)
    (plus:DI (reg:DI 141)
    (const_int [0x101]))) {adddi3}

   (insn 11 10 12 2 (set (reg:DI 140)
    (ashift:DI (reg:DI 142)
    (const_int 32 [0x20]))) {ashldi3}
   (expr_list:REG_EQUAL (const_int [0x1010101]))

   # stitch them
   (insn 12 11 13 2 (set (reg:DI 139)
    (ior:DI (reg/v:DI 134 [ t1 ])
    (reg:DI 140))) "const2.c":7:13 99 {iordi3}


Prior to 2e886eef7f2b, cse1 could do its job: finding oldest equivalent 
registers for the fragments of const and reusing the reg.


   (insn 7 6 8 2 (set (reg:DI 137)
    (plus:DI (reg:DI 138)
    (const_int [0x101]))) {adddi3}
    (expr_list:REG_EQUAL (const_int [0x1010101])))
   [...]

   (insn 11 10 12 2 (set (reg:DI 140)
    (ashift:DI (reg:DI 137)
 ^   OLD EQUIV REG
    (const_int 32 [0x20]))) {ashldi3}
    (expr_list:REG_EQUAL (const_int [0x1010101_])))


With 2e886eef7f2b, define_insn_and_split "*mvconst_internal" recog() 
kicks in during cse1, eliding insns for a const_int.


   (insn 7 6 8 2 (set (reg:DI 137)
    (const_int [0x1010101])) {*mvconst_internal}
    (expr_list:REG_EQUAL (const_int [0x1010101])))
   [...]

   (insn 11 10 12 2 (set (reg:DI 140)
    (const_int [0x1010101_])) {*mvconst_internal}
    (expr_list:REG_EQUAL (const_int  [0x1010101_]) ))

Eventually split1 breaks it up using same mvconst_internal splitter, but 
the cse opportunity has been lost.
*This is a now a baseline for large consts handling for RV backend which 
we all need to be aware of*.



(2) Now on to the nuances as to why things get progressively worse after 
commit 0530254413f8.


It all seems to get down to register allocation passes:

sched1 before 0530254413f8

   ;; 0--> b  0: i  22 r140=0x101    :alu
   ;; 1--> b  0: i  20 r137=0x101    :alu
   ;; 2--> b  0: i  23 r140=r140+0x101   :alu
   ;; 3--> b  0: i  21 r137=r137+0x101   :alu
   ;; 4--> b  0: i  24 r140=r140<<0x20   :alu
   ;; 5--> b  0: i  25 r136=r137 :alu
   ;; 6--> b  0: i   8 r136=asm_operands :nothing
   ;; 7--> b  0: i  17 a0=r136|r140  :alu
   ;; 8--> b  0: i  18 use a0    :nothing

sched1 with 053

Re: Followup on PR/109279: large constants on RISCV

2023-06-12 Thread Vineet Gupta

Hi Jeff,

Thx for the detailed explanation and insight.

On 6/7/23 16:44, Jeff Law wrote:
With 2e886eef7f2b, define_insn_and_split "*mvconst_internal" recog() 
kicks in during cse1, eliding insns for a const_int.


    (insn 7 6 8 2 (set (reg:DI 137)
 (const_int [0x1010101])) {*mvconst_internal}
 (expr_list:REG_EQUAL (const_int [0x1010101])))
    [...]

    (insn 11 10 12 2 (set (reg:DI 140)
 (const_int [0x1010101_])) {*mvconst_internal}
 (expr_list:REG_EQUAL (const_int  [0x1010101_]) ))
Understood.  Not ideal, but we generally don't have good ways to limit 
patterns to being available at different times during the optimization 
phase.  One thing you might want to try (which I thought we used at 
one point) was make the pattern conditional on cse_not_expected.  The 
goal would be to avoid exposing the pattern until a later point in the 
optimizer pipeline.  It may have been the case that we dropped that 
over time during development.  It's all getting fuzzy at this point.


Gave this a try and it seems to fix Andrew's test, but then regresses 
the actual large const case: 0x1010101_01010101 : the mem to const_int 
transformation was being done in cse1 which no longer happens and the 
const pool from initial expand remains all the way into asm generated. I 
don't think we want to go back to that state






Eventually split1 breaks it up using same mvconst_internal splitter, 
but the cse opportunity has been lost.
Right.  I'd have to look at the pass definitions, but I suspect the 
splitting pass where this happens is after the last standard CSE pass. 
So we don't get a chance to CSE the constant synthesis.


Yep split1 and friends happen after cse1 and cse2. At -O2 gcse doesn't 
kick in and if forced to, it is currently limited in what it can do more 
so given this is post reload.




*This is a now a baseline for large consts handling for RV backend 
which we all need to be aware of*.
Understood.  Though it's not as bad as you might think :-)  You can 
spend an inordinate amount of time improving constant synthesis, 
generate code that looks really good, but in the end it may not make a 
bit of different in real performance.  Been there, done that.  I'm not 
saying we give up, but we need to keep in mind that we're often better 
off trading a bit on the constant synthesis if doing so helps code 
where those constants get used.


Understood :-) I was coming to same realization and this seems like a 
good segway into switching topic and investigating post reload gcse for 
Const Rematerialization, another pesky issue with RV and likely to have 
bigger impact across a whole bunch of workloads.


FWIW, IRA for latter case only, emits additional REG_EQUIV notes 
which could also be playing a role.
REG_EQUAL notes get promoted to REG_EQUIV notes in some cases. And 
when other equivalences are discovered it may create a REG_EQUIV note 
out of thin air.


The REG_EQUIV note essentially means that everywhere the register 
occurs you can validly (from a program semantics standpoint) replace 
the register with the value.  It might require reloading, but it's a 
valid semantic transformation which may reduce register pressure -- 
especially for constants that were subject to LICM.


Contrast to REG_EQUAL which creates an equivalence at a particular 
point in the IL, but the equivalence may not hold elsewhere in the IL.


Ok. From reading gccint it seems REG_EQUIV is a stronger form of 
equivalence and seems to be prefered by post reload passes, while 
REG_EQUAL is more of use in pre-reload.



  I would also look at reload_cse_regs which should give us some 
chance at seeing the value reuse if/when IRA/LRA muck things up.


I'll be out of office for the rest of week, will look into this once I'm 
back.


Thx,
-Vineet


Re: [PATCH v3] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-18 Thread Vineet Gupta

Hi Manolis,

On 7/18/23 11:01, Jeff Law via Gcc-patches wrote:
Vineet @ Rivos has indicated he stumbled across an ICE with the V3 
code.  Hopefully he'll get a testcase for that extracted shortly. 


Yeah, I was trying to build SPEC2017 with this patch and ran into ICE 
for several of them with -Ofast build: The reduced test from 455.nab is 
attached here.

The issue happens with v2 as well, so not something introduced by v3.

There's ICE in cprop_hardreg which immediately follows f-m-o.


The protagonist is ins 93 which starts off in combine as a simple set of 
a DF 0.


| sff.i.288r.combine:(insn 93 337 94 8 (set (reg/v:DF 236 [ e ])
| sff.i.288r.combine- (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 
190 {*movdf_hardfloat_rv64}


Subsequently reload transforms it into SP + offset

| sff.i.303r.reload:(insn 93 337 94 9 (set (mem/c:DF (plus:DI (reg/f:DI 
2 sp)

| sff.i.303r.reload- (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
| sff.i.303r.reload- (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190 
{*movdf_hardfloat_rv64}

| sff.i.303r.reload- (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])

It gets processed by f-m-o and lands in cprop_hardreg, where it triggers 
ICE.


| (insn 93 337 523 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
|     (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
|     (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 -1
^^^
|      (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
|    (nil)))
| during RTL pass: cprop_hardreg

Here's my analysis:

f-m-o: do_check_validity() -> insn_invalid_p() tries to recog() a 
modified version of insn 93 (actually there is no change, so perhaps 
something we can optimize later). The corresponding md pattern 
movdf_hardfloat_rv64 no longer matches since it expects REG_P for 
operand0, while reload has converted it into SP + offset. f-m-o then 
does the right thing by invalidating INSN_CODE=-1 for a subsequent 
recog() to work correctly.
But it seems this -1 lingers into the next pass, and trips up 
copyprop_hardreg_forward_1() -> extract_constrain_insn()

So I don't know what the right fix here should be.

In a run with -fno-fold-mem-offsets, the same insn 93 is successfully 
grok'ed by cprop_hardreg,


| (insn 93 337 522 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
|    (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
|    (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190 
{*movdf_hardfloat_rv64}

^^^
| (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
|    (nil)))

P.S. I wonder if it is a good idea in general to call recog() post 
reload since the insn could be changed sufficiently to no longer match 
the md patterns. Of course I don't know the answer.


P.S.2 When debugging code, I noticed a minor annoyance in the patch with 
the whole fold_mem_offsets_driver() switch-case indirection. It doesn't 
seem to be serving any purpose, and we could simply call corresponding 
do_* routines in execute () itself.


Thx,
-Vineeta, c, d, g, h, i, j, k, m, p, q, r, aa, t, u, x;
double *f, *s;
double l, n, o, v, w;
b() {
  double e, ad, ae, af, ag, ah, ai, aj, am, an, ap, aq, ar, as, at, av, aw;
  for (; q; q = aa) {
r = f[x];
ah = f[r + 2] - g;
af = f[0] - f[r];
ag = 1 - f[r + 1];
av = ae * af * ah * ai;
aj = h - w * p * ah;
am = o + av * af;
an = j * o * av * ag;
ap = (am - m) * ad * (k - an - n) * a - v * c;
ar = (aj - l) * c;
if (a)
  ;
else
az:
  switch (d) {
  case 1:
e = aw * i;
break;
  case 2:
exit(1);
  }
s[0] = ap;
t += at * aq;
u = as += at * ar;
if (c)
  goto az;
  }
  return e;
}


Re: [PATCH v3] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-19 Thread Vineet Gupta




On 7/18/23 21:31, Jeff Law via Gcc-patches wrote:


In a run with -fno-fold-mem-offsets, the same insn 93 is successfully 
grok'ed by cprop_hardreg,


| (insn 93 337 522 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
|    (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
|    (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190 
{*movdf_hardfloat_rv64}

^^^
| (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
|    (nil)))

P.S. I wonder if it is a good idea in general to call recog() post 
reload since the insn could be changed sufficiently to no longer 
match the md patterns. Of course I don't know the answer.

If this ever causes a problem, it's a backend bug.  It's that simple.

Conceptually it should always be safe to set INSN_CODE to -1 for any 
insn.


Sure the -1 should be handled, but are you implying that f-mo- will 
always generate a valid combination and recog() failing is simply a bug 
in backend and/or f-m-o. If not, the -1 setting can potentially trigger 
an ICE in future.




Odds are for this specific case in the RV backend, we just need a 
constraint to store 0.0 into a memory location.  That can actually be 
implemented as a store from x0 since 0.0 has the bit pattern 0x0.  
This is probably a good thing to expose anyway as an optimization and 
can move forward independently of the f-m-o patch.


I call dibs on this :-) Seems like an interesting little side project.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748

-Vineet


[PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.

gcc/Changelog:

* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
 
 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
-- 
2.34.1



Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:15, Philipp Tomsich wrote:

On Fri, 21 Jul 2023 at 19:56, Vineet Gupta  wrote:

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret
This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this

typo: "gettinh" -> "getting"


Fixed.


is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.

Should we add a "Fixes: "?


Sure. Although gcc usage of Fixes tag seems slightly different than say 
linux kernel's.






Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.

gcc/Changelog:

PR target/110748


Added.

Thx,
-Vineet





 * config/riscv/predicates.md (const_0_operand): Add back
   const_double.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/pr110748-1.c: New Test.
 * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
   patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/predicates.md |  2 +-
  gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
  3 files changed, 15 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
 (match_test "INTVAL (op) + 1 != 0")))

  (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
 (match_test "op == CONST0_RTX (GET_MODE (op))")))

  (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
  /* { dg-final { scan-assembler "fmv.x.w" } } */
  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
--
2.34.1





[PATCH v2] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta
Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
(gcc-13 regression)

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was getting triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output, which is
also fixed in the patch.

gcc/Changelog:

PR target/110748
* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
  - No code changes
  - Updated commitlog: typo, "Fixes:" tag, mention PR in Changelog entry
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
 
 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
-- 
2.34.1



Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:31, Palmer Dabbelt wrote:

On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to 
optimize it.


void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.


Given that it can ICE, we should probably backport it to 13.


FWIW ICE is on an in-flight for-gcc-14 patch, not something in tree 
already. And this will merge ahead of that.

I'm fine with backport though.




Ran thru full multilib testsuite, there was 1 false failure due to


Did you run the test with autovec?


I have standard 32/64 mutlilibs, but no 'v' in arch so autovec despite 
being enabled at -O2 and above will not kick in.

I think we should add a 'v' multilib.


There's also a pmode_reg_or_0_operand, some of those don't appear 
protected from FP values.  So we might need something like


diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign"

(define_expand "len_mask_gather_load"
  [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
   (match_operand:VNX1_QHSDI 2 "register_operand")
   (match_operand 3 "")
   (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if 
that

could manifest as an actual bug, though...


What does 'P' do here ?


+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */


IIUC the pattern to emit fmv suffers from the same bug -- it's fixed 
in the same
way, but I think we might be able to come up with a test for it: 
`fmv.d.x FREG,

x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
 double sum = 0;
 for (int i = 0; i < 8; ++i)
   sum += d[i];
 return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe 
I'm

missing something?`


I need to unpack this first :-)



diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c

index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */


I think that autovec one is the only possible dependency that might 
have snuck

in, so we should be safe otherwise.  Thanks!


I'm not sure if this specific comment is related to the xthead test or 
continuation of above.
For xthead it is real issue since I saw a random "lw" in lto assembler 
output.




Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:31, Palmer Dabbelt wrote:


IIUC the pattern to emit fmv suffers from the same bug -- it's fixed 
in the same
way, but I think we might be able to come up with a test for it: 
`fmv.d.x FREG,

x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
 double sum = 0;
 for (int i = 0; i < 8; ++i)
   sum += d[i];
 return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe 
I'm
missing something?` 


I don't think we can avoid FMV in this case

    fmv.d.x    fa0,zero #1
    addi    a5,a0,64
.L2:
    fld    fa5,0(a0)
    addi    a0,a0,8
    fadd.d    fa0,fa0,fa5   #2
    bne    a0,a5,.L2
    ret

In #1, the zero needs to be setup in FP reg (possible using FMV), since 
in #2 it will be used for FP math.


If we change ur test slightly,

double zadd(double *d) {
 double sum = 0.0;
 for (int i = 0; i < 8; ++i)
   d[i] = sum;
 return sum;
}

We still get the optimal code for writing to FP 0. The last FMV is 
unavoidable as we need an FP return reg.



    addi    a5,a0,64
.L2:
    sd    zero,0(a0)
    addi    a0,a0,8
    bne    a0,a5,.L2
    fmv.d.x    fa0,zero
    ret


Re: [PATCH v2] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-22 Thread Vineet Gupta

On 7/21/23 23:05, Jeff Law wrote:



On 7/21/23 12:30, Vineet Gupta wrote:

Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
(gcc-13 regression)

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to 
optimize it.


void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was getting triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output, which is
also fixed in the patch.

gcc/Changelog:

PR target/110748
* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

OK
jeff


Thx Jeff. I couldn't resist beefing up the changelog some more to 
capture the technicalities at heart. Hopefully someone in future getting 
up to speed on gcc will find it informing.


-Vineet


[Committed] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-22 Thread Vineet Gupta
Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

The fix updates predicate const_0_operand() so reg_or_0_operand () now
includes const_double, enabling movdf expander -> riscv_legitimize_move ()
to generate below vs. an intermediate set (reg:DF) const_double:DF

| (insn 6 3 0 2 (set (mem:DF (reg/v/f:DI 134 [ d ])
|(const_double:DF 0.0 [0x0.0p+0]))

This change also enables such insns to be recog() by later passes.
The md pattern "*movdf_hardfloat_rv64" despite already supporting the
needed constraints {"m","G"} mem/const 0.0 was failing to match because
the additional condition check reg_or_0_operand() was failing due to
missing const_double.

This failure to recog() was triggering an ICE when testing the in-flight
f-m-o patches and is how all of this started, but then was deemed to be
an independent optimization of it's own [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Its worthwhile to note all the set peices were already there and working
up until my own commit mentioned at top regressed the whole thing.

Ran thru full multilib testsuite and no surprises. There was 1 false
failure due to random string "lw" appearing in lto build assembler output,
which is also fixed here.

gcc/ChangeLog:

PR target/110748
* config/riscv/predicates.md (const_0_operand): Add back
const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
    patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
 
 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
-- 
2.34.1



Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta



On 7/25/23 20:31, Jeff Law via Gcc-patches wrote:



On 7/25/23 05:24, Jivan Hakobyan wrote:

Hi.

I re-run the benchmarks and hopefully got the same profit.
I also compared the leela's code and figured out the reason.

Actually, my and Manolis's patches do the same thing. The difference 
is only execution order.
But shouldn't your patch also allow for for at the last the potential 
to pull the fp+offset computation out of a loop?  I'm pretty sure 
Manolis's patch can't do that.


Because of f-m-o held after the register allocation it cannot 
eliminate redundant move 'sp' to another register.
Actually that's supposed to be handled by a different patch that 
should already be upstream.  Specifically;



commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
Author: Manolis Tsamis 
Date:   Thu May 25 13:44:41 2023 +0200

    cprop_hardreg: Enable propagation of the stack pointer if possible
        Propagation of the stack pointer in cprop_hardreg is currenty
    forbidden in all cases, due to maybe_mode_change returning NULL.
    Relax this restriction and allow propagation when no mode change is
    requested.
        gcc/ChangeLog:
        * regcprop.cc (maybe_mode_change): Enable stack pointer
    propagation.
I think there were a couple-follow-ups.  But that's the key change 
that should allow propagation of copies from the stack pointer and 
thus eliminate the mov gpr,sp instructions.  If that's not happening, 
then it's worth investigating why.




Besides that, I have checked the build failure on x264_r. It is 
already fixed on the third version.

Yea, this was a problem with re-recognition.  I think it was fixed by:


commit ecfa870ff29d979bd2c3d411643b551f2b6915b0
Author: Vineet Gupta 
Date:   Thu Jul 20 11:15:37 2023 -0700

    RISC-V: optim const DF +0.0 store to mem [PR/110748]
        Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
        DF +0.0 is bitwise all zeros so int x0 store to mem can be 
used to optimize it.

[ ... ]


So I think the big question WRT your patch is does it still help the 
case where we weren't pulling the fp+offset computation out of a loop.


I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to 
avoid the Thunderbird mangling the test formatting)
benchmark   workload #   upstreamupstream +  
upstream +
 g54e54f77c1f-m-o   
fold-fp-off

500.perlbench_r 0   1217932817476   1217884553366   0.004%  
1217928953834   0.000%
1   7437572412017436555281330.014%  
7436958204260.008%
2   7034556460907034235592980.005%  
7034552962510.000%
502.gcc_r   0   1950043691041949734789450.016%  
1949841884000.010%
1   2327199387782326884911130.014%  
2326923790850.012%
2   2234432804592234136163680.013%  
2234241518480.009%
3   1862337046241862065164210.015%  
1862311376160.001%
4   2874063942322873788702790.010%  
2874037074660.001%
503.bwaves_r0   3161940436793161940436790.000%  
3161940436620.000%
1   4992934903804992934903800.000%  
4992934903630.000%
2   3893654016153893654016150.000%  
3893654015980.000%
3   4735143106794735143106790.000%  
4735143106620.000%
505.mcf_r   0   6892586949026892547403440.001%  
6892586948870.000%
507.cactuBSSN_r 0   3966612364613   3966498234698   0.003%  
3966612365068   0.000%
508.namd_r  0   1903766272166   1903766271701   0.000%  
1903765987301   0.000%
510.parest_r0   3512678127316   3512676752062   0.000%  
3512677505662   0.008%
511.povray_r0   3036725558618   3036722265149   0.000%  
3036725556997   0.000%
519.lbm_r   0   1134454304533   1134454304533   0.000%  
1134454304518   0.000%
520.omnetpp_r   0   1001937885126   1001937884542   0.000%  
1001937883931   0.000%
521.wrf_r   0   3959642601629   3959541912013   0.003%  
3959642615086   0.000%
523.xalancbmk_r 0   1065004269065   1064981413043   0.002%  
1065004132070   0.000%
525.x264_r  0   4964928575334964593675820.007%  
4964779884350.003%
1   1891248078083   1891222197535   0.001%  
1890990911614   0.014%
2   1815609267498   1815561397105   0.003%  
1815341248007   0.015%
526.blender_r   0   1672203767444   1671549923427   0.039%  
1672224626743  -0.001%
527.cam4_r  0   2326424925038   2320567166886   0.252

ICE for interim fix for PR/110748

2023-08-01 Thread Vineet Gupta

Hi Jeff,

As discussed this morning, I'm sending over dumps for the optim of DF 
const -0.0 (PR/110748)  [1]
For rv64gc_zbs build, IRA is undoing the split which eventually leads to 
ICE in final pass.


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15

void znd(double *d) {  *d = -0.0;   }


*split1*

(insn 10 3 11 2 (set (reg:DI 136)
    (const_int [0x8000])) "neg.c":4:5 -1

(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1

*ira*

(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190 
{*movdf_hardfloat_rv64}

 (expr_list:REG_DEAD (reg:DI 135)


For the working case, the large const is not involved and not subject to 
IRA playing foul.


Attached are split1 and IRA dumps for OK (rv64gc) and NOK (rv64gc_zbs) 
cases.


Thx,
-Vineet
;; Function znd (znd, funcdef_no=0, decl_uid=2278, cgraph_uid=1, symbol_order=0)

Starting decreasing number of live ranges...
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
rescanning insn with uid = 11.
deleting insn with uid = 10.
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 (1)
Reg 135 uninteresting
;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called

Pass 0 for finding pseudo/allocno costs

a0 (r135,l0) best GR_REGS, allocno GR_REGS

  a0(r135,l0) costs: SIBCALL_REGS:2000,2000 JALR_REGS:2000,2000 
GR_REGS:2000,2000 MEM:1,1


Pass 1 for finding pseudo/allocno costs

r135: preferred GR_REGS, alternative NO_REGS, allocno GR_REGS

  a0(r135,l0) costs: GR_REGS:2000,2000 MEM:1,1

   Insn 11(l0): point = 0
   Insn 9(l0): point = 2
 a0(r135): [1..2]
Compressing live ranges: from 5 to 2 - 40%
Ranges after the compression:
 a0(r135): [0..1]
+++Allocating 0 bytes for conflict table (uncompressed size 8)
;; a0(r135,l0) conflicts:
;; total conflict hard regs:
;; conflict hard regs:


  pref0:a0(r135)<-hr10@2000
  regions=1, blocks=3, points=2
allocnos=1 (big 0), copies=0, conflicts=0, ranges=1

 Allocnos coloring:


  Loop 0 (parent -1, header bb2, depth 0)
bbs: 2
all: 0r135
modified regnos: 135
border:
Pressure: GR_REGS=2
Hard reg set forest:
  0:( 1 5-63)@0
1:( 5-31)@24000
  Allocno a0r135 of GR_REGS(28) has 27 avail. regs  5-31, node:  5-31 
(confl regs =  0-4 32-127)
  Forming thread from colorable bucket:
  Pushing a0(r135,l0)(cost 0)
  Popping a0(r135,l0)  -- assign reg 10
Disposition:
0:r135 l010
New iteration of spill/restore move
+++Costs: overall -2000, reg -2000, mem 0, ld 0, st 0, move 0
+++   move loops 0, new jumps 0


znd

Dataflow summary:
;;  fully invalidated by EH  0 [zero] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 
[a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 
[t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 
[ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 
60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 
[N/A] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 78 [N/A] 
79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 
[N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 95 [N/A] 
96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 104 [v8] 
105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 
113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 120 [v24] 
121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31]
;;  hardware regs used   2 [sp] 64 [arg] 65 [frame]
;;  regular block artificial uses2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  eh block artificial uses 2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  entry block defs 1 [ra] 2 [sp] 8 [s0] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 
14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 
[fa5] 48 [fa6] 49 [fa7] 64 [arg] 65 [frame]
;;  exit block uses  1 [ra] 2 [sp] 8 [s0] 65 [frame]
;;  regs ever live   10 [a0]
;;  ref usage   r1={1d,1u} r2={1d,2u} r8={1d,2u} r10={1d,1u} r11={1d} r12={1d} 
r13={1d} r14={1d} r15={1d} r16={1d} r17={1d} r42={1d} r43={1d} r44={1d} 
r45={1d} r46={1d} r47={1d} r48={1d} r49={1d} r64={1d,1u} r65={1d,2u} 
r135={1d,1u} 
;;total ref usage 32{22d,10u,0e} in 2{2 regular + 0 call} insns.
(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 3 2 NOTE_INSN_DELE

Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta




On 8/1/23 15:07, Philipp Tomsich wrote:

+Manolis Tsamis

On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
 wrote:



On 8/1/23 13:14, Vineet Gupta wrote:


I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
avoid the Thunderbird mangling the test formatting)

Thanks.  Of particular importance is the leela change.  My recollection
was that the f-m-o work also picked up that case.  But if my memory is
faulty (always a possibility), then that shows a clear case where
Jivan's work picks up a case not handled by Manolis's work.

f-m-o originally targeted (and benefited) the leela-case.  I wonder if
other optimizations/changes over the last year interfere with this and
what needs to be changed to accomodate this... looks like we need to
revisit against trunk.

Philipp.


And on the other direction we can see that deepsjeng isn't helped by
Jivan's work, but is helped by Manolis's new pass.

I'd always hoped/expected we'd have cases where one patch clearly helped
over the other.  While the .25% to .37% improvements for the three most
impacted benchmarks doesn't move the needle much across the whole suite
they do add up over time.

Jeff


I took a quick look at Leela, the significant difference is from 
additional insns with SP not getting propagated.


e.g.

   231b6:    mv    a4,sp
   231b8:    sh2add    a5,a5,a4

vs.

   1e824:    sh2add    a5,a5,sp

There are 5 such instances which more or less make up for the delta.

-Vineet



Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta



On 8/1/23 16:06, Philipp Tomsich wrote:

Very helpful! Looks as if regprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.


FWIW, all 5 of them involve a SH2ADD have SP as source in the fold FP 
case which f-m-o seems to be generating a MV for.




On Wed, 2 Aug 2023 at 01:03, Vineet Gupta  wrote:



On 8/1/23 15:07, Philipp Tomsich wrote:

+Manolis Tsamis

On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
 wrote:


On 8/1/23 13:14, Vineet Gupta wrote:


I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
avoid the Thunderbird mangling the test formatting)

Thanks.  Of particular importance is the leela change.  My recollection
was that the f-m-o work also picked up that case.  But if my memory is
faulty (always a possibility), then that shows a clear case where
Jivan's work picks up a case not handled by Manolis's work.

f-m-o originally targeted (and benefited) the leela-case.  I wonder if
other optimizations/changes over the last year interfere with this and
what needs to be changed to accomodate this... looks like we need to
revisit against trunk.

Philipp.


And on the other direction we can see that deepsjeng isn't helped by
Jivan's work, but is helped by Manolis's new pass.

I'd always hoped/expected we'd have cases where one patch clearly helped
over the other.  While the .25% to .37% improvements for the three most
impacted benchmarks doesn't move the needle much across the whole suite
they do add up over time.

Jeff

I took a quick look at Leela, the significant difference is from
additional insns with SP not getting propagated.

e.g.

 231b6:mva4,sp
 231b8:sh2adda5,a5,a4

vs.

 1e824:sh2adda5,a5,sp

There are 5 such instances which more or less make up for the delta.

-Vineet





Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta




On 8/1/23 16:22, Jeff Law wrote:
They must not be working as expected or folks are using old trees. 
Manolis's work for regcprop has been on the trunk for about 5-6 weeks 
ag this point: 


I have bleeding edge trunk from 2-3 days back. I think we are looking 
for the following which the tree does have.


2023-05-25 6a2e8dcbbd4b cprop_hardreg: Enable propagation of the stack 
pointer if possible


-Vineet


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta




On 8/1/23 16:27, Jeff Law wrote:



On 8/1/23 17:13, Vineet Gupta wrote:


On 8/1/23 16:06, Philipp Tomsich wrote:

Very helpful! Looks as if regprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.


FWIW, all 5 of them involve a SH2ADD have SP as source in the fold FP 
case which f-m-o seems to be generating a MV for.
To clarify f-m-o isn't generating the mv.  It's simplifying a sequence 
by pushing the constant in an addi instruction into the memory 
reference.  As a result the addi simplifies into a sp->reg copy that 
is supposed to then be propagated away.


Yep, that's clear.



Also note that getting FP out of the shift-add sequences is the other 
key goal of Jivan's work.  FP elimination always results in a 
spill/reload if we have a shift-add insn where one operand is FP. 


Hmm, are you saying it should NOT be generating shift-add with SP as 
src, because currently thats exactly what fold FP offset *is* doing and 
is the reason it has 5 less insns.


-Vineet




Re: [PATCH] testsuite: constraint some of fp tests to hard_float

2022-07-13 Thread Vineet Gupta

Hi Jeff,

On 6/26/22 12:05, Jeff Law via Gcc-patches wrote:



On 5/29/2022 9:53 PM, Vineet Gupta wrote:

These tests validate fp conversions with various rounding modes which
would not work on soft-float ABIs.

On -march=rv64imac/-mabi=lp64 this reduces 5 unique failures (overall 35
due to multi flag combination builds)

gcc/testsuite/Changelog:
* gcc.dg/torture/fp-double-convert-float-1.c: Add
dg-require-effective-target hard_float.
* gcc.dg/torture/fp-int-convert-timode-3.c: Ditto.
* gcc.dg/torture/fp-int-convert-timode-4.c: Ditto.
* gcc.dg/torture/fp-uint64-convert-double-1.c: Ditto.
* gcc.dg/torture/fp-uint64-convert-double-2.c: Ditto.



Thanks.  I've pushed this to the trunk.


Can this be backported to gcc-12 please.

Thx,
-Vineet


Re: [PATCH v3] RISC-V/testsuite: constraint some of tests to hard_float

2022-07-13 Thread Vineet Gupta

On 5/29/22 20:50, Kito Cheng via Gcc-patches wrote:

Committed, thanks!


Can this be backported to gcc-12 please.

Thx,
-Vineet



On Fri, May 27, 2022 at 10:37 AM Vineet Gupta  wrote:


Commit 9ddd44b58649d1d ("RISC-V: Provide `fmin'/`fmax' RTL pattern") added
tests which check for hard float instructions which obviously fails on
soft-float ABI builds.

And my recent commit b646d7d279ae ("RISC-V: Inhibit FP <--> int register
moves via tune param") is guilty of same crime.

So constraint with "dg-require-effective-target hard_float"

This reduces bunch of new RV failures.

|   = Summary of gcc testsuite =
|| # of unexpected case / # of unique unexpected 
case
||  gcc |  g++ | gfortran |
|   rv64imac/   lp64/ medlow |  134 /22 |0 / 0 |- |  
BEFORE
|   rv64imac/   lp64/ medlow |   22 / 9 |0 / 0 |- |  
AFTER
|

gcc/testsuite/Changelog:
 * gcc.target/riscv/fmax.c: Add dg-require-effective-target hard_float.
 * gcc.target/riscv/fmaxf.c: Ditto.
 * gcc.target/riscv/fmin.c: Ditto.
 * gcc.target/riscv/fminf.c: Ditto.
 * gcc.target/riscv/smax-ieee.c: Ditto.
 * gcc.target/riscv/smax.c: Ditto.
 * gcc.target/riscv/smaxf-ieee.c: Ditto.
 * gcc.target/riscv/smaxf.c: Ditto.
 * gcc.target/riscv/smin-ieee.c: Ditto.
 * gcc.target/riscv/smin.c: Ditto.
 * gcc.target/riscv/sminf-ieee.c: Ditto.
 * gcc.target/riscv/sminf.c: Ditto.
 * gcc.target/riscv/pr105666.c: Ditto.

Signed-off-by: Vineet Gupta 
---
v3:
 Added fix to pr105666.c as well.
v2:
 Fixed the SoB snafu in v1
---
  gcc/testsuite/gcc.target/riscv/fmax.c   | 1 +
  gcc/testsuite/gcc.target/riscv/fmaxf.c  | 1 +
  gcc/testsuite/gcc.target/riscv/fmin.c   | 1 +
  gcc/testsuite/gcc.target/riscv/fminf.c  | 1 +
  gcc/testsuite/gcc.target/riscv/pr105666.c   | 1 +
  gcc/testsuite/gcc.target/riscv/smax-ieee.c  | 1 +
  gcc/testsuite/gcc.target/riscv/smax.c   | 1 +
  gcc/testsuite/gcc.target/riscv/smaxf-ieee.c | 1 +
  gcc/testsuite/gcc.target/riscv/smaxf.c  | 1 +
  gcc/testsuite/gcc.target/riscv/smin-ieee.c  | 1 +
  gcc/testsuite/gcc.target/riscv/smin.c   | 1 +
  gcc/testsuite/gcc.target/riscv/sminf-ieee.c | 1 +
  gcc/testsuite/gcc.target/riscv/sminf.c  | 1 +
  13 files changed, 13 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/fmax.c 
b/gcc/testsuite/gcc.target/riscv/fmax.c
index c71d35c9f9dc..e1b7fa8f918c 100644
--- a/gcc/testsuite/gcc.target/riscv/fmax.c
+++ b/gcc/testsuite/gcc.target/riscv/fmax.c
@@ -1,4 +1,5 @@
  /* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fno-signaling-nans 
-dp" } */

  double
diff --git a/gcc/testsuite/gcc.target/riscv/fmaxf.c 
b/gcc/testsuite/gcc.target/riscv/fmaxf.c
index f9980166887a..8da0513dc8f6 100644
--- a/gcc/testsuite/gcc.target/riscv/fmaxf.c
+++ b/gcc/testsuite/gcc.target/riscv/fmaxf.c
@@ -1,4 +1,5 @@
  /* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fno-signaling-nans 
-dp" } */

  float
diff --git a/gcc/testsuite/gcc.target/riscv/fmin.c 
b/gcc/testsuite/gcc.target/riscv/fmin.c
index 9634abd19af8..01993d49bc21 100644
--- a/gcc/testsuite/gcc.target/riscv/fmin.c
+++ b/gcc/testsuite/gcc.target/riscv/fmin.c
@@ -1,4 +1,5 @@
  /* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fno-signaling-nans 
-dp" } */

  double
diff --git a/gcc/testsuite/gcc.target/riscv/fminf.c 
b/gcc/testsuite/gcc.target/riscv/fminf.c
index 9a3687be3092..32ce363e10d8 100644
--- a/gcc/testsuite/gcc.target/riscv/fminf.c
+++ b/gcc/testsuite/gcc.target/riscv/fminf.c
@@ -1,4 +1,5 @@
  /* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fno-signaling-nans 
-dp" } */

  float
diff --git a/gcc/testsuite/gcc.target/riscv/pr105666.c 
b/gcc/testsuite/gcc.target/riscv/pr105666.c
index 904f3bc0763f..dd996eec8efc 100644
--- a/gcc/testsuite/gcc.target/riscv/pr105666.c
+++ b/gcc/testsuite/gcc.target/riscv/pr105666.c
@@ -6,6 +6,7 @@
 spilling to stack.  */

  /* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
  /* { dg-options "-march=rv64g -ffast-math" } */

  #define NITER 4
diff --git a/gcc/testsuite/gcc.target/riscv/smax-ieee.c 
b/gcc/testsuite/gcc.target/riscv/smax-ieee.c
index 3a98aeb45add..2dbccefe2f4d 100644
--- a/gcc/testsuite/gcc.target/riscv/smax-ieee.c
+++ b/gcc/testsuite/gcc.target/riscv/smax-ieee.c
@@ -1,4 +1,5 @@
  /* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
  /* { dg-options "-ffinite-math-only -

Re: [committed][RISC-V] Remove errant hunk of code

2023-08-03 Thread Vineet Gupta

On 8/3/23 11:12, Jeff Law via Gcc-patches wrote:

On Thu, 03 Aug 2023 08:05:09 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

[...]

There's a bigger TODO in this space WRT a top-to-bottom evaluation of
the costing on RISC-V.  I'm still formulating what that evaluation is
going to look like, so don't hold your breath waiting on it.


[...]

Right.  I think we're probably tackling different issues.

The problem I'm looking at right now is when we ask for the cost of a 
sequence of insns, the underlying costs we get per insn are not at all 
in line with expectations.  We have likely either failed to provide 
one of the desired hook entries or we're missing toplevel constructs 
like INSN & SET in our riscv_rtx_costs. I'm not sure which it is yet, 
but there's clearly something wrong in there.  IMHO it falls into the 
"fix the stupid stuff" space.


As discussed in Tue call, I definitely have 1 fix to riscv_rtx_costs (), 
which is worth pondering. It adjusts the cost of consts and helps Hoist 
GCSE constants (which granted kicks in only at -Os). However it does 
affect codegen in subtle ways since CSE1 now for some cases generates 
additional REG_EQUAL note.


Again it might not help your issue. I'll post a RFC patch along with a 
test case - which in turns comes from an old PR I stumbled upon during 
research on const remat.


On a different but slightly related note, I was playing with Zicond this 
morning (enabling by default in toolchain build and qemu) and I was 
surprised to see that glibc build currently doesn't have a single czero* 
insn - although gcc has bene configured with


Point being, I can help with riscv_rtx_costs and friends, if you have 
something specific, but seems like you are in the thick of it and have 
that handled so I'll stay out of the way and refrain from zicond 
work/testing for the time being.


Thx,
-Vineet


Re: [committed][RISC-V] Remove errant hunk of code

2023-08-03 Thread Vineet Gupta




On 8/3/23 16:15, Jeff Law wrote:



On 8/3/23 16:26, Vineet Gupta wrote:


As discussed in Tue call, I definitely have 1 fix to riscv_rtx_costs 
(), which is worth pondering. It adjusts the cost of consts and helps 
Hoist GCSE constants (which granted kicks in only at -Os). However it 
does affect codegen in subtle ways since CSE1 now for some cases 
generates additional REG_EQUAL note.
Yea, we'll definitely want to take a look.  It's not likely affecting 
my work, but definitely want to evaluate as part of the overall 
costing question.


OK, I'll spin it and post to the list after a testsuite run.






On a different but slightly related note, I was playing with Zicond 
this morning (enabling by default in toolchain build and qemu) and I 
was surprised to see that glibc build currently doesn't have a single 
czero* insn - although gcc has bene configured with
Probably costing ;-)   That little hunk that snuck through is part of 
how I arrange to get more czero instructions when running the GCC 
testsuite.


Also note that if you're disassembling with binutils, I think it just 
dumps it out as a .word or something similar.


Yeah something was off, I was building /using the latest binutils but 
then my fancy over-engineered objdump helper was pointing to an a 
non-zicond binutils :-)






Point being, I can help with riscv_rtx_costs and friends, if you have 
something specific, but seems like you are in the thick of it and 
have that handled so I'll stay out of the way and refrain from zicond 
work/testing for the time being.
;-)  Actually if you wanted to poke at zicond, the most interesting 
unexplored area I've come across is the COND_EXPR handling in gimple. 
When we expand a COND_EXPR into RTL the first approach we take is to 
try movcc in RTL.


Unfortunately we don't create COND_EXPRs all that often in gimple.  
Some simple match.pd patterns would likely really help here.


The problem is RTL expansion when movcc FAILs is usually poor at 
best.  So if we're going to add those match.pd patterns, we probably 
need to beef up the RTL expansion code to do a better job when the 
target doesn't have a movcc RTL pattern.


Ok, I'll add that to my todo list.

Thx,
-Vineet


Re: [committed][RISC-V]Don't reject constants in cmov condition

2023-08-07 Thread Vineet Gupta



On 8/7/23 13:36, Jeff Law wrote:
Fixes several missed conditional moves with the trunk. 


I'm curious, how do you know what's missing. Does ifc have some stats 
like autovec which explicitly reports missed opportunities ?


-Vineet


RV64 Zicond ICE (was Re: [committed][RISC-V]Don't reject constants in cmov condition)

2023-08-08 Thread Vineet Gupta

Hi Jeff,

On 8/7/23 13:36, Jeff Law wrote:


This test is too aggressive.  Constants have VOIDmode, so we need to 
let the through this phase of conditional move support.


Fixes several missed conditional moves with the trunk.

Committed to the trunk,


As discussed this morning, this triggers an ICE when building glibc for 
rv64_zicond


programs/ld-ctype.c:3977:1: error: unrecognizable insn:
 3977 | }
  | ^
(insn 238 237 239 18 (set (reg:SI 727)
    (if_then_else:SI (eq:SI (reg:SI 725)
    (const_int 0 [0]))
    (const_int 0 [0])
    (reg:SI 721))) -1
 (nil))
during RTL pass: vregs
programs/ld-ctype.c:3977:1: internal compiler error: in extract_insn, at 
recog.cc:2791
0x885de7 _fatal_insn(char const*, rtx_def const*, char const*, int, char 
const*)     ../.././gcc/gcc/rtl-error.cc:108
0x885e09 _fatal_insn_not_found(rtx_def const*, char const*, int, char 
const*)    ../.././gcc/gcc/rtl-error.cc:116

0x8846f9 extract_insn(rtx_insn*)    ../.././gcc/gcc/recog.cc:2791
0xcf77ae instantiate_virtual_regs_in_insn ../.././gcc/gcc/function.cc:1610
0xcf77ae instantiate_virtual_regs ../.././gcc/gcc/function.cc:1983
0xcf77ae execute    ../.././gcc/gcc/function.cc:2030

Here's the reduced reproducer.

a, c;
long b;
d() {
  for (;;)
    if (a & (b < 8 ?: 1 << b))
  c = 1;
}

I'll try to muster a fix :-)

Thx,
-Vineet


Re: [PATCH V2] riscv: generate builtin macro for compilation with strict alignment:

2023-08-08 Thread Vineet Gupta




On 8/8/23 13:24, Edwin Lu wrote:

diff --git a/gcc/testsuite/gcc.target/riscv/attribute-1.c 
b/gcc/testsuite/gcc.target/riscv/attribute-1.c
index bc919c586b6..4b077c980a4 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-1.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-1.c
@@ -2,5 +2,20 @@
  /* { dg-options "-mriscv-attribute" } */
  int foo()
  {
+


Maybe add a comment that in absence of -m[no-]strict-align, we use the 
cpu tune param -> slow_unaligned_access and that default mcpu is rocket 
which has that set to _slow.



+#if defined(__riscv_unaligned_avoid)
+#error "__riscv_unaligned_avoid is unexpectedly set"
+#endif


Lets first check what is really expected.
#if !defined (_slow) #error


+
+/* Check __riscv_unaligned_slow xor __riscv_unaligned_fast is set.  */
+#if !defined(__riscv_unaligned_slow) && !defined(__riscv_unaligned_fast)
+#error "either __riscv_unaligned_slow or__riscv_unaligned_fast is not set"
+#endif
+
+#if defined(__riscv_unaligned_slow) && defined(__riscv_unaligned_fast)
+#error "both __riscv_unaligned_slow and__riscv_unaligned_fast are set"
+#endif
+


I think we can simplify this a bit (sorry I if wasn't clear enough in 
our off-list discussions).
We now need to ensure that unexpected toggles are NOT defined: #if 
defined(_avoid) || defined(_fast) #error
I don't think we need to check for the combinations as that is covered 
by first one and this.


While separate #error prints for 2 failures are ideal, personally it 
feels excessive given that the current implementation will only ever 
generate 1 of them. If a future code change breaks that assumption, the 
onus is on that change to fix/update the errors.



+return 0;
  }
  /* { dg-final { scan-assembler ".attribute arch" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-4.c 
b/gcc/testsuite/gcc.target/riscv/attribute-4.c
index 7c565c4963e..c505dbae2aa 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-4.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-4.c
@@ -2,5 +2,19 @@
  /* { dg-options "-mriscv-attribute -mstrict-align" } */
  int foo()
  {
+
+#if !defined(__riscv_unaligned_avoid)
+#error "__riscv_unaligned_avoid is not set"
+#endif
+
+#if defined(__riscv_unaligned_fast)
+#error "__riscv_unaligned_fast is unexpectedly set"
+#endif
+
+#if defined(__riscv_unaligned_slow)
+#error "__riscv_unaligned_slow is unexpectedly set"
+#endif
+
+  return 0;
  }
  /* { dg-final { scan-assembler ".attribute unaligned_access, 0" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-5.c 
b/gcc/testsuite/gcc.target/riscv/attribute-5.c
index ee9cf693be6..45afa6c464d 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-5.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-5.c
@@ -2,5 +2,20 @@
  /* { dg-options "-mriscv-attribute -mno-strict-align" } */
  int foo()
  {
+
+#if defined(__riscv_unaligned_avoid)
+#error "__riscv_unaligned_avoid is unexpectedly set"
+#endif
+
+/* Check __riscv_unaligned_slow xor __riscv_unaligned_fast is set.  */
+#if !defined(__riscv_unaligned_slow) && !defined(__riscv_unaligned_fast)
+#error "either __riscv_unaligned_slow or__riscv_unaligned_fast is not set"
+#endif
+
+#if defined(__riscv_unaligned_slow) && defined(__riscv_unaligned_fast)
+#error "both __riscv_unaligned_slow and__riscv_unaligned_fast are set"
+#endif


Same as my comment for attribute-1. Please recheck all of them.


+
+return 0;
  }
  /* { dg-final { scan-assembler ".attribute unaligned_access, 1" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-1.c 
b/gcc/testsuite/gcc.target/riscv/predef-align-1.c
new file mode 100644
index 000..2a889dd9284
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=thead-c906" } */
+
+int main() {
+
+/* thead-c906 default is cpu tune param unaligned access fast */
+#if !defined(__riscv_unaligned_fast)
+#error "__riscv_unaligned_fast is not set"
+#endif
+
+#if defined(__riscv_unaligned_slow)
+#error "__riscv_unaligned_slow is unexpectedly set"
+#endif
+
+#if defined(__risvc_unaligned_avoid)
+#error "__riscv_unaligned_avoid is unexpectedly set"
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-2.c 
b/gcc/testsuite/gcc.target/riscv/predef-align-2.c
new file mode 100644
index 000..76cbce60d9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-align-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mtune=thead-c906 -mstrict-align" } */
+
+int main() {
+
+#if !defined(__riscv_unaligned_avoid)
+#error "__riscv_unaligned_avoid is not set"
+#endif
+
+#if defined(__riscv_unaligned_fast)
+#error "__riscv_unaligned_fast is unexpectedly set"
+#endif
+
+#if defined(__riscv_unaligned_slow)
+#error "__riscv_unaligned_slow is unexpectedly set"
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-align-3.c 
b/gcc/testsuite/gcc.target/riscv/predef-align-3.c
new file mode 100644
index 000..f8caef48180
--- /dev/n

Re: [PATCH v9] RISC-V: Add the 'zfa' extension, version 0.2

2023-08-09 Thread Vineet Gupta

Hi Jin Ma,

On 5/16/23 00:06, jinma via Gcc-patches wrote:

On 5/15/23 07:16, Jin Ma wrote:


Do we also need to check Z[FDH]INX too?

Otherwise it looks pretty good.  We just need to wait for everything to
freeze and finalization on the assembler interface.

jeff

Yes, you are right, we also need to check Z[FDH]INX. I will send a patch
again to fix it after others give some review comments.


Can we please revisit this and get this merged upstream.
Seems like gcc is supporting frozen but not ratified extensions.

Thx,
-Vineet


[PATCH] RISC-V: Enable Hoist to GCSE simple constants

2023-08-09 Thread Vineet Gupta
Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates . For a const with cost 1 backend currently returns 0,
causing Hoist to bail and elide GCSE.

Note that constants requiring more than 1 insns to setup were working
already since backend is returning 1 as well. Arguably that needs updating
as well to reflect cost better, but that should be a different change
anyways.

To keep testsuite parity, some V tests need updatinge which started failing
in the new costing regime.

gcc/ChangeLog:
* gcc/config/riscv.cc (riscv_rtx_costs): Adjust const_int cost.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Disable for
  -O2.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc   |  9 ++---
 gcc/testsuite/gcc.target/riscv/gcse-const.c | 13 +
 .../gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c  |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c  |  2 +-
 4 files changed, 17 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/gcse-const.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8b7256108157..1802eef908fc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2464,14 +2464,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
 case CONST:
   if ((cost = riscv_const_insns (x)) > 0)
{
- /* If the constant is likely to be stored in a GPR, SETs of
-single-insn constants are as cheap as register sets; we
-never want to CSE them.  */
+ /* Hoist will GCSE constants only if cost is non-zero.  */
  if (cost == 1 && outer_code == SET)
-   *total = 0;
- /* When we load a constant more than once, it usually is better
-to duplicate the last operation in the sequence than to CSE
-the constant itself.  */
+   *total = COSTS_N_INSNS (1);
  else if (outer_code == SET || GET_MODE (x) == VOIDmode)
*total = COSTS_N_INSNS (1);
}
diff --git a/gcc/testsuite/gcc.target/riscv/gcse-const.c 
b/gcc/testsuite/gcc.target/riscv/gcse-const.c
new file mode 100644
index ..b04707ce9745
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/gcse-const.c
@@ -0,0 +1,13 @@
+/* Slightly modified copy of gcc.target/arm/pr40956.c.  */
+/* { dg-options "-Os" }  */
+/* Make sure the constant "6" is loaded into register only once.  */
+/* { dg-final { scan-assembler-times "\tli.*6" 1 } } */
+
+int foo(int p, int* q)
+{
+  if (p!=9)
+*q = 6;
+  else
+*(q+1) = 6;
+  return 3;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
index 60ad108666f8..83618880c8ea 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
@@ -21,5 +21,5 @@ void f (int32_t * restrict in, int32_t * restrict out, size_t 
n, size_t cond, si
 }
 
 /* { dg-final { scan-assembler-times {vsetvli} 4 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1" no-opts "-O2" no-opts "-Os" no-opts "-Oz" 
no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
index 7b9574cc332d..8aca839c49aa 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
@@ -21,7 +21,7 @@ void f (int32_t * restrict in, int32_t * restrict out, size_t 
n, size_t cond, si
 }
 
 /* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\

IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)

2023-08-11 Thread Vineet Gupta



On 8/1/23 12:17, Vineet Gupta wrote:

Hi Jeff,

As discussed this morning, I'm sending over dumps for the optim of DF 
const -0.0 (PR/110748)  [1]
For rv64gc_zbs build, IRA is undoing the split which eventually leads 
to ICE in final pass.


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15

void znd(double *d) {  *d = -0.0;   }


*split1*

(insn 10 3 11 2 (set (reg:DI 136)
    (const_int [0x8000])) "neg.c":4:5 -1

(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1

*ira*

(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190 
{*movdf_hardfloat_rv64}

 (expr_list:REG_DEAD (reg:DI 135)


For the working case, the large const is not involved and not subject 
to IRA playing foul.


I investigated this some more. So IRA update_equiv_regs () has code 
identifying potential replacements: if a reg is referenced exactly 
twice: set once and used once.


      if (REG_N_REFS (regno) == 2
      && (rtx_equal_p (replacement, src)
          || ! equiv_init_varies_p (src))
      && NONJUMP_INSN_P (insn)
      && equiv_init_movable_p (PATTERN (insn), regno))
        reg_equiv[regno].replace = 1;
    }

And combine_and_move_insns () does the replacement, undoing the split1 
above.


As a hack if I disable this code, the ICE above goes away and we get the 
right output.


    bseti    a5,zero,63
    sd    a5,0(a0)
    ret

In fact this is the reason for many more split1 being undone. See the 
suboptimal codegen for large const for Andrew Pinski's test case [1]


  long long f(void)
    {
  unsigned t = 0x101_0101;
  long long t1 = t;
  long long t2 = ((unsigned long long )t) << 32;
  asm("":"+r"(t1));
  return t1 | t2;
    }

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620413.html

Again if I use the hacked compiler the redundant immediate goes away.

  upstream  |   ira hack     |
    li a0,0x101_    |    li   a5,0x101   |
    addi a0,a0,0x101    |    addi a5,a5,0x101    |
    li a5,0x101_    |    mv   a0,a5  |
    slli a0,a0,32   |    slli a5,a5,32   |
    addi a5,a5,0x101|    or   a0,a0,a5   |
    or   a0,a5,a0   |    ret |
    ret ||



This code has been there since 2009, when ira.c was created so it 
obviously per design/expected.


I'm wondering (naively) if there is some way to tune this - for a given 
backend. In general it would make sense to do the replacement, but not 
if the cost changes (e.g. consts could be embedded in x86 insn freely, 
but not for RISC-V where this is costly and if something is split, it 
might been intentional.


Thx,
-Vineet


Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)

2023-08-14 Thread Vineet Gupta



On 8/11/23 17:04, Jeff Law wrote:


I'm wondering (naively) if there is some way to tune this - for a 
given backend. In general it would make sense to do the replacement, 
but not if the cost changes (e.g. consts could be embedded in x86 
insn freely, but not for RISC-V where this is costly and if something 
is split, it might been intentional.

I'm not immediately aware of a way to tune.

When it comes to tuning, the toplevel questions are do we have any of 
the info we need to tune at the point where the transformation occurs. 
The two most obvious pieces here would be loop info an register pressure.


ie, do we have enough loop structure to know if the def is at a 
shallower loop nest than the use.  There's a reasonable chance we have 
this information as my recollection is this analysis is done fairly 
early in IRA.


But that means we likely don't have any sense of register pressure at 
the points between the def and use.   So the most useful metric for 
tuning isn't really available.


I'd argue that even if the register pressure were high, in some cases, 
there's just no way around it and RA needs to honor what the backend did 
apriori (split in this case), otherwise we end up with something which 
doesn't compute literally and leads to ICE. I'm puzzled that in this 
case, intentional implementation is getting in the way. So while I don't 
care about the -0.0 case in itself, it seems with the current framework 
we can't just achieve the results, other that the roundabout way of 
peephole2 you alluded to.




The one thing that stands out is we don't do this transformation at 
all when register pressure sensitive scheduling is enabled. And we 
really should be turning that on by default.  Our data shows register 
pressure sensitive scheduling is about a 6-7% cycle improvement on 
x264 as it avoids spilling in those key satd loops.



 /* Don't move insns if live range shrinkage or register
 pressure-sensitive scheduling were done because it will not
 improve allocation but likely worsen insn scheduling.  */
  if (optimize
  && !flag_live_range_shrinkage
  && !(flag_sched_pressure && flag_schedule_insns))
    combine_and_move_insns ();



So you might want to look at register pressure sensitive scheduling 
first.  If you go into x264_r from specint and look at 
x264_pixel_satd_8x4.  First verify the loops are fully unrolled. If 
they are, then look for 32bit loads/stores into the stack.  If you 
have them, then you're spilling and getting crappy performance.  Using 
register pressure sensitive scheduling should help significantly.


Is that -fira-loop-pressure ?


We've certainly seen that internally.  The plan was to submit a patch 
to make register pressure sensitive scheduling the default when the 
scheduler is enabled.  We just haven't pushed on it.  If you can 
verify that you're seeing spilling as well, then it'd certainly 
bolster the argument that register-pressure-sensitive-scheduling is 
desirable.


I can confirm that the loop is fully unrolled and there's a zillion 
stack spills there for intermediate computes (-Ofast 
-march=rv64gc_zba_zbb_zbs, no V in that build).


Thx,
-Vineet


Re: [PATCH] RISC-V: Enable pressure-aware scheduling by default.

2023-08-18 Thread Vineet Gupta




On 8/18/23 16:08, Jeff Law wrote:

There is some slight regression in code quality for a number of
vector tests where we spill more due to different instructions order.
The ones I looked at were a mix of bad luck and/or brittle tests.
Comparing the size of the generated assembly or the number of vsetvls
for SPECint also didn't show any immediate benefit but that's obviously
not a very fine-grained analysis.
Yea.  In fact I wouldn't really expect significant changes other than 
those key loops in x264.


Care to elaborate a bit more please. I've seen severe reg pressure / 
spills in a bunch of others: cactu, lbm, exchange2. Is there something 
specific to x264 spills ?






diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc

index 4737dcd44a1..59848b21162 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -2017,9 +2017,11 @@ static const struct default_options 
riscv_option_optimization_table[] =

    {
  { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },


Nit2: maybe move this 1 line up to keep LEVEL_1 together, at least the 
new ones being added.



  #if TARGET_DEFAULT_ASYNC_UNWIND_TABLES == 1
  { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
  { OPT_LEVELS_ALL, OPT_funwind_tables, NULL, 1},
+    /* Enable -fsched-pressure by default when optimizing.  */
  #endif
  { OPT_LEVELS_NONE, 0, NULL, 0 }
    };

Shouldn't the comment move up to before the OPT_fsched_pressure line?


Yep I had the exact same first though but then thought it was something 
deeper.

Turned out to be Occam's Razor after all :-)

Thx,
-Vineet


[PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Vineet Gupta
Ran into a minor snafu in const splitting code when playing with test
case from an old PR/23813.

long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }

This currently generates

li  a5,-252645376
addia5,a5,241
li  a0,-252645376
sllia5,a5,32
addia0,a0,240
add a0,a5,a0
ret

The signed math in hival extraction introduces an additional bit,
causing loval == hival check to fail.

| riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
../gcc/config/riscv/riscv.cc:702
| 702 unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
| (gdb)n
| 703 unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
| (gdb)
| 704 rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
| (gdb) p/x val
| $2 = 0xf0f0f0f0f0f0f0f0
| (gdb) p/x loval
| $3 = 0xf0f0f0f0
| (gdb) p/x hival
| $4 = 0xf0f0f0f1
   ^^^
Fix that by eliding the subtraction in shift.

With patch:

li  a5,-252645376
addia5,a5,240
sllia0,a5,32
add a0,a0,a5
ret

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): hival computation
  do elide subtraction of loval.
* (riscv_split_integer_cost): Ditto.
* (riscv_build_integer): Ditto

Signed-off-by: Vineet Gupta 
---
I wasn't planning to do any more work on large const stuff, but just ran into 
it this
on a random BZ entry when trying search for redundant constant stuff.
The test seemed too good to pass :-)
---
 gcc/config/riscv/riscv.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5ac187c1b1b4..377d3aac794b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
   && (value > INT32_MAX || value < INT32_MIN))
 {
   unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
   struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
   struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
   int hi_cost, lo_cost;
@@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
 {
   int cost;
   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
   struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
 
   cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
@@ -700,7 +700,7 @@ static rtx
 riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
 {
   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
 
   riscv_move_integer (lo, lo, loval, mode);
-- 
2.34.1



Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Vineet Gupta




On 6/30/23 16:33, Vineet Gupta wrote:

Ran into a minor snafu in const splitting code when playing with test
case from an old PR/23813.

long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }

This currently generates

li  a5,-252645376
addia5,a5,241
li  a0,-252645376
sllia5,a5,32
addia0,a0,240
add a0,a5,a0
ret

The signed math in hival extraction introduces an additional bit,
causing loval == hival check to fail.

| riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
../gcc/config/riscv/riscv.cc:702
| 702 unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
| (gdb)n
| 703 unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
| (gdb)


FWIW (and I missed adding this observation to the changelog) I pondered 
about using unsigned loval/hival with zext_hwi() but that in certain 
cases can cause additional insns


e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI 
0x_8000




| 704 rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
| (gdb) p/x val
| $2 = 0xf0f0f0f0f0f0f0f0
| (gdb) p/x loval
| $3 = 0xf0f0f0f0
| (gdb) p/x hival
| $4 = 0xf0f0f0f1
^^^
Fix that by eliding the subtraction in shift.

With patch:

li  a5,-252645376
addia5,a5,240
sllia0,a5,32
add a0,a0,a5
ret

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): hival computation
  do elide subtraction of loval.
* (riscv_split_integer_cost): Ditto.
* (riscv_build_integer): Ditto

Signed-off-by: Vineet Gupta 
---
I wasn't planning to do any more work on large const stuff, but just ran into 
it this
on a random BZ entry when trying search for redundant constant stuff.
The test seemed too good to pass :-)
---
  gcc/config/riscv/riscv.cc | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5ac187c1b1b4..377d3aac794b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
&& (value > INT32_MAX || value < INT32_MIN))
  {
unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
int hi_cost, lo_cost;
@@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
  {
int cost;
unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
  
cost = 2 + riscv_build_integer (codes, loval, VOIDmode);

@@ -700,7 +700,7 @@ static rtx
  riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
  {
unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
  
riscv_move_integer (lo, lo, loval, mode);




Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Vineet Gupta




On 6/30/23 16:50, Andrew Waterman wrote:

I don't believe this is correct; the subtraction is needed to account
for the fact that the low part might be negative, resulting in a
borrow from the high part.  See the output for your test case below:

$ cat test.c
#include 

int main()
{
   unsigned long result, tmp;

asm (
   "li  %1,-252645376\n"
   "addi%1,%1,240\n"
   "slli%0,%1,32\n"
   "add %0,%0,%1"
 : "=r" (result), "=r" (tmp));

   printf("%lx\n", result);

   return 0;
}
$ riscv64-unknown-elf-gcc -O2 test.c
$ spike pk a.out
bbl loader
f0f0f0eff0f0f0f0
$


Thx for the quick feedback Andew. I'm clearly lacking in signed math :-(
So is it possible to have a better code seq for the testcase at all ?

-Vineet




On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta  wrote:



On 6/30/23 16:33, Vineet Gupta wrote:

Ran into a minor snafu in const splitting code when playing with test
case from an old PR/23813.

   long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }

This currently generates

   li  a5,-252645376
   addia5,a5,241
   li  a0,-252645376
   sllia5,a5,32
   addia0,a0,240
   add a0,a5,a0
   ret

The signed math in hival extraction introduces an additional bit,
causing loval == hival check to fail.

| riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
../gcc/config/riscv/riscv.cc:702
| 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
| (gdb)n
| 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
| (gdb)

FWIW (and I missed adding this observation to the changelog) I pondered
about using unsigned loval/hival with zext_hwi() but that in certain
cases can cause additional insns

e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI
0x_8000



| 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
| (gdb) p/x val
| $2 = 0xf0f0f0f0f0f0f0f0
| (gdb) p/x loval
| $3 = 0xf0f0f0f0
| (gdb) p/x hival
| $4 = 0xf0f0f0f1
 ^^^
Fix that by eliding the subtraction in shift.

With patch:

   li  a5,-252645376
   addia5,a5,240
   sllia0,a5,32
   add a0,a0,a5
   ret

gcc/ChangeLog:

   * config/riscv/riscv.cc (riscv_split_integer): hival computation
 do elide subtraction of loval.
   * (riscv_split_integer_cost): Ditto.
   * (riscv_build_integer): Ditto

Signed-off-by: Vineet Gupta 
---
I wasn't planning to do any more work on large const stuff, but just ran into 
it this
on a random BZ entry when trying search for redundant constant stuff.
The test seemed too good to pass :-)
---
   gcc/config/riscv/riscv.cc | 6 +++---
   1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5ac187c1b1b4..377d3aac794b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
 && (value > INT32_MAX || value < INT32_MIN))
   {
 unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
 struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
 struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
 int hi_cost, lo_cost;
@@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
   {
 int cost;
 unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
 struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];

 cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
@@ -700,7 +700,7 @@ static rtx
   riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
   {
 unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
 rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);

 riscv_move_integer (lo, lo, loval, mode);




Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-20 Thread Vineet Gupta

On 10/19/23 23:50, Ajit Agarwal wrote:

Hello All:

This version 9 of the patch uses abi interfaces to remove zero and sign 
extension elimination.
Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 9) of the patch following review comments are 
incorporated.

a) Removal of hard code zero_extend and sign_extend  in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.


As has been trend in the past, I don't think all the review comments 
have been addressed.
The standard practice is to reply to reviewer's email and say yay/nay 
explicitly to each comment. Some of my comments in [1a] are still not 
resolved, importantly the last two. To be fair you did reply [1b] but 
the comments were not addressed explicitly.


[1a] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630814.html
[1b] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630865.html

Anyhow I gave this a try for RISC-V, specially after [2a][2b] as I was 
curious to see if this uncovers REE handling extraneous extensions which 
could potentially be eliminated in Expand itself, which is generally 
better as it happens earlier in the pipeline.


[2a] 2023-10-16 8eb9cdd14218 expr: don't clear SUBREG_PROMOTED_VAR_P 
flag for a promoted subreg [target/111466]

[2b] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631818.html

Bad news is with the patch, we fail to even bootstrap risc-v, buckles 
over when building libgcc itself.


The reproducer is embarrassingly simple, build with -O2:

float a;
b() { return a; }

See details below


Thanks & Regards
Ajit

ree: Improve ree pass for rs6000 target using defined abi interfaces

For rs6000 target we see redundant zero and sign extension and done
to improve ree pass to eliminate such redundant zero and sign extension
using defined ABI interfaces.

2023-10-20  Ajit Kumar Agarwal  

gcc/ChangeLog:

 * ree.cc (combine_reaching_defs): Use of zero_extend and sign_extend
 defined abi interfaces.
 (add_removable_extension): Use of defined abi interfaces for no
 reaching defs.
 (abi_extension_candidate_return_reg_p): New function.
 (abi_extension_candidate_p): New function.
 (abi_extension_candidate_argno_p): New function.
 (abi_handle_regs): New function.
 (abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

 * g++.target/powerpc/zext-elim-3.C
---
  gcc/ree.cc| 151 +-
  .../g++.target/powerpc/zext-elim-3.C  |  13 ++
  2 files changed, 161 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..9f21f0e9907 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
break;
  
-  gcc_assert (def != NULL);

+  if (def == NULL)
+return NULL;
  
ref_chain = DF_REF_CHAIN (def);
  
@@ -750,6 +751,124 @@ get_extended_src_reg (rtx src)

return src;
  }
  
+/* Return TRUE if target mode is equal to source mode of zero_extend

+   or sign_extend otherwise false.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+= targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+  NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   a return registers.  */
+
+static bool
+abi_extension_candidate_return_reg_p (/*rtx_insn *insn, */int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO (orig_src)))
+return false;
+
+  /* Mode of destination and source should be different.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* REGNO of source and destination should be same if not
+  promoted.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an argument r

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-23 Thread Vineet Gupta




On 10/22/23 23:46, Ajit Agarwal wrote:

Hello All:

Addressed below review comments in the version 11 of the patch.
Please review and please let me know if its ok for trunk.

Thanks & Regards
Ajit


Again you are not paying attention to prior comments about fixing your 
submission practice and like some of the prior reviewers I'm starting to 
get tired, despite potentially good technical content.


1. The commentary above is NOT part of changelog. Either use a separate 
cover letter or add patch version change history between two "---" lines 
just before the start of code diff. And keep accumulating those as you 
post new versions. See [1]. This is so reviewers knwo what changed over 
10 months and automatically gets dropped when patch is eventually 
applied/merged into tree.


2. Acknowledge (even if it is yes) each and every comment of the 
reviewerw explicitly inline below. That ensures you don't miss 
addressing a change since this forces one to think about each of them.


I do have some technical comments which I'll follow up with later.
Just a short summary that v10 indeed bootstraps risc-v but I don't see 
any improvements at all - as in whenever abi interfaces code identifies 
and extension (saw missing a definition, the it is not able to eliminate 
any extensions despite the patch.


-Vineet

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632180.html



On 22/10/23 12:56 am, rep.dot@gmail.com wrote:

On 21 October 2023 01:56:16 CEST, Vineet Gupta  wrote:

On 10/19/23 23:50, Ajit Agarwal wrote:

Hello All:

This version 9 of the patch uses abi interfaces to remove zero and sign 
extension elimination.
Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 9) of the patch following review comments are 
incorporated.

a) Removal of hard code zero_extend and sign_extend  in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.

As has been trend in the past, I don't think all the review comments have been 
addressed.

And apart from that, may I ask if this is just me, or does anybody else think 
that it might be worthwhile to actually read a patch before (re-)posting?

Seeing e.g. the proposed abi_extension_candidate_p as written in a first POC 
would deserve some manual CSE, if nothing more then for clarity and conciseness?

Just curious from a meta perspective..

And:


ree: Improve ree pass for rs6000 target using defined abi interfaces

mentioning powerpc like this, and then changing generic code could be 
interpreted as misleading, IMHO.


For rs6000 target we see redundant zero and sign extension and done
to improve ree pass to eliminate such redundant zero and sign extension
using defined ABI interfaces.

Mentioning powerpc in the body as one of the affected target(s) is of course 
fine.



   +/* Return TRUE if target mode is equal to source mode of zero_extend
+   or sign_extend otherwise false.  */

, false otherwise.

But I'm not a native speaker



+/* Return TRUE if the candidate insn is zero extend and regno is
+   a return registers.  */
+
+static bool
+abi_extension_candidate_return_reg_p (/*rtx_insn *insn, */int regno)

Leftover debug comment.


+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+

As said, I don't see why the below was not cleaned up before the V1 submission.
Iff it breaks when manually CSEing, I'm curious why?


+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (/*insn,*/ REGNO (orig_src)))

On top, debug leftover.


+return false;
+
+  /* Mode of destination and source should be different.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* REGNO of source and destination should be same if not
+  promoted.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+

As said, please also rephrase the above (and everything else if it obviously 
looks akin the above).

The rest, mentioned below,  should largely be covered by following the coding 
convention.


+/* Return TRUE if the candidate insn is zero extend and regno is
+   an argument registers.  */

Singular register.


+
+static bool
+abi_extension_candidate_argno_p (/*rtx_code code, */int regno)

Debug leftover

Re: [PATCH V14 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-24 Thread Vineet Gupta




On 10/24/23 10:03, Ajit Agarwal wrote:

Hello Vineet, Jeff and Bernhard:

This version 14 of the patch uses abi interfaces to remove zero and sign 
extension elimination.
This fixes aarch64 regressions failures with aggressive CSE.


Once again, this information belong between the two "---" lines that you 
added for v6 and stopped updating.


And it seems the only code difference between v13 and v14 is

-  return tgt_mode == mode;
+  if (tgt_mode == mode)
+    return true;
+  else
+    return false;

How does that make any difference ?

-Vineet



Bootstrapped and regtested on powerpc-linux-gnu.

In this version (version 14) of the patch following review comments are 
incorporated.

a) Removal of hard code zero_extend and sign_extend  in abi interfaces.
b) Source and destination with different registers are considered.
c) Further enhancements.
d) Added sign extension elimination using abi interfaces.
d) Addressed remaining review comments from Vineet.
e) Addressed review comments from Bernhard.
f) Fix aarch64 regressions failure.

Please let me know if there is anything missing in this patch.

Ok for trunk?

Thanks & Regards
Ajit

ree: Improve ree pass using defined abi interfaces

For rs6000 target we see zero and sign extend with missing
definitions. Improved to eliminate such zero and sign extension
using defined ABI interfaces.

2023-10-24  Ajit Kumar Agarwal  

gcc/ChangeLog:

 * ree.cc (combine_reaching_defs): Eliminate zero_extend and sign_extend
 using defined abi interfaces.
 (add_removable_extension): Use of defined abi interfaces for no
 reaching defs.
 (abi_extension_candidate_return_reg_p): New function.
 (abi_extension_candidate_p): New function.
 (abi_extension_candidate_argno_p): New function.
 (abi_handle_regs): New function.
 (abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

 * g++.target/powerpc/zext-elim-3.C
---
changes since v6:
   - Added missing abi interfaces.
   - Rearranging and restructuring the code.
   - Removal of hard coded zero extend and sign extend in abi interfaces.
   - Relaxed different registers with source and destination in abi interfaces.
   - Using CSE in abi interfaces.
   - Fix aarch64 regressions.
   - Add Sign extension removal in abi interfaces.
   - Modified comments as per coding convention.
   - Modified code as per coding convention.
   - Fix bug bootstrapping RISCV failures.
---
  gcc/ree.cc| 147 +-
  .../g++.target/powerpc/zext-elim-3.C  |  13 ++
  2 files changed, 154 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..f557b49b366 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
  if (REGNO (DF_REF_REG (def)) == REGNO (reg))
break;
  
-  gcc_assert (def != NULL);

+  if (def == NULL)
+return NULL;
  
ref_chain = DF_REF_CHAIN (def);
  
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)

return src;
  }
  
+/* Return TRUE if target mode is equal to source mode, false otherwise.  */

+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode
+= targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+  NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if regno is a return register.  */
+
+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  if (targetm.calls.function_value_regno_p (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the following conditions are satisfied.
+
+  a) reg source operand is argument register and not return register.
+  b) mode of source and destination operand are different.
+  c) if not promoted REGNO of source and destination operand are same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if mode of destination and source is same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promote is false and REGNO of source and destination
+ is different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}
+
+/* Return TRUE if regno is an argument register.  */
+
+static inline bool
+abi_extension_candidate_argno_p (int regno)
+{
+  return FUNCTION_ARG_REGNO_P (regno);
+}
+
+

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread Vineet Gupta

On 10/24/23 13:36, rep.dot@gmail.com wrote:

As said, I don't see why the below was not cleaned up before the V1 submission.
Iff it breaks when manually CSEing, I'm curious why?

The function below looks identical in v12 of the patch.
Why didn't you use common subexpressions?
ba

Using CSE here breaks aarch64 regressions hence I have reverted it back
not to use CSE,

Just for my own education, can you please paste your patch perusing common 
subexpressions and an assembly diff of the failing versus working aarch64 
testcase, along how you configured that failing (cross-?)compiler and the 
command-line of a typical testcase that broke when manually CSEing the function 
below?


I was meaning to ask this before, but what exactly is the CSE issue, 
manually or whatever.


-Vineet


[RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-24 Thread Vineet Gupta
RV64 comapre and branch instructions only support 64-bit operands.
The backend unconditionally generates zero/sign extend at Expand time
for compare and branch operands even if they are already as such
e.g. function args which ABI guarantees to be sign-extended (at callsite).

And subsequently REE fails to eliminate them as
"missing defintion(s)"
specially for function args since they show up as missing explicit
definition.

So elide the sign extensions at Expand time for a subreg promoted var
with inner word sized value whcih doesn't need explicit sign extending
(fairly good represntative of an incoming function arg).

There are patches floating to enhance REE and/or new passes to elimiate
extensions, but it is always better to not generate them if possible,
given Expand is so early, the elimiated extend would help improve outcomes
of later passes such as combine if they have fewer operations/combinations
to deal with.

The test used was existing gcc.c-torture/execute/20020506-1.c -O2 zbb
It elimiates the SEXT.W and an additional branch

before  after
--- --
test2:  test2:
sext.b  a0,a0
blt a0,zero,.L15
bne a1,zero,.L17bne a1,zero,.L17

This is marked RFC as I only ran a spot check with a directed test and
want to use Patrick's pre-commit CI to do the A/B testing for me.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_extend_comparands): Don't
sign-extend operand if subreg promoted with inner word size.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f2dcb0db6fbd..a8d12717e43d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3707,7 +3707,16 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx 
*op1)
}
   else
{
- *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+   /* A subreg promoted SI of DI could be peeled to expose DI, eliding
+  an unnecessary sign extension.  */
+   if (GET_CODE (*op0) == SUBREG
+   && SUBREG_PROMOTED_VAR_P (*op0)
+   && GET_MODE_SIZE (GET_MODE (XEXP (*op0, 0))).to_constant ()
+  == GET_MODE_SIZE (word_mode))
+*op0 = XEXP (*op0, 0);
+   else
+*op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+
  if (*op1 != const0_rtx)
*op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
}
-- 
2.34.1



Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Vineet Gupta

Hey Robin,

On 10/25/23 00:12, Robin Dapp wrote:

Hi Vineet,

I was thinking of two things while skimming the code:

  - Couldn't we do this in the expanders directly?  Or is the
subreg-promoted info gone until we reach that?


Following is the call stack involved:

  expand_gimple_cond
    do_compare_and_jump
   emit_cmp_and_jump_insns
   gen_cbranchqi4
   riscv_expand_conditional_branch
   riscv_emit_int_compare
  riscv_extend_comparands


Last function is what introduces the extraneous sign extends, w/o taking 
subreg-promoted into consideration and what my patch attempts to address.



  - Should some common-code part be more suited to handle that?
We already elide redundant sign-zero extensions for other
reasons.  Maybe we could add subreg promoted handling there?


Not in the context of this specific issue.

-Vineet


Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Vineet Gupta




On 10/25/23 09:30, Jeff Law wrote:

  - Should some common-code part be more suited to handle that?
    We already elide redundant sign-zero extensions for other
    reasons.  Maybe we could add subreg promoted handling there?


Not in the context of this specific issue.
Robin's point (IIUC) is that if we put this logic into a zero/sign 
extend expander, then it'll get used for *any* attempt to zero/sign 
extend that goes through the target expander.


It doesn't work for your case because we use 
gen_rtx_{ZERO,SIGN}_EXTEND directly.   But if those were adjusted to 
use the expander, then Robin's idea would be applicable to this case too


Understood. Definitely solid idea.

-Vineet


Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Vineet Gupta




On 10/25/23 06:52, Jeff Law wrote:


On 10/25/23 07:47, Robin Dapp wrote:



Well, it doesn't seem like there's a lot of difference between doing
it in the generic expander bits vs target expander bits -- the former
just calls into the latter for the most part.  Thus if the
subreg-promoted state is available in the target expander, I'd expect
it to be available in the generic expander.


Ah, sorry I meant in the [sign|zero]extend expanders rather than the
compare expanders in order to catch promoted subregs from other origins
as well.  Maybe that doesn't work, though?
That's a *really* interesting idea.  Interested in playing around a 
bit with that Vineet?


Sure I'll tinker with the {sign,zero}expanders.

And there's a third playing field :-) There seem to be still cases where 
subreg-promoted note is not set when it probably should.
So we end up in riscv_extend_comparands but with note not being there 
(for something corresponding to function arg) thus can't skip the extension.


Thx,
-Vineet


Re: [RFC] RISC-V: elide sign extend when expanding cmp_and_jump

2023-10-25 Thread Vineet Gupta

On 10/24/23 22:01, Vineet Gupta wrote:

RV64 comapre and branch instructions only support 64-bit operands.
The backend unconditionally generates zero/sign extend at Expand time
for compare and branch operands even if they are already as such
e.g. function args which ABI guarantees to be sign-extended (at callsite).

And subsequently REE fails to eliminate them as
"missing defintion(s)"
specially for function args since they show up as missing explicit
definition.

So elide the sign extensions at Expand time for a subreg promoted var
with inner word sized value whcih doesn't need explicit sign extending
(fairly good represntative of an incoming function arg).

There are patches floating to enhance REE and/or new passes to elimiate
extensions, but it is always better to not generate them if possible,
given Expand is so early, the elimiated extend would help improve outcomes
of later passes such as combine if they have fewer operations/combinations
to deal with.

The test used was existing gcc.c-torture/execute/20020506-1.c -O2 zbb
It elimiates the SEXT.W and an additional branch

before  after
--- --
test2:  test2:
sext.b  a0,a0
blt a0,zero,.L15
bne a1,zero,.L17bne a1,zero,.L17

This is marked RFC as I only ran a spot check with a directed test and
want to use Patrick's pre-commit CI to do the A/B testing for me.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_extend_comparands): Don't
sign-extend operand if subreg promoted with inner word size.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/riscv.cc | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f2dcb0db6fbd..a8d12717e43d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3707,7 +3707,16 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx 
*op1)
}
else
{
- *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+   /* A subreg promoted SI of DI could be peeled to expose DI, eliding
+  an unnecessary sign extension.  */
+   if (GET_CODE (*op0) == SUBREG
+   && SUBREG_PROMOTED_VAR_P (*op0)
+   && GET_MODE_SIZE (GET_MODE (XEXP (*op0, 0))).to_constant ()
+  == GET_MODE_SIZE (word_mode))
+*op0 = XEXP (*op0, 0);
+   else
+*op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+
  if (*op1 != const0_rtx)
*op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
}


Just a quick update that testing revealed additional failures and 
unpacking which took me a while and in hindsight embarrassing. I was 
misunderstanding what ABI guarantees and what ISA actually does :-)


The ABI guarantees sign extension this for 32-bit things in 64-bit reg 
(so in rv64, int), but *not* 8-bit things in a reg. i.e. not for char 
arguments.

e.g.

uint8_t abs8(uint8_t x)
{
   if (x & 0x80) // SEXT.b a4, a0
      ...
}

main()
{
    if (abs8(128) != 127) // LI a0, 128  => ADDI a0, x0, 128
   __builtin_abort();
}

So my patch was optimizing away the SEXT.B (despite claiming that subreg 
prom of SI) which is wrong.
Sure ADDI in main () sign-extends, but it does that for 12-bit imm 
0x080, which comes out to 0x80, but sign-extending for char scope 0x80 
would yield 0x0x80. Hence the issue.


I'll spin a v2 after more testing.

This is slightly disappointing as it reduces the scope of optimization, 
but correctness in this trade is non-negotiable :-)


-Vineet


Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-27 Thread Vineet Gupta




On 10/27/23 10:16, Bernhard Reutner-Fischer wrote:

On Wed, 25 Oct 2023 16:41:07 +0530
Ajit Agarwal  wrote:


On 25/10/23 2:19 am, Vineet Gupta wrote:

On 10/24/23 13:36, rep.dot@gmail.com wrote:

As said, I don't see why the below was not cleaned up before the V1 submission.
Iff it breaks when manually CSEing, I'm curious why?

The function below looks identical in v12 of the patch.
Why didn't you use common subexpressions?
ba

Using CSE here breaks aarch64 regressions hence I have reverted it back
not to use CSE,

Just for my own education, can you please paste your patch perusing common 
subexpressions and an assembly diff of the failing versus working aarch64 
testcase, along how you configured that failing (cross-?)compiler and the 
command-line of a typical testcase that broke when manually CSEing the function 
below?

I was meaning to ask this before, but what exactly is the CSE issue, manually 
or whatever.

If nothing else it would hopefully improve the readability.

   

Here is the abi interface where I CSE'D and got a mail from automated 
regressions run that aarch64
test fails.

We already concluded that this failure was obviously a hiccup on the
testers, no problem.


+static inline bool
+abi_extension_candidate_return_reg_p (int regno)
+{
+  return targetm.calls.function_value_regno_p (regno);
+}

But i was referring to abi_extension_candidate_p :)

your v13 looks like this:

+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  machine_mode dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set), 0);
+
+  if (!FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  || abi_extension_candidate_return_reg_p (REGNO (orig_src)))
+return false;
+
+  /* Return FALSE if mode of destination and source is same.  */
+  if (dst_mode == GET_MODE (orig_src))
+return false;
+
+  machine_mode mode = GET_MODE (XEXP (SET_SRC (set), 0));
+  bool promote_p = abi_target_promote_function_mode (mode);
+
+  /* Return FALSE if promote is false and REGNO of source and destination
+ is different.  */
+  if (!promote_p && REGNO (SET_DEST (set)) != REGNO (orig_src))
+return false;
+
+  return true;
+}

and i suppose it would be easier to read if phrased something like

static bool
abi_extension_candidate_p (rtx_insn *insn)
{
   rtx set = single_set (insn);
   rtx orig_src = XEXP (SET_SRC (set), 0);
   unsigned int src_regno = REGNO (orig_src);

   /* Not a function argument reg or is a function values return reg.  */
   if (!FUNCTION_ARG_REGNO_P (src_regno)
   || abi_extension_candidate_return_reg_p (src_regno))
 return false;

   rtx dst = SET_DST (set);
   machine_mode src_mode = GET_MODE (orig_src);

   /* Return FALSE if mode of destination and source is the same.  */
   if (GET_MODE (dst) == src_mode)
 return false;

   /* Return FALSE if the FIX THE COMMENT and REGNO of source and destination
  is different.  */
   if (!abi_target_promote_function_mode_p (src_mode)
   && REGNO (dst) != src_regno)
 return false;

   return true;
}

so no, that's not exactly better.

Maybe just do what the function comment says (i did not check the "not
promoted" part, but you get the idea):

^L

/* Return TRUE if
reg source operand is argument register and not return register,
mode of source and destination operand are different,
if not promoted REGNO of source and destination operand are the same.  */
static bool
abi_extension_candidate_p (rtx_insn *insn)
{
   rtx set = single_set (insn);
   rtx orig_src = XEXP (SET_SRC (set), 0);

   if (FUNCTION_ARG_REGNO_P (REGNO (orig_src))
   && !abi_extension_candidate_return_reg_p (REGNO (orig_src))
   && GET_MODE (SET_DST (set)) != GET_MODE (orig_src)
   && abi_target_promote_function_mode_p (GET_MODE (orig_src))
   && REGNO (SET_DST (set)) == REGNO (orig_src))
 return true;

   return false;
}


This may have been my doing as I asked to split out the logic as some of 
the conditions merit more commentary.

e.g. why does the mode need to be same
But granted this is the usual coding style in gcc and the extra comments 
could still be added before the big if


-Vineet



I think this is much easier to actually read (and that's why good
function comments are important). In the end it's not important and
just personal preference.
Either way, I did not check the plausibility of the logic therein.



I have not done any assembly diff as myself have not cross compiled with 
aarch64.

fair enough.




[PATCH v2] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-29 Thread Vineet Gupta
RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming 32-bit function args
which ABI/ISA guarantee to be sign-extended already.

And subsequently REE fails to eliminate them as
   "missing defintion(s)" or "multiple definition(s)
since function args don't have explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
also helps there.

Note there's currently patches floating around to improve REE and also a
new pass to eliminate unneccesary extensions, but it is still beneficial
to not generate those extra extensions in first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Way too many existing tests used to observe this issue.
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
It elimiates the SEXT.W

Tested with rv64gc with no regressions, I'm relying on PAtrick's
pre-commit CI to do the full testing.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
  - Elide sign extension for 32-bit operarnds only
  - Apply elison for both arguments
---
 gcc/config/riscv/riscv.cc | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f2dcb0db6fbd..3af834f92977 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3678,6 +3678,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
   cmp0, cmp1, 0, 0, OPTAB_DIRECT);
 }
 
+/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI 150 [ xx ]) 0)
+   just peel off the SUBREG to get DI, avoiding extraneous extension.  */
+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE(*op) == SImode
+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+== GET_MODE_SIZE (word_mode))
+*op = XEXP (*op, 0);
+  else
+*op = gen_rtx_SIGN_EXTEND (word_mode, *op);
+}
+
 /* Sign- or zero-extend OP0 and OP1 for integer comparisons.  */
 
 static void
@@ -3707,9 +3725,10 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx 
*op1)
}
   else
{
- *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+ riscv_sign_extend_if_not_subreg_prom(op0);
+
  if (*op1 != const0_rtx)
-   *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
+   riscv_sign_extend_if_not_subreg_prom(op1);
}
 }
 }
-- 
2.34.1



Re: [PATCH v2] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-29 Thread Vineet Gupta




On 10/29/23 19:04, Vineet Gupta wrote:

RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming 32-bit function args
which ABI/ISA guarantee to be sign-extended already.

And subsequently REE fails to eliminate them as
"missing defintion(s)" or "multiple definition(s)
since function args don't have explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
also helps there.

Note there's currently patches floating around to improve REE and also a
new pass to eliminate unneccesary extensions, but it is still beneficial
to not generate those extra extensions in first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Way too many existing tests used to observe this issue.
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
It elimiates the SEXT.W

Tested with rv64gc with no regressions, I'm relying on PAtrick's
pre-commit CI to do the full testing.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
   - Elide sign extension for 32-bit operarnds only
   - Apply elison for both arguments
---
  gcc/config/riscv/riscv.cc | 23 +--
  1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f2dcb0db6fbd..3af834f92977 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3678,6 +3678,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
   cmp0, cmp1, 0, 0, OPTAB_DIRECT);
  }
  
+/* Helper function for riscv_extend_comparands to Sign-extend the OP.

+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI 150 [ xx ]) 0)
+   just peel off the SUBREG to get DI, avoiding extraneous extension.  */
+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE(*op) == SImode


Weird, this is flagged in pre-commit CI, but contrib scripts think this 
is ok


contrib/gcc-changelog/git_check_commit.py
Checking 3d9823e2fb1c1f99bb875bffd999ab8dafd53a50: OK



+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+== GET_MODE_SIZE (word_mode))
+*op = XEXP (*op, 0);
+  else
+*op = gen_rtx_SIGN_EXTEND (word_mode, *op);
+}
+
  /* Sign- or zero-extend OP0 and OP1 for integer comparisons.  */
  
  static void

@@ -3707,9 +3725,10 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx 
*op1)
}
else
{
- *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+ riscv_sign_extend_if_not_subreg_prom(op0);
+
  if (*op1 != const0_rtx)
-   *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
+   riscv_sign_extend_if_not_subreg_prom(op1);
}
  }
  }




[PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-29 Thread Vineet Gupta
RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming 32-bit function args
which ABI/ISA guarantee to be sign-extended already.

And subsequently REE fails to eliminate them as
   "missing defintion(s)" or "multiple definition(s)
since function args don't have explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
also helps there.

Note there's currently patches floating around to improve REE and also a
new pass to eliminate unneccesary extensions, but it is still beneficial
to not generate those extra extensions in first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Way too many existing tests used to observe this issue.
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
It elimiates the SEXT.W

Tested with rv64gc with no regressions, I'm relying on PAtrick's
pre-commit CI to do the full testing.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.

Signed-off-by: Vineet Gupta 
---
Changes since v2:
  - Fix linting issues flagged by pre-commit CI
Changes since v1:
  - Elide sign extension for 32-bit operarnds only
  - Apply elison for both arguments
---
 gcc/config/riscv/riscv.cc | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ca9a2ca81d53..269beb3b159b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3678,6 +3678,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
   cmp0, cmp1, 0, 0, OPTAB_DIRECT);
 }
 
+/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0)
+   just peel off the SUBREG to get DI, avoiding extraneous extension.  */
+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE (*op) == SImode
+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+== GET_MODE_SIZE (word_mode))
+*op = XEXP (*op, 0);
+  else
+*op = gen_rtx_SIGN_EXTEND (word_mode, *op);
+}
+
 /* Sign- or zero-extend OP0 and OP1 for integer comparisons.  */
 
 static void
@@ -3707,9 +3725,10 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx 
*op1)
}
   else
{
- *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+ riscv_sign_extend_if_not_subreg_prom (op0);
+
  if (*op1 != const0_rtx)
-   *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
+   riscv_sign_extend_if_not_subreg_prom (op1);
}
 }
 }
-- 
2.34.1



Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-30 Thread Vineet Gupta




On 10/30/23 13:33, Jeff Law wrote:



On 10/29/23 21:21, Vineet Gupta wrote:

RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming 32-bit function args
which ABI/ISA guarantee to be sign-extended already.

And subsequently REE fails to eliminate them as
    "missing defintion(s)" or "multiple definition(s)
since function args don't have explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
also helps there.

Note there's currently patches floating around to improve REE and also a
new pass to eliminate unneccesary extensions, but it is still beneficial
to not generate those extra extensions in first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Way too many existing tests used to observe this issue.
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc
It elimiates the SEXT.W

Tested with rv64gc with no regressions, I'm relying on PAtrick's
pre-commit CI to do the full testing.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
* (riscv_extend_comparands): Call New function on operands.

Signed-off-by: Vineet Gupta 
---
Changes since v2:
   - Fix linting issues flagged by pre-commit CI
Changes since v1:
   - Elide sign extension for 32-bit operarnds only
   - Apply elison for both arguments
---
  gcc/config/riscv/riscv.cc | 23 +--
  1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ca9a2ca81d53..269beb3b159b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3678,6 +3678,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
 cmp0, cmp1, 0, 0, OPTAB_DIRECT);
  }
  +/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0
+   just peel off the SUBREG to get DI, avoiding extraneous 
extension.  */

+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE (*op) == SImode
+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+ == GET_MODE_SIZE (word_mode))
+    *op = XEXP (*op, 0);
+  else
+    *op = gen_rtx_SIGN_EXTEND (word_mode, *op);
So for the wrapped test GET_MODE_SIZE stuff), add parenthesis and 
indent the "==" clause.  ie


  && (GET_MODE_SIZE (GET_MODE (XEXP (*op), 0))).to_constant ()
  == GET_MODE_SIZE (word_mode))


Ok. FWIW I was using the wrong checker: git_check_commit.py vs. 
check_GNU_style.sh




Don't you also need to verify that the subreg was sign extended? The 
PROMOTED_VAR_P just notes that it was promoted, not *how* it was 
promoted.  I think you just need to add a test like this:


  && SUBREG_PROMOTED_SIGNED_P (*op)


Thx for catching this.
The orig test case I used to spot the issue had an unsigned promoted 
subreg but I was convinced it could still be removed (wrong on so many 
counts).



I don't guess you have data on how this impacts dynamic instruction 
counts on anything significant do you?


No, haven't run it yet. I can fire one though. I doubt if this is as 
significant as the prev one, even if this is the right thing to do.




OK with the formatting nit fixed and adding the additional check to 
ensure the value was sign extended.


Thx. I just wait for SPEC run before pushing this.

-Vineet


[PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-10-31 Thread Vineet Gupta
riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case.

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3e27897d6d30..7b8e9af0a5af 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8630,9 +8630,10 @@ riscv_promote_function_mode (const_tree type 
ATTRIBUTE_UNUSED,
 return promote_mode (type, mode, punsignedp);
 
   unsignedp = *punsignedp;
-  PROMOTE_MODE (as_a  (mode), unsignedp, type);
+  scalar_mode smode = as_a  (mode);
+  PROMOTE_MODE (smode, unsignedp, type);
   *punsignedp = unsignedp;
-  return mode;
+  return smode;
 }
 
 /* Implement TARGET_MACHINE_DEPENDENT_REORG.  */
-- 
2.34.1



Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-31 Thread Vineet Gupta




On 10/30/23 16:21, Vineet Gupta wrote:
I don't guess you have data on how this impacts dynamic instruction 
counts on anything significant do you?


No, haven't run it yet. I can fire one though. I doubt if this is as 
significant as the prev one, even if this is the right thing to do. 


Very very small improvement overall

49,318,030,233,258 (w/o patch)
49,318,000,693,233 (W/ patch)






Re: [PATCH v3] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-10-31 Thread Vineet Gupta

On 10/30/23 13:33, Jeff Law wrote:



+/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0
+   just peel off the SUBREG to get DI, avoiding extraneous 
extension.  */

+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_MODE (*op) == SImode


So I may have been partially wrong about v2 patch being wrong and 
needing this fixup ;-) [1]


It seems we don't have to limit this to SImode. I re-read the calling 
convention doc [2] and it says this


"When passed in registers or on the stack, integer scalars narrower than XLEN
 bits are widened according to the sign of their type up to 32 bits, then
 sign-extended to XLEN bits."

This essentially means signed short, and signed char will be already 
sign-extended at caller site and need not be done in callee: Palmer 
mention in internal slack that unadorned char is unsigned on RISC-V 
hence we don't see the compiler extra work for say 
gcc.dg/torture/pr75964.c. If the test is however tweaked to use signed 
chars (or short), it seems caller is doing the work (adjusting the 
constant being passed to be a sign-extended variant).


This further validates Jeff's comment about checking for 
SUBREG_PROMOTED_SIGNED_P (it was anyhow the right thing to begin with 
anyways).


At this point I feel like I'm into splitting hairs (in vain) territory, 
as fixing this might not matter much in practice 


I'd suppose we go ahead with the v3 with changes Jeff asked for and 
maybe do a later fixup to relax SI.



+  && GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+ == GET_MODE_SIZE (word_mode))
+    *op = XEXP (*op, 0);
+  else
+    *op = gen_rtx_SIGN_EXTEND (word_mode, *op);
So for the wrapped test GET_MODE_SIZE stuff), add parenthesis and 
indent the "==" clause.  ie


  && (GET_MODE_SIZE (GET_MODE (XEXP (*op), 0))).to_constant ()
  == GET_MODE_SIZE (word_mode))

Don't you also need to verify that the subreg was sign extended? The 
PROMOTED_VAR_P just notes that it was promoted, not *how* it was 
promoted.  I think you just need to add a test like this:


  && SUBREG_PROMOTED_SIGNED_P (*op)


[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634327.html
[2] 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc 






Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-10-31 Thread Vineet Gupta




On 10/31/23 17:51, Jeff Law wrote:




We also have a non-orthogonality in the ABI sign extension rules 
between SI and DI, a few of us were talking about it on the internal 
slack (though the specifics were for a different patch, Vineet has a 
few in flight).
So the old issue I was thinking of really only affects targets that 
push arguments on the stack and when a sub-word push actually 
allocates a full word on the stack (m68k, but !coldfire, h8 and 
probably others of that era).


Point being, those issues don't apply here.


OK, I think Palmer was conflating this with the discussion in other 
thread/patch.


-Vineet


Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-11-01 Thread Vineet Gupta




On 11/1/23 12:11, Jeff Law wrote:



On 10/31/23 12:35, Vineet Gupta wrote:

riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case.

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code 
path.


[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.
Macro that change their arguments are evil ;(   I think Juzhe's patch 
in this space a few months back wasn't supposed to change behavior.


Oh, its a regression. I can add a Fixes: tag



OK once CI finishes without regressions.


Thx,
-Vineet


[[Committed]] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-11-01 Thread Vineet Gupta
Fixes: 3496ca4e6566 ("RISC-V: Add runtime invariant support")

riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case. It intends to do that however the code is broken (regression).

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
returned for libcall case.

Tested-by: Patrick O'Neill  # pre-commit-CI #526
Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0148a4f2e431..895a098cd9a6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8630,9 +8630,10 @@ riscv_promote_function_mode (const_tree type 
ATTRIBUTE_UNUSED,
 return promote_mode (type, mode, punsignedp);
 
   unsignedp = *punsignedp;
-  PROMOTE_MODE (as_a  (mode), unsignedp, type);
+  scalar_mode smode = as_a  (mode);
+  PROMOTE_MODE (smode, unsignedp, type);
   *punsignedp = unsignedp;
-  return mode;
+  return smode;
 }
 
 /* Implement TARGET_MACHINE_DEPENDENT_REORG.  */
-- 
2.34.1



[PATCH] RISC-V: costs: miscomputed shiftadd_cost triggering synth_mult [PR/108987]

2023-03-01 Thread Vineet Gupta
This showed up as dynamic icount regression in SPEC 531.deepsjeng with upstream
gcc (vs. gcc 12.2). gcc was resorting to synthetic multiply using shift+add(s)
even when multiply had clear cost benefit.

|000133b8 :
|   133b8:  srl a3,a1,s6
|   133bc:  and a3,a3,s5
|   133c0:  sllia4,a3,0x9
|   133c4:  add a4,a4,a3
|   133c6:  sllia4,a4,0x9
|   133c8:  add a4,a4,a3
|   133ca:  sllia3,a4,0x1b
|   133ce:  add a4,a4,a3

vs. gcc 12 doing something lke below.

|000131c4 :
|   131c4:  ld  s1,8(sp)
|   131c6:  srl a3,a1,s4
|   131ca:  and a3,a3,s11
|   131ce:  mul a3,a3,s1

Bisected this to f90cb39235c4 ("RISC-V: costs: support shift-and-add in
strength-reduction"). The intent was to optimize cost for
shift-add-pow2-{1,2,3} corresponding to bitmanip insns SH*ADD, but ended
up doing that for all shift values which seems to favor synthezing
multiply among others.

The bug itself is trivial, IN_RANGE() calling pow2p_hwi() which returns bool
vs. exact_log2() returning power of 2.

This fix also requires update to the test introduced by the same commit
which now generates MUL vs. synthesizing it.

gcc/Changelog:

* config/riscv/riscv.cc (riscv_rtx_costs): Fixed IN_RANGE() to
  use exact_log2().

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zba-shNadd-07.c: f2(i*783) now generates MUL vs.
  5 insn sh1add+slli+add+slli+sub.
* gcc.target/riscv/pr108987.c: New test.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc  | 3 ++-
 gcc/testsuite/gcc.target/riscv/pr108987.c  | 9 +
 gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c | 6 +++---
 3 files changed, 14 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108987.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e36ff05695a6..2cf172f59c28 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2496,7 +2496,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  && GET_CODE (XEXP (x, 0)) == MULT
  && REG_P (XEXP (XEXP (x, 0), 0))
  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
- && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+ && pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1)))
+ && IN_RANGE (exact_log2 (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
{
  *total = COSTS_N_INSNS (1);
  return true;
diff --git a/gcc/testsuite/gcc.target/riscv/pr108987.c 
b/gcc/testsuite/gcc.target/riscv/pr108987.c
new file mode 100644
index ..6179c7e13a45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr108987.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba -mabi=lp64 -O2" } */
+
+unsigned long long f5(unsigned long long i)
+{
+  return i * 0x0202020202020202ULL;
+}
+
+/* { dg-final { scan-assembler-times "mul" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c 
b/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
index 98d35e1da9b4..93da241c9b60 100644
--- a/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
+++ b/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
@@ -26,6 +26,6 @@ f4 (unsigned long i)
 }
 
 /* { dg-final { scan-assembler-times "sh2add" 2 } } */
-/* { dg-final { scan-assembler-times "sh1add" 2 } } */
-/* { dg-final { scan-assembler-times "slli" 5 } } */
-/* { dg-final { scan-assembler-times "mul" 1 } } */
+/* { dg-final { scan-assembler-times "sh1add" 1 } } */
+/* { dg-final { scan-assembler-times "slli" 3 } } */
+/* { dg-final { scan-assembler-times "mul" 2 } } */
-- 
2.34.1



Re: RISC-V Test Errors and Failures

2023-05-16 Thread Vineet Gupta

On 5/16/23 16:06, Palmer Dabbelt wrote:
A few of us were talking about test-related issues in the patchwork 
meeting

this morning.  I bumped to trunk and did a full rebuild, I'm getting the
following (it's in riscv-systems-ci/riscv-gnu-toolchain).  This is 
about what I

remember seeing last time I ran the tests, which was a week or so ago.  I
figured it'd be best to just blast the lists, as Jeff said his test 
running had
been hanging so there might be some issue preventing folks from seeing 
the

failures.

I guess I didn't get time to look last time and I doubt things are 
looking any
better right now.  I'll try and take a look at some point, but any 
help would

of course be appreciated.


Yes I was seeing similar tcl errors and such - and in my case an even 
higher count.

Also for posterity, what was your configure cmdline ? multilibs or no
We really need to add some CI around RV toolchains to trip on these sooner !



$ cat toolchain/report
make[1]: Entering directory '/scratch/merges/rgt-gcc-trunk/toolchain'
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/scripts/testsuite-filter 
gcc glibc 
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/test/allowlist `find 
build-gcc-linux-stage2/gcc/testsuite/ -name *.sum |paste -sd "," -`

    === g++: Unexpected fails for rv64imac lp64 medlow ===
FAIL: g++.dg/contracts/contracts-tmpl-spec2.C   output pattern test
    === g++: Unexpected fails for rv32imac ilp32 medlow ===
FAIL: g++.dg/contracts/contracts-tmpl-spec2.C   output pattern test
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++2a (test for excess 
errors)
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++2b (test for excess 
errors)

    === g++: Unexpected fails for rv64imafdc lp64d medlow ===
FAIL: g++.dg/contracts/contracts-tmpl-spec2.C   output pattern test
    === g++: Unexpected fails for rv32imafdc ilp32d medlow ===
FAIL: g++.dg/contracts/contracts-tmpl-spec2.C   output pattern test
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++2a (test for excess 
errors)
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++2b (test for excess 
errors)

    === g++: Unexpected fails for rv64imafdcv lp64d  ===
FAIL: g++.dg/contracts/contracts-tmpl-spec2.C   output pattern test
FAIL: g++.target/riscv/rvv/base/bug-10.C execution test
FAIL: g++.target/riscv/rvv/base/bug-11.C execution test
FAIL: g++.target/riscv/rvv/base/bug-12.C execution test
FAIL: g++.target/riscv/rvv/base/bug-13.C execution test
FAIL: g++.target/riscv/rvv/base/bug-14.C execution test
FAIL: g++.target/riscv/rvv/base/bug-15.C execution test
FAIL: g++.target/riscv/rvv/base/bug-16.C execution test
FAIL: g++.target/riscv/rvv/base/bug-17.C execution test
FAIL: g++.target/riscv/rvv/base/bug-2.C execution test
FAIL: g++.target/riscv/rvv/base/bug-23.C execution test
FAIL: g++.target/riscv/rvv/base/bug-3.C execution test
FAIL: g++.target/riscv/rvv/base/bug-4.C execution test
FAIL: g++.target/riscv/rvv/base/bug-5.C execution test
FAIL: g++.target/riscv/rvv/base/bug-6.C execution test
FAIL: g++.target/riscv/rvv/base/bug-7.C execution test
FAIL: g++.target/riscv/rvv/base/bug-8.C execution test
FAIL: g++.target/riscv/rvv/base/bug-9.C execution test
    === g++: Unexpected fails for rv32imafdcv ilp32d  ===
FAIL: g++.dg/contracts/contracts-tmpl-spec2.C   output pattern test
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++2a (test for excess 
errors)
FAIL: g++.dg/modules/xtreme-header-5_c.C -std=c++2b (test for excess 
errors)

FAIL: g++.target/riscv/rvv/base/bug-10.C execution test
FAIL: g++.target/riscv/rvv/base/bug-11.C execution test
FAIL: g++.target/riscv/rvv/base/bug-12.C execution test
FAIL: g++.target/riscv/rvv/base/bug-13.C execution test
FAIL: g++.target/riscv/rvv/base/bug-14.C (test for excess errors)
FAIL: g++.target/riscv/rvv/base/bug-15.C execution test
FAIL: g++.target/riscv/rvv/base/bug-16.C execution test
FAIL: g++.target/riscv/rvv/base/bug-17.C execution test
FAIL: g++.target/riscv/rvv/base/bug-18.C (test for excess errors)
FAIL: g++.target/riscv/rvv/base/bug-19.C (test for excess errors)
FAIL: g++.target/riscv/rvv/base/bug-2.C execution test
FAIL: g++.target/riscv/rvv/base/bug-20.C (test for excess errors)
FAIL: g++.target/riscv/rvv/base/bug-21.C (test for excess errors)
FAIL: g++.target/riscv/rvv/base/bug-22.C (test for excess errors)
FAIL: g++.target/riscv/rvv/base/bug-23.C execution test
FAIL: g++.target/riscv/rvv/base/bug-3.C execution test
FAIL: g++.target/riscv/rvv/base/bug-4.C execution test
FAIL: g++.target/riscv/rvv/base/bug-5.C execution test
FAIL: g++.target/riscv/rvv/base/bug-6.C execution test
FAIL: g++.target/riscv/rvv/base/bug-7.C execution test
FAIL: g++.target/riscv/rvv/base/bug-8.C execution test
FAIL: g++.target/riscv/rvv/base/bug-9.C (test for excess errors)
    === g++: Unexpected fails for rv64gczba_zbb_zbc_zbs lp64d ===
FAIL: g++.dg/contracts/contracts-tmpl-spec2.C   output pattern test
    === gcc: Unexpected fails for rv64imac lp64 medlow ===
ERROR: tcl error sourcing 
/scratch/

Re: RISC-V Test Errors and Failures

2023-05-16 Thread Vineet Gupta

+ Christoph, Jiawei

On 5/16/23 17:20, Palmer Dabbelt wrote:
We really need to add some CI around RV toolchains to trip on these 
sooner !


Sounds like you're volunteering to set one up? 


Patrick's github CI patch seems to be a great start. Lets wait for it to 
get merged, that will at least catch rv toolchain snafus: although the 
granularity of testing is not ideal (tc changes are not so frequent)


I think the most pressing need is bleeding edge gcc regression tracking.
 @Jeff is anything setup on sourceware and/or usable ? I thought they 
do have existing bots for some arches to spin up build / run - perhaps 
runs are native and not qemu.


FWIW rivos gitlab CI (not public) has capability to track upstream gcc 
(Kevin almost has it working), but there is no easy way to publish it 
for rest of the world and I'd rather that be done in a public infra.


Didn't ISCAS/PLCT have such infra - sorry Kito asked the same question 
this morning, but I was not fully awoke so don't remember what Jiawei 
replied.


Thx,
-Vineet


Re: RISC-V Test Errors and Failures

2023-05-16 Thread Vineet Gupta




On 5/16/23 18:29, Palmer Dabbelt wrote:

On Tue, 16 May 2023 18:04:37 PDT (-0700), Vineet Gupta wrote:

+ Christoph, Jiawei

On 5/16/23 17:20, Palmer Dabbelt wrote:

We really need to add some CI around RV toolchains to trip on these
sooner !


Sounds like you're volunteering to set one up?


Patrick's github CI patch seems to be a great start. Lets wait for it to
get merged, that will at least catch rv toolchain snafus: although the
granularity of testing is not ideal (tc changes are not so frequent)


You mean riscv-gnu-toolchain changes?  That's not super useful for GCC 
development, they're on a fork.


Well they are still useful to catch various snafus in toolchain plumbing 
itself - I've run into 2 of those and Patcrick 2 himself, when trying to 
use latest upstream toolchain scripts. But sure they are not testing 
bleeding edge gcc.






I think the most pressing need is bleeding edge gcc regression tracking.
  @Jeff is anything setup on sourceware and/or usable ? I thought they
do have existing bots for some arches to spin up build / run - perhaps
runs are native and not qemu.


IIRC Jeff said his builders were hanging right now.


Jeff it seems has his own test infra. I was asking if sourceware (or 
whatever the custodian of gcc project has).
I'd be really surprised if primary arches such as x86/aarch64 don't have 
any test bots there ?





FWIW rivos gitlab CI (not public) has capability to track upstream gcc
(Kevin almost has it working), but there is no easy way to publish it
for rest of the world and I'd rather that be done in a public infra.


+Kevin

At least having the failure lists public would be a must-have, and I 
think that's tricky to do with gitlab.


Yep.

Bjorn and Conor have something glued to the kernel patchwork that 
uploads test results to github as snippits, but IIRC we're trying to 
replace it with something more directly visible.



Didn't ISCAS/PLCT have such infra - sorry Kito asked the same question
this morning, but I was not fully awoke so don't remember what Jiawei
replied.


I didn't even remember he asked ;)




Re: RISC-V Test Errors and Failures

2023-05-16 Thread Vineet Gupta

On 5/16/23 19:21, Kito Cheng wrote:

Palmer:

For short-term, this should help your internal test:
https://github.com/riscv-collab/riscv-gnu-toolchain/pull/1233


That only helps if using bleeding edge toolchain scripts (which I 
regularly do and so did Patrick).


Palmer has a fork of toolchain scripts and I'm assuming he hasn't caught 
up to that point ;-)


-Vineet


Re: RISC-V Test Errors and Failures

2023-05-16 Thread Vineet Gupta



On 5/16/23 19:53, Palmer Dabbelt wrote:


Probably, I'll go try and bump stuff and see if it works... 


Word of caution: Best to not disturb your existing setup, a try a fresh 
checkout first


[PATCH] RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]

2023-05-18 Thread Vineet Gupta
[part #2 of PR/109279]

SPEC2017 deepsjeng uses large constants which currently generates less than
ideal code. This fix improves codegen for large constants which have
same low and hi parts: e.g.

long long f(void) { return 0x0101010101010101ull; }

Before
li  a5,0x101
addia5,a5,0x101
mv  a0,a5
sllia5,a5,32
add a0,a5,a0
ret

With patch
li  a5,0x101
addia5,a5,0x101
sllia0,a5,32
add a0,a0,a5
ret

This is testsuite clean.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): if loval is equal
  to hival, ASHIFT the corresponding regs.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 79122699b6f5..4e1bb2f14cf8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -703,13 +703,18 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
 
-  riscv_move_integer (hi, hi, hival, mode);
   riscv_move_integer (lo, lo, loval, mode);
 
-  hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
-  hi = force_reg (mode, hi);
+  if (loval == hival)
+  hi = gen_rtx_ASHIFT (mode, lo, GEN_INT (32));
+  else
+{
+  riscv_move_integer (hi, hi, hival, mode);
+  hi = gen_rtx_ASHIFT (mode, hi, GEN_INT (32));
+}
 
-  return gen_rtx_fmt_ee (PLUS, mode, hi, lo);
+  hi = force_reg (mode, hi);
+  return gen_rtx_PLUS (mode, hi, lo);
 }
 
 /* Return true if X is a thread-local symbol.  */
-- 
2.34.1



Re: RISC-V Test Errors and Failures

2023-05-18 Thread Vineet Gupta




On 5/17/23 00:52, Andreas Schwab wrote:

On Mai 16 2023, Vineet Gupta wrote:


Yes I was seeing similar tcl errors and such - and in my case an even
higher count.

They are coming from commit d6654a4be3b.



As of a726d007f197 today I get a gazzilion splat for riscv multilib 
dejagnu runs and over 5k fails


ERROR: torture-init: torture_without_loops is not empty as expected
ERROR: tcl error code NONE
...
...
   = Summary of gcc testsuite =
    | # of unexpected case / # of unique 
unexpected case

    |  gcc |  g++ | gfortran |
 rv64imafdc/  lp64d/ medlow | 5033 / 4 |    1 / 1 |   72 /    12 |
 rv32imafdc/ ilp32d/ medlow | 5032 / 3 |    3 / 2 |   72 /    12 |
   rv32imac/  ilp32/ medlow |    1 / 1 |    3 / 2 |  109 /    19 |
   rv64imac/   lp64/ medlow | 5034 / 5 |    1 / 1 |  109 /    19 |

For a non multilib run things are sane:

   = Summary of gcc testsuite =
    | # of unexpected case / # of unique 
unexpected case

    |  gcc |  g++ | gfortran |
 rv64imafdc/  lp64d/ medlow |   11 / 4 |    1 / 1 |   72 /    12 |

It is really hard to test anything on upstream ATM.

-Vineet


Re: [PATCH] RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]

2023-05-19 Thread Vineet Gupta



On 5/19/23 09:36, Palmer Dabbelt wrote:
Works for me.  Did you start that performance backports branch?  
Either way, I think this should go on it. 


Please note that there is a bit of dependency chain. Assuming the 
aforementioned branch is gcc 13.1 based, this change also needs my 
splitter relaxation fix g0530254413f8 to ensure incremental improvements 
to large const codegen.




[Committed] RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]

2023-05-19 Thread Vineet Gupta

On 5/19/23 09:33, Jeff Law wrote:



On 5/18/23 14:57, Vineet Gupta wrote:

[part #2 of PR/109279]

SPEC2017 deepsjeng uses large constants which currently generates 
less than

ideal code. This fix improves codegen for large constants which have
same low and hi parts: e.g.

long long f(void) { return 0x0101010101010101ull; }

Before
 li  a5,0x101
 addi    a5,a5,0x101
 mv  a0,a5
 slli    a5,a5,32
 add a0,a5,a0
 ret

With patch
li    a5,0x101
addi    a5,a5,0x101
slli    a0,a5,32
add    a0,a0,a5
ret

This is testsuite clean.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): if loval is equal
  to hival, ASHIFT the corresponding regs.
LGTM.  Please install.  Thanks for taking care of this!  The updated 
sequence looks good.


Jeff




Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-22 Thread Vineet Gupta

On 5/22/23 02:17, Kito Cheng wrote:

Ooops, seems still some issue around here,


Yep still 5000 fails :-(


  but I found something might
related this issue:

https://github.com/gcc-mirror/gcc/commit/d6654a4be3ba44c0d57be7c8a51d76d9721345e1
https://github.com/gcc-mirror/gcc/commit/23c49bb8d09bc3bfce9a08be637cf32ac014de56


It seems both of these patches are essentially doing what yours did. So 
something else is amiss still.


Thx,
-Vineet



On Mon, May 22, 2023 at 2:42 PM Kito Cheng  wrote:

Hi Vineet:

Could you help to test this patch, this could resolve that issue on our
machine, but I would like to also work for other env.

Thanks :)

---

We got bunch of following error message for multi-lib run:

ERROR: torture-init: torture_without_loops is not empty as expected
ERROR: tcl error code NONE

And seems we need torture-init and torture-finish around the test
loop.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/rvv.exp: Add torture-init and
 torture-finish.
---
  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index bc99cc0c3cf4..19179564361a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -39,6 +39,7 @@ if [istarget riscv32-*-*] then {

  # Initialize `dg'.
  dg-init
+torture-init

  # Main loop.
  set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
@@ -69,5 +70,7 @@ foreach op $AUTOVEC_TEST_OPTS {
  dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/autovec/vls-vlmax/*.\[cS\]]] \
 "-std=c99 -O3 -ftree-vectorize --param 
riscv-autovec-preference=fixed-vlmax" $CFLAGS

+torture-finish
+
  # All done.
  dg-finish
--
2.40.1





Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-24 Thread Vineet Gupta

+CC Thomas and Maciej

On 5/22/23 20:52, Vineet Gupta wrote:

On 5/22/23 02:17, Kito Cheng wrote:

Ooops, seems still some issue around here,


Yep still 5000 fails :-(


  but I found something might
related this issue:

https://github.com/gcc-mirror/gcc/commit/d6654a4be3ba44c0d57be7c8a51d76d9721345e1 

https://github.com/gcc-mirror/gcc/commit/23c49bb8d09bc3bfce9a08be637cf32ac014de56 



It seems both of these patches are essentially doing what yours did. 
So something else is amiss still.


Apparently in addition to Kito's patch below, If I comment out the 
additional torture options, failures go down drastically.


diff --git a/gcc/testsuite/gcc.target/riscv/riscv.exp 
b/gcc/testsuite/gcc.target/riscv/riscv.exp


-lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
+#lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}

@Thomas, do you have some thoughts on how to fix riscv.exp properly in 
light of recent changes to exp files.




Thx,
-Vineet



On Mon, May 22, 2023 at 2:42 PM Kito Cheng  
wrote:

Hi Vineet:

Could you help to test this patch, this could resolve that issue on our
machine, but I would like to also work for other env.

Thanks :)

---

We got bunch of following error message for multi-lib run:

ERROR: torture-init: torture_without_loops is not empty as expected
ERROR: tcl error code NONE

And seems we need torture-init and torture-finish around the test
loop.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/rvv.exp: Add torture-init and
 torture-finish.
---
  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp

index bc99cc0c3cf4..19179564361a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -39,6 +39,7 @@ if [istarget riscv32-*-*] then {

  # Initialize `dg'.
  dg-init
+torture-init

  # Main loop.
  set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
@@ -69,5 +70,7 @@ foreach op $AUTOVEC_TEST_OPTS {
  dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/autovec/vls-vlmax/*.\[cS\]]] \
 "-std=c99 -O3 -ftree-vectorize --param 
riscv-autovec-preference=fixed-vlmax" $CFLAGS


+torture-finish
+
  # All done.
  dg-finish
--
2.40.1







Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-24 Thread Vineet Gupta

On 5/24/23 13:34, Thomas Schwinge wrote:

Yeah, at this point I'm not sure whether my recent changes really are
related/relevant here.


Apparently in addition to Kito's patch below, If I comment out the
additional torture options, failures go down drastically.

Meaning that *all* those ERRORs disappear?


No but they reduced significantly. Anyhow I think the issue should be 
simple enough for someone familiar with how the tcl stuff works...





diff --git a/gcc/testsuite/gcc.target/riscv/riscv.exp
b/gcc/testsuite/gcc.target/riscv/riscv.exp

-lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
+#lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}

@Thomas, do you have some thoughts on how to fix riscv.exp properly in
light of recent changes to exp files.

I'm trying to understand this, but so far don't.  Can I please see a
complete 'gcc.log' file where the ERRORs are visible?


So we are at bleeding edge gcc from today
 2023-05-24 ec2e86274427 Fortran: reject bad DIM argument of SIZE 
intrinsic in simplification [PR104350]


With an additional fix from Kito along the lines of..

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp


 dg-init
+torture-init

 # All done.
+torture-finish
 dg-finish

I'm pasting a snippet of gcc.log. Issue is indeed triggered by rvv.exp 
which needs some love.



PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-times mul\t 1
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not div\t
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not rem\t
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp 
completed in 60 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
...
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp.

ERROR: tcl error code NONE
ERROR: torture-init: torture_without_loops is not empty as expected
    while executing
"error "torture-init: torture_without_loops is not empty as expected""
    invoked from within
"if [info exists torture_without_loops] {
    error "torture-init: torture_without_loops is not empty as expected"
    }"
    (procedure "torture-init" line 4)
    invoked from within
"torture-init"
    (file 
"/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp" 
line 42)

    invoked from within
"source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    invoked from within
"catch "uplevel #0 source $test_file_name" msg"
UNRESOLVED: testcase 
'/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp' 
aborted due to Tcl error
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
completed in 0 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/rl78/rl78.exp 
...

...



Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-24 Thread Vineet Gupta




On 5/24/23 15:13, Vineet Gupta wrote:


PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-times mul\t 1
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not div\t
PASS: gcc.target/riscv/zmmul-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects   scan-assembler-not rem\t
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp 
completed in 60 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
...
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp.

ERROR: tcl error code NONE
ERROR: torture-init: torture_without_loops is not empty as expected
    while executing
"error "torture-init: torture_without_loops is not empty as expected""
    invoked from within
"if [info exists torture_without_loops] {
    error "torture-init: torture_without_loops is not empty as expected"
    }"
    (procedure "torture-init" line 4)
    invoked from within
"torture-init"
    (file 
"/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp" 
line 42)

    invoked from within
"source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp"

    invoked from within
"catch "uplevel #0 source $test_file_name" msg"
UNRESOLVED: testcase 
'/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp' 
aborted due to Tcl error
testcase 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
completed in 0 seconds
Running 
/scratch/vineetg/gnu/toolchain-upstream/gcc/gcc/testsuite/gcc.target/rl78/rl78.exp 
...

...



Never mind. Looks like I found the issue - with just trial and error and 
no idea of how this stuff works.

The torture-{init,finish} needs to be in riscv.exp not rvv.exp
Running full tests now.

-Vineet


Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels

2023-05-25 Thread Vineet Gupta

Hi Thomas,

On 5/25/23 13:56, Thomas Schwinge wrote:

Hi!

On 2022-02-08T00:22:37+0800, Kito Cheng via Gcc-patches 
 wrote:

Hi Maciej:

Thanks for doing this, OK to trunk.

On Tue, Feb 1, 2022 at 7:04 AM Maciej W. Rozycki  wrote:

Use `gcc-dg-runtest' test driver rather than `dg-runtest' to run the
RISC-V testsuite as several targets already do.  Adjust test options
across individual test cases accordingly where required.

As some tests want to be run at `-Og', add a suitable optimization
variant via ADDITIONAL_TORTURE_OPTIONS, and include the moderately
recent `-Oz' variant as well.

 * testsuite/gcc.target/riscv/riscv.exp: Use `gcc-dg-runtest'
 rather than `dg-runtest'.  Add `-Og -g' and `-Oz' variants via
 ADDITIONAL_TORTURE_OPTIONS.
  As to adding `-Og -g' and `-Oz', this should probably be done globally in
gcc-dg.exp, but such a change would affect all the interested targets at
once and would require a huge one-by-one test case review.  Therefore I
think adding targets one by one instead is more feasible, and then we can
switch once all the targets have.
--- gcc.orig/gcc/testsuite/gcc.target/riscv/riscv.exp
+++ gcc/gcc/testsuite/gcc.target/riscv/riscv.exp
@@ -21,6 +21,8 @@ if ![istarget riscv*-*-*] then {
return
  }

+lappend ADDITIONAL_TORTURE_OPTIONS {-Og -g} {-Oz}
+
  # Load support procs.
  load_lib gcc-dg.exp

Per my understanding, that is not the correct way to do this.  See
'gcc/doc/sourcebuild.texi':

 [...] add to the default list by defining
 @var{ADDITIONAL_TORTURE_OPTIONS}.  Define these in a @file{.dejagnurc}
 file or add them to the @file{site.exp} file; for example [...]

Notice '.dejagnurc' or 'site.exp', that is: globally.  (Doing this
"globally in gcc-dg.exp" -- as you'd mentioned above -- would work too,
conditionalized to '[istarget riscv*-*-*]' only, for now, as you
suggested.)

Otherwise, per what we've not got, either of the following two happens:
before any other 'load_lib gcc-dg.exp', 'gcc.target/riscv/riscv.exp'
happens to be read first, does set 'ADDITIONAL_TORTURE_OPTIONS', then
does its 'load_lib gcc-dg.exp' -- which then incorporates
'ADDITIONAL_TORTURE_OPTIONS' for *all* (!) following '*.exp' files as
part of that 'runtest' instance.  Alternatively, any other '*.exp' file's
'load_lib gcc-dg.exp' comes first (without the desired
'ADDITIONAL_TORTURE_OPTIONS' being set), and once
'gcc.target/riscv/riscv.exp' is read, while it then does set
'ADDITIONAL_TORTURE_OPTIONS', its 'load_lib gcc-dg.exp' is a no-op, as
that one has already been loaded, and therefore the
'ADDITIONAL_TORTURE_OPTIONS' aren't incorporated.

Instead, I suggest to do this locally: do 'load_lib torture-options.exp',
'torture-init', 'set-torture-options [...]' (where that includes your
special options), 'gcc-dg-runtest', 'torture-finish'.  See other '*.exp'
files.


(No, I didn't invent this interface.)


(I however don't see yet how this would be related to the current
"ERROR: torture-init: torture_without_loops is not empty as expected"
discussion, as has, kind of, been claimed in

"RISC-V: Add missing torture-init and torture-finish for rvv.exp" and the
following.)


Thanks for taking a look at this. Please don't get me wrong, never mean 
to vilify the patches above - and I should have verified first (by 
reverting those) if they caused the issue - if at all. It just seemed 
that we started seeing these relatively recently and the timing of your 
changes seemed to coincide. As you say above, RV likely has existing 
less than ideal constructs which somehow used to work before, but not in 
the new regime.


Anyhow the goal is to get this fixed for RV and any more help will be 
appreciated since I'm really a TCL noob.


It seems my claim yesterday that adding torture-{init,finish} fixed RV 
issues were just premature. I was trying a mix of running the full suite 
vs. just  RUNTESTFLAGS="riscv.exp" and in some cases latter can give a 
false positive (I was making sure dejagnu got rebuilt and rekicked etc, 
but anyhow different issue).


I'm currently removing the ADDITIONAL_TORTURE_OPTIONS to see if this 
helps cure it and then try the new sequence you pointed to above.


FWIW if you want to test this out at your end, it is super easy.

|git clone https://github.com/riscv-collab/riscv-gnu-toolchain 
toolchain-upstream cd toolchain-upstream git submodule init git 
submodule update ||./configure --with-arch=rv64imafdc --with-abi=lp64d --enable-multilib 
--enable-linux --prefix= make -j make report-linux 
SIM=qemu Thx, -Vineet |||




Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels

2023-05-25 Thread Vineet Gupta



On 5/25/23 14:17, Vineet Gupta wrote:

FWIW if you want to test this out at your end, it is super easy.

|git clone https://github.com/riscv-collab/riscv-gnu-toolchain 
toolchain-upstream cd toolchain-upstream git submodule init git 
submodule update ||./configure --with-arch=rv64imafdc --with-abi=lp64d 
--enable-multilib --enable-linux --prefix= make -j 
make report-linux SIM=qemu Thx, -Vineet ||| 


Sorry the mailer clobbered the instructions.

git clone https://github.com/riscv-collab/riscv-gnu-toolchain 
toolchain-upstream

cd toolchain-upstream
git submodule init
git submodule update
./configure --with-arch=rv64imafdc --with-abi=lp64d --enable-multilib 
--enable-linux --prefix=

make -j
make -j make report-linux SIM=qemu

Thx




Re: RISC-V Test Errors and Failures

2023-05-25 Thread Vineet Gupta




On 5/25/23 13:29, Thomas Schwinge wrote:

Hi!

On 2023-05-17T09:52:13+0200, Andreas Schwab via Gcc-patches 
 wrote:

On Mai 16 2023, Vineet Gupta wrote:


Yes I was seeing similar tcl errors and such - and in my case an even
higher count.

They are coming from commit d6654a4be3b.

I call FUD.  Until you prove otherwise, of coures.


Grüße
  Thomas


Just as a data point, with those patches reverted, I don't see the tcl 
errors.


2023-05-25 7a3c9f8e8362 Revert "Let each 'lto_init' determine the 
default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'"
2023-05-25 22206cb760ee Revert "Testsuite: Add missing 
'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS' usage"
2023-05-25 db46b946dd6d Revert "Testsuite: Add 'torture-init-done', and 
use it to conditionalize implicit 'torture-init'"

2023-05-25 bd412162fd0d Revert "xxx vineet fixup"
2023-05-22 97a5e2241d33 xxx vineet fixup
2023-05-24 ec2e86274427 Fortran: reject bad DIM argument of SIZE 
intrinsic in simplification [PR104350]

...

   = Summary of gcc testsuite =
    | # of unexpected case / # of unique 
unexpected case

    |  gcc |  g++ | gfortran |
 rv64imafdc/  lp64d/ medlow |   25 / 4 |    1 / 1 |   72 /    12 |
 rv32imafdc/ ilp32d/ medlow |   26 / 5 |    3 / 2 |   72 /    12 |
   rv32imac/  ilp32/ medlow |   25 / 4 |    3 / 2 |  109 /    19 |
   rv64imac/   lp64/ medlow |   26 / 5 |    1 / 1 |  109 /    19 |




Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels

2023-05-25 Thread Vineet Gupta




On 5/25/23 14:17, Vineet Gupta wrote:
Thanks for taking a look at this. Please don't get me wrong, never 
mean to vilify the patches above - and I should have verified first 
(by reverting those) if they caused the issue - if at all. It just 
seemed that we started seeing these relatively recently and the timing 
of your changes seemed to coincide. As you say above, RV likely has 
existing less than ideal constructs which somehow used to work before, 
but not in the new regime.


Anyhow the goal is to get this fixed for RV and any more help will be 
appreciated since I'm really a TCL noob.


It seems my claim yesterday that adding torture-{init,finish} fixed RV 
issues were just premature. I was trying a mix of running the full 
suite vs. just  RUNTESTFLAGS="riscv.exp" and in some cases latter can 
give a false positive (I was making sure dejagnu got rebuilt and 
rekicked etc, but anyhow different issue).


I'm currently removing the ADDITIONAL_TORTURE_OPTIONS to see if this 
helps cure it 


Another data point: commenting this out altogether doesn't hep either.

-Vineet



  1   2   3   4   >