[Bug target/118182] New: RISC-V: Miscompile for 410.bwaves, 416.gamess and 465.tonto from spec2006

kito at gcc dot gnu.org via Gcc-bugs Mon, 23 Dec 2024 00:55:06 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118182


            Bug ID: 118182
           Summary: RISC-V: Miscompile for 410.bwaves, 416.gamess and
                    465.tonto from spec2006
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kito at gcc dot gnu.org
  Target Milestone: ---

Created attachment 59955
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59955&action=edit
Reduced testcase from 410.bwaves

410.bwaves, 416.gamess and 465.tonto from spec2006 miscompare when compiled
with just `-O3 -march=rv64gcv`, and we found that this is caused by the
reduction sum.

The root cause is that the reduction sum may execute with VL=0, and our
generated code does not work as expected if VL=0 (but works well if VL > 0).

The current code gen for reduction sum:

```
        # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
        vsetvli zero,a5,e64,m1,ta,ma
        vfmv.s.f        v2,fa5
        vfredosum.vs    v1,v1,v2
        vfmv.f.s        fa5,v1

```


Here is a detailed analysis of why it does not work as expected if VL=0:

1. vfmv.s.f        v2,fa5
vfmv.s.f does nothing if VL=0, which means v2 will contain a garbage value.

2. vfredosum.vs    v1,v1,v2
vfredosum.vs does nothing if VL=0, and keeps vd unchanged even with TA.

(The spec says: If vl=0, no operation is performed and the destination register
is not updated.)

3. vfmv.f.s        fa5,v1
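vfmv.f.s moves the value from v1 even if VL=0, so this instruction itself is
safe, but v1 was never updated, so fa5 receives a stale value instead of the
running sum.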
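
The three steps above can be modelled with a small Python sketch. This is a
simplified model, not an ISA simulator: registers are plain lists of floats,
`GARBAGE` stands in for whatever stale contents a register happens to hold, and
all function names are hypothetical helpers that only mirror the VL=0 rule
quoted from the spec.

```python
GARBAGE = float("nan")  # assumption: stand-in for stale register contents


def vfmv_s_f(vd, fs, vl):
    """Model of vfmv.s.f: write the scalar to vd[0]; no-op when vl == 0."""
    if vl > 0:
        vd[0] = fs


def vfredosum_vs(vd, vs2, vs1, vl):
    """Model of vfredosum.vs vd, vs2, vs1: ordered sum; no-op when vl == 0."""
    if vl == 0:
        return  # destination register is not updated
    acc = vs1[0]
    for i in range(vl):
        acc += vs2[i]
    vd[0] = acc


def vfmv_f_s(vs):
    """Model of vfmv.f.s: always reads element 0, regardless of vl."""
    return vs[0]


def fold_left_plus(acc, data, vl):
    """Mirror the generated sequence: vfmv.s.f, vfredosum.vs, vfmv.f.s."""
    v1 = list(data)               # v1 holds the vector operand
    v2 = [GARBAGE] * len(data)    # v2 is uninitialised before vfmv.s.f
    vfmv_s_f(v2, acc, vl)
    vfredosum_vs(v1, v1, v2, vl)
    return vfmv_f_s(v1)


print(fold_left_plus(10.0, [1.0, 2.0], 2))  # VL=2: accumulator is extended
print(fold_left_plus(10.0, [1.0, 2.0], 0))  # VL=0: accumulator 10.0 is lost
```

With VL=0 the model returns element 0 of the vector operand rather than the
incoming accumulator, which matches the miscompare described here.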


---

The root cause comes from the following loop:

```
          ! Compute the sum of squares of dq elements    
          do k = 1, nz    
             do j = 1, ny    
                do i = 1, nx    
                   do l = 1, 5    
                      dqnorm = dqnorm + dq(l, i, j, k) * dq(l, i, j, k)    
                   end do    
                end do    
             end do    
          end do
```

It generates something like the following:

```
  # ivtmp.48_106 = PHI <ivtmp.48_107(9), _84(8)>
  _128 = MIN_EXPR <ivtmp.41_49, POLY_INT_CST [10, 10]>;
  loop_len_165 = MIN_EXPR <_128, POLY_INT_CST [2, 2]>;
  _122 = _128 - loop_len_165;
  loop_len_164 = MIN_EXPR <_122, POLY_INT_CST [2, 2]>;
  _121 = _122 - loop_len_164;
  loop_len_163 = MIN_EXPR <_121, POLY_INT_CST [2, 2]>;
  _120 = _121 - loop_len_163;
  loop_len_162 = MIN_EXPR <_120, POLY_INT_CST [2, 2]>;
  loop_len_161 = _120 - loop_len_162;
  _125 = (void *) ivtmp.44_60;
...
  stmp__148.33_137 = .MASK_LEN_FOLD_LEFT_PLUS (dqnorm_lsm.20_20, vect__70.32_150, { -1, ... }, loop_len_165, 0);
  stmp__148.33_136 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_137, vect__70.32_147, { -1, ... }, loop_len_164, 0);
  stmp__148.33_135 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_136, vect__70.32_142, { -1, ... }, loop_len_163, 0);
  stmp__148.33_134 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_135, vect__70.32_140, { -1, ... }, loop_len_162, 0);
  _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
...
```
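
To see how a zero length can reach the later reductions, the MIN_EXPR chain
in the dump can be replayed in Python. This is only a sketch: it assumes
POLY_INT_CST [10, 10] and POLY_INT_CST [2, 2] take their minimum values (10
and 2), whereas the real values scale with the runtime vector length, and
`loop_lens` is a hypothetical helper name.

```python
def loop_lens(remaining, vf=2, cap=10):
    """Replay the loop_len_165 .. loop_len_161 computation from the dump.

    vf and cap are assumed constant stand-ins for POLY_INT_CST [2, 2]
    and POLY_INT_CST [10, 10].
    """
    n = min(remaining, cap)       # _128
    lens = []
    for _ in range(4):            # loop_len_165, _164, _163, _162
        length = min(n, vf)
        lens.append(length)
        n -= length               # _122, _121, _120, then the final rest
    lens.append(n)                # loop_len_161 = _120 - loop_len_162
    return lens


print(loop_lens(10))  # every reduction gets a non-zero length
print(loop_lens(5))   # the trailing reductions run with length 0
```

Under these assumptions, five remaining elements give lengths 2, 2, 1, 0, 0,
so the last two `.MASK_LEN_FOLD_LEFT_PLUS` calls execute with VL=0.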


NOTE: I have a candidate patch to fix this, but it needs a few more refinements
to reach upstreamable quality.
