On Tue, 3 Dec 2019, Richard Sandiford wrote: > Jakub Jelinek <ja...@redhat.com> writes: > > Hi! > > > > As discussed in the PR, we can't optimize e.g. > > int a = t - 1; > > int b = a * v; > > return b + v; > > into return t * v; for signed non-wrapv arithmetics. This can be done > > by the match.pd (A * B) +- A -> (B +- 1) * A or > > A +- (A * B) -> (1 +- B) * A canonicalizations. Being a lazy guy, > > I wrote attached brute force proglet to look for all the cases (for signed > > char) where there is no UB in the original and the transformation would > > introduce UB. For the simple cases with just A and B, rather than A, B and > > C, the problematic cases are for signed char only: > > A*B+A -> (B+1)*A A==-1 && B==127 > > A*B+A -> (B+1)*A A==0 && B==127 > > A*B-A -> (B-1)*A A==0 && B==-128 > > A-A*B -> (1-B)*A A==-1 && B==-127 > > A-A*B -> (1-B)*A A==0 && B==-128 > > A-A*B -> (1-B)*A A==0 && B==-127 > > The current patterns already use VRP (tree_expr_nonzero_p and > > expr_not_equal_to) to do the transformation only if A is known not to be 0 > > or -1. But as the above problematic cases show, for A*B-A the -1 > > case is actually not problematic (transformation doesn't introduce UB; > > if A is -1, -1*B+1 has UB only for minimum and minimum+1, and the > > replacement (B-1)*-1 has also UB for those two cases only) and even when we > > know nothing about A value range, if we know something about B value range, > > we could still optimize. So, for the > > int a = t - 1; > > int b = a * v; > > return b + v; > > case, the a = t - 1 has value range that doesn't include maximum and > > so we can conclude it is ok to transform it into return ((t - 1) + 1) * v > > and thus t * v. > > > > Unfortunately, the patch "broke" a few loop_versioning_*.f90 tests (CCing > > author), where there are small differences between the lversion pass, > > e.g. in loop_versioning_1.f90 (f1): > > - # RANGE ~[0, 0] > > - _4 = iftmp.11_6 * S.4_19; > > - _11 = _4 - iftmp.11_6; > > + # RANGE [0, 9223372036854775806] NONZERO 9223372036854775807 > > + _4 = S.4_19 + -1; > > + _11 = _4 * iftmp.11_6; > > and the lversion pass then emits just one message instead of two, but in the > > end assembly is identical. In loop_versioning_6.f90 (though, with -m32 > > only), the code before lversion pass actually looks better in f1: > > - # i_35 = PHI <1(8), i_28(9)> > > - _9 = iftmp.33_15 * i_35; > > - _10 = _9 * 2; > > - _21 = _10 - iftmp.33_15; > > - (*x.0_23)[_21] = 1.0e+2; > > - _11 = i_35 * 2; > > - _12 = _11 + 1; > > - _13 = _12 * iftmp.33_15; > > - _22 = _13 - iftmp.33_15; > > - (*x.0_23)[_22] = 1.01e+2; > > - i_28 = i_35 + 1; > > - if (iftmp.36_25 < i_28) > > + # i_31 = PHI <1(8), i_26(9)> > > + _10 = iftmp.33_13 * i_31; > > + _11 = _10 * 2; > > + _19 = _11 - iftmp.33_13; > > + (*x.0_21)[_19] = 1.0e+2; > > + (*x.0_21)[_11] = 1.01e+2; > > + i_26 = i_31 + 1; > > + if (iftmp.36_23 < i_26) > > where due to the new canonicalizations we managed to avoid some > > multiplications. One index was iftmp*i*2-iftmp and the other > > was iftmp*(i*2+1)-iftmp and with the patch we managed to simplify > > the latter into iftmp*i*2 and use for that the temporary used for > > the first expression. f1 is actually in assembly smaller because of this. > > The lp64 vs. ! lp64 is just a wild guess, guess testing on further targets > > will show what is the target property that matters. > > We're supposed to version all 9 functions on all targets though, > so I think we should xfail the test for ! lp64 rather than expect > different output. > > E.g. we longer vectorise f1 for -mabi=ilp32 on aarch64-linux-gnu, > which is a regression of sorts, > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
OK. > Richard > > > 2019-12-03 Richard Sandiford <richard.sandif...@arm.com> > > gcc/testsuite/ > * gfortran.dg/loop_versioning_6.f90: XFAIL the scans for ! lp64. > > Index: gcc/testsuite/gfortran.dg/loop_versioning_6.f90 > =================================================================== > --- gcc/testsuite/gfortran.dg/loop_versioning_6.f90 2019-12-02 > 17:38:19.276428679 +0000 > +++ gcc/testsuite/gfortran.dg/loop_versioning_6.f90 2019-12-03 > 09:45:30.673414203 +0000 > @@ -89,7 +89,5 @@ subroutine f9(x, limit, step) > end do > end subroutine f9 > > -! { dg-final { scan-tree-dump-times {want to version containing loop} 9 > "lversion" { target lp64 } } } > -! { dg-final { scan-tree-dump-times {versioned this loop} 9 "lversion" { > target lp64 } } } > -! { dg-final { scan-tree-dump-times {want to version containing loop} 8 > "lversion" { target { ! lp64 } } } } > -! { dg-final { scan-tree-dump-times {versioned this loop} 8 "lversion" { > target { ! lp64 } } } } > +! { dg-final { scan-tree-dump-times {want to version containing loop} 9 > "lversion" { xfail { ! lp64 } } } } > +! { dg-final { scan-tree-dump-times {versioned this loop} 9 "lversion" { > xfail { ! lp64 } } } } > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)