Hi,

Dorit Nuzman/Haifa/IBM wrote on 14/02/2008 17:02:45:
> This is an old debt: A while back Tim had sent me a detailed report
> off line showing which C++ tests (originally from the Dongarra loops
> suite) were vectorized by current g++ or icpc, or both, as well as
> when the vectorization by icpc required a pragma, or was partial. I
> went over the loops that were reported to be vectorized by icc but
> not by gcc, to see which features we are missing. There are 23 such
> loops (out of a total of 77). They fall into the following 7 categories:
>
> (1) scalar evolution analysis fails with "evolution of base is not affine".
> This happens in the 3 loops in lines 4267, 4204 and 511.
> Here is an example:
> for (i__ = 1; i__ <= i__2; ++i__)
> {
>   a[i__] = (b[i__] + b[im1] + b[im2]) * .333f;
>   im2 = im1;
>   im1 = i__;
> }
> Missed optimization PR to be opened.

I opened PR35224.

> (2) Function calls inside a loop. These are calls to the math
> functions sin/cos, which I expect would be vectorized if the proper
> simd math lib was available.
> This happens in the loop in line 6932.
> I think there's an open PR for this one (at least for
> powerpc/Altivec?) - need to look/open.

There is PR22226.

> (3) This one is the most dominant missed optimization: if-conversion
> is failing to if-convert, most likely due to the very limited
> handling of loads/stores (i.e. load/store hoisting/sinking is required).
> This happens in the 13 loops in lines 4085, 4025, 3883, 3818, 3631,
> 355, 3503, 2942, 877, 6740, 6873, 5191, 7943.
> There is ongoing work towards addressing this issue - see
> http://gcc.gnu.org/ml/gcc/2007-07/msg00942.html,
> http://gcc.gnu.org/ml/gcc/2007-09/msg00308.html.
> (I think Victor Kaplansky is currently working on this).
>
> (4) A scalar variable, whose address is taken outside the loop (in
> an enclosing outer-loop) is analyzed by the data-references
> analysis, which fails because it is invariant.
> Here's an example:
> for (nl = 1; nl <= i__1; ++nl)
> {
>   sum = 0.f;
>   for (i__ = 1; i__ <= i__2; ++i__)
>   {
>     a[i__] = c__[i__] + d__[i__];
>     b[i__] = c__[i__] + e[i__];
>     sum += a[i__] + b[i__];
>   }
>   dummy_ (ld, n, &a[1], &b[1], &c__[1], &d__[1], &e[1], &aa[aa_offset],
>           &bb[bb_offset], &cc[cc_offset], &sum);
> }
> (Analysis of 'sum' fails with "FAILED as dr address is invariant".)
> This happens in the 2 loops in lines 5053 and 332.
> I think there is a missed optimization PR for this one already. Need
> to look/open.

The related PRs are PR33245 and PR33244. Also, there is a FIXME comment
in tree-data-ref.c just before the failure with the "FAILED as dr
address is invariant" error:

  /* FIXME -- data dependence analysis does not work correctly for
     objects with invariant addresses.  Let us fail here until the
     problem is fixed.  */

> (5) Reduction and induction that involve multiplication (i.e. 'prod
> *= CST' or 'prod *= a[i]') are currently not supported by the
> vectorizer. It should be trivial to add support for this feature
> (for reduction, it shouldn't be much more than adding a case for
> MULT_EXPR in tree-vectorizer.c:reduction_code_for_scalar_code, I think).
> This happens in the 2 loops in lines 4921 and 4632.
> A missed-optimization PR to be opened.

Opened PR35226.

> (6) loop distribution is required to break a dependence. This may
> already be handled by Sebastian's loop-distribution pass that will
> be incorporated in 4.4.
> Here is an example:
> for (i__ = 2; i__ <= i__2; ++i__)
> {
>   a[i__] += c__[i__] * d__[i__];
>   b[i__] = a[i__] + d__[i__] + b[i__ - 1];
> }
> This happens in the loop in line 2136.
> Need to check if we need to open a missed optimization PR for this.

I don't think that this is a loop distribution issue. The dependence
between the store to a[i] and the load from a[i] doesn't prevent
vectorization. The problematic one is between the store to b[i] and the
load from b[i-1] in the second statement.
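
To make the two dependences in (6) concrete, here is a small sketch. It is
my own illustration, not code from the test suite: it uses 0-based arrays,
and the names fused, distributed and distribution_preserves_results are
hypothetical. The loop is split in two only to isolate the dependences. The
first loop touches a[i] within a single iteration; the second keeps the
b[i] <- b[i-1] recurrence of distance 1, which is the one carried across
iterations. Nothing here shows what the vectorizer actually does with
either form.

```c
#include <assert.h>
#include <string.h>

#define N 64

/* The loop from category (6), adapted to 0-based C arrays. */
static void fused(float *a, float *b, const float *c, const float *d, int n)
{
    assert(n >= 1);
    for (int i = 1; i < n; ++i) {
        a[i] += c[i] * d[i];            /* store to a[i] ...               */
        b[i] = a[i] + d[i] + b[i - 1];  /* ... load of a[i], same iteration */
    }
}

/* Hypothetical split form: same statements, one per loop. */
static void distributed(float *a, float *b, const float *c, const float *d, int n)
{
    assert(n >= 1);
    /* Each a[i] depends only on iteration i: no loop-carried dependence. */
    for (int i = 1; i < n; ++i)
        a[i] += c[i] * d[i];
    /* b[i] reads b[i-1] written one iteration earlier: a serial recurrence. */
    for (int i = 1; i < n; ++i)
        b[i] = a[i] + d[i] + b[i - 1];
}

/* Returns 1 if both forms compute bit-identical a[] and b[]. */
int distribution_preserves_results(void)
{
    float a1[N], b1[N], a2[N], b2[N], c[N], d[N];
    /* Small values chosen so every intermediate result is exact in float. */
    for (int i = 0; i < N; ++i) {
        a1[i] = a2[i] = (float)i;
        b1[i] = b2[i] = 1.0f;
        c[i] = 0.5f * (float)i;
        d[i] = 2.0f;
    }
    fused(a1, b1, c, d, N);
    distributed(a2, b2, c, d, N);
    return memcmp(a1, a2, sizeof a1) == 0 && memcmp(b1, b2, sizeof b1) == 0;
}
```

Calling distribution_preserves_results() from a test driver checks that
splitting the loop does not change the results; the recurrence on b is
still there in the second loop either way.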

> (7) A dependence, similar to one that would be created by
> predictive commoning (or even PRE), is present in the loop:
> for (i__ = 1; i__ <= i__2; ++i__)
> {
>   a[i__] = (b[i__] + x) * .5f;
>   x = b[i__];
> }
> This happens in the loop in line 3003.
> The vectorizer needs to be extended to handle such cases.
> A missed optimization PR to be opened (if one doesn't exist already).

I opened a new PR, PR35229. (PR33244 is somewhat related.)

Ira