------- Comment #4 from rguenther at suse dot de 2009-11-19 17:30 -------
Subject: Re: [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

On Thu, 19 Nov 2009, sfilippone at uniroma2 dot it wrote:

> ------- Comment #3 from sfilippone at uniroma2 dot it 2009-11-19 17:17 -------
> (In reply to comment #2)
> > -ftree-vectorizer-verbose=2 tells you:
> >
> > eval.f90:35: note: not vectorized: relevant stmt not supported: D.1684_73 = ((D.1683_72));
> >
> > eval.f90:32: note: not vectorized: relevant stmt not supported: D.1684_58 = ((D.1683_57));
> >
> > PAREN_EXPRs are new in 4.4 and I believe they cannot be turned off
> > right now.
> >
> > The loops are
> >
> >   do i=1,nnd
> >      x(i) = 1.d0 + (1.d0*i)/nnd
> >   end do
> >   do i=1,n
> >      foo4(i) = 1.d0 + (1.d0*i)/n
> >   end do
> >
> > where the vectorizer doesn't know how to ensure evaluation order is
> > preserved when trying to vectorize (1.d0*i)/n. Writing them as
> > 1.d0*i/n vectorizes the function.
> >
> > Still the performance is lower by a factor of two compared to 4.3
> > (even with -ffast-math).
> >
> > Probably the bug should be split.
> >
>
> Well, the performance drop I am looking at is in the subroutine. The
> initialization loops are (to me) irrelevant, I had posted a previous version
> to the mailing list where the initialization was done with random_number and
> the situation was the same.
> A run with profiling shows that more than 99% of the time is spent in eval_

Heh, with -fwhole-program GCC optimizes the test away and I get
0.0s runtime.

Well, within eval there's nothing really obvious to me. The innermost
loop is exactly the same:

.L39:
        movsd   (%r15), %xmm0
        addq    %rsi, %r15
        subsd   (%rdx), %xmm0
        addq    %rsi, %rdx
        subl    $1, %eax
        mulsd   %xmm0, %xmm0
        addsd   %xmm0, %xmm1
        jne     .L39

the next outer loop has some less loads in 4.5 but also different
induction variables.

So - nothing obvious to me.

Richard.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108