http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49957
           Summary: Fails to SLP in 410.bwaves
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: rgue...@gcc.gnu.org
                CC: s...@gcc.gnu.org, i...@gcc.gnu.org

While the loop

      subroutine shell(nx,ny,nz)
      implicit none
      integer i,j,k,l,nx,ny,nz
      real*8 q(21,nx,ny,nz),dq(21,nx,ny,nz)
      do k=1,nz
         do j=1,ny
            do i=1,nx
               do l=1,21
                  q(l,i,j,k)=q(l,i,j,k)+dq(l,i,j,k)
               enddo
            enddo
         enddo
      enddo
      return
      end

is vectorized using loop vectorization, the following variant is not
(as appearing in 410.bwaves):

      subroutine shell(nx,ny,nz)
      implicit none
      integer i,j,k,l,nx,ny,nz
      real*8 q(5,nx,ny,nz),dq(5,nx,ny,nz)
      do k=1,nz
         do j=1,ny
            do i=1,nx
               do l=1,5
                  q(l,i,j,k)=q(l,i,j,k)+dq(l,i,j,k)
               enddo
            enddo
         enddo
      enddo
      return
      end

First of all, dependence checking on the innermost unrolled loop fails:

(compute_affine_dependence
  (stmt_a =
D.1639_140 = *q_54[D.1638_139];
)
  (stmt_b =
*q_54[D.1638_157] = D.1647_160;
)
(subscript_dependence_tester
(analyze_overlapping_iterations
  (chrec_a = {((pretmp.33_209 + 6) + pretmp.33_213) + offset.5_32, +, 5}_3)
  (chrec_b = {((pretmp.33_209 + 7) + pretmp.33_213) + offset.5_32, +, 5}_3)
(analyze_siv_subscript
siv test failed: unimplemented.
)

as pretmp.33 is signed and we thus do not associate the constant offset
(well, I think this might be the problem at least). That shouldn't prevent
vectorization (and it doesn't). But then (probably due to the same reason)
we get

t.f:7: note: === vect_analyze_data_ref_accesses ===
t.f:7: note: not consecutive access D.1639_140 = *q_54[D.1638_139];
t.f:7: note: not vectorized: complicated access pattern.
t.f:7: note: bad data access.
t.f:1: note: vectorized 0 loops in function.

so in the end it _does_ seem to be the underlying issue. I will see what
can be done here.
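
For illustration, here is what the SIV test would conclude if the constant offsets in the two chrecs above were associated out. Both chrecs step by 5 in loop 3 and their initial values differ only in the constants 6 and 7, so the question is whether 6 + 5*x = 7 + 5*y has an integer solution. The sketch below is a plain GCD test in Python, not GCC code. The function name is hypothetical, and it assumes the symbolic parts (pretmp.33_209, pretmp.33_213, offset.5_32) are identical in both chrecs and cancel, which is exactly what the analysis fails to establish here.

```python
from math import gcd


def strided_accesses_may_overlap(init_a, step_a, init_b, step_b):
    """GCD test for two affine accesses (hypothetical helper, not a GCC API).

    Returns True if init_a + step_a*x == init_b + step_b*y can hold for
    some integers x and y. Iteration bounds are ignored, so a True result
    is conservative: it means "may overlap", not "does overlap".
    """
    g = gcd(step_a, step_b)
    if g == 0:
        # Both steps are zero: the accesses are loop-invariant.
        return init_a == init_b
    # A solution exists only if the gcd of the steps divides the
    # difference of the initial values.
    return (init_b - init_a) % g == 0


# Constant parts of chrec_a and chrec_b from the dump, symbolic parts dropped.
print(strided_accesses_may_overlap(6, 5, 7, 5))
```

With the constants exposed, the difference of 1 is not a multiple of the stride 5, so the load and the store can never touch the same element.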