http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49957

           Summary: Fails to SLP in 410.bwaves
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: rgue...@gcc.gnu.org
                CC: s...@gcc.gnu.org, i...@gcc.gnu.org


While the loop

      subroutine shell(nx,ny,nz)
      implicit none
      integer i,j,k,l,nx,ny,nz
      real*8 q(21,nx,ny,nz),dq(21,nx,ny,nz)
      do k=1,nz
         do j=1,ny
            do i=1,nx
               do l=1,21
                  q(l,i,j,k)=q(l,i,j,k)+dq(l,i,j,k)
               enddo
            enddo
         enddo
      enddo
      return
      end

is vectorized using loop vectorization, the following variant
(as it appears in 410.bwaves) is not:

      subroutine shell(nx,ny,nz)
      implicit none
      integer i,j,k,l,nx,ny,nz
      real*8 q(5,nx,ny,nz),dq(5,nx,ny,nz)
      do k=1,nz
         do j=1,ny
            do i=1,nx
               do l=1,5
                  q(l,i,j,k)=q(l,i,j,k)+dq(l,i,j,k)
               enddo
            enddo
         enddo
      enddo
      return
      end
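
For reference, completely unrolling the five-iteration l loop of this
variant gives roughly the hand-written equivalent below. This is only a
sketch of the shape the vectorizer has to deal with, not the compiler's
internal form. The name shell_unrolled is hypothetical, and q and dq are
passed as arguments here (unlike in the testcase) so that the routine
can be called and checked from outside.

```fortran
      subroutine shell_unrolled(nx,ny,nz,q,dq)
! Hypothetical hand-unrolled form of the l=1,5 variant.
! q and dq are dummy arguments here; the testcase declares them locally.
      implicit none
      integer i,j,k,nx,ny,nz
      real*8 q(5,nx,ny,nz),dq(5,nx,ny,nz)
      do k=1,nz
         do j=1,ny
            do i=1,nx
! Five updates of adjacent elements; the next i is 5 elements further.
               q(1,i,j,k)=q(1,i,j,k)+dq(1,i,j,k)
               q(2,i,j,k)=q(2,i,j,k)+dq(2,i,j,k)
               q(3,i,j,k)=q(3,i,j,k)+dq(3,i,j,k)
               q(4,i,j,k)=q(4,i,j,k)+dq(4,i,j,k)
               q(5,i,j,k)=q(5,i,j,k)+dq(5,i,j,k)
            enddo
         enddo
      enddo
      return
      end
```

Each iteration of the i loop touches five adjacent elements, and
consecutive iterations start five elements apart.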

First of all, dependence checking on the innermost unrolled loop fails:

(compute_affine_dependence
  (stmt_a =
D.1639_140 = *q_54[D.1638_139];
)
  (stmt_b =
*q_54[D.1638_157] = D.1647_160;
)
(subscript_dependence_tester
(analyze_overlapping_iterations
  (chrec_a = {((pretmp.33_209 + 6) + pretmp.33_213) + offset.5_32, +, 5}_3)
  (chrec_b = {((pretmp.33_209 + 7) + pretmp.33_213) + offset.5_32, +, 5}_3)
(analyze_siv_subscript
siv test failed: unimplemented.
)

This is presumably because pretmp.33 is signed, so we do not associate
the constant offset (at least, I think this might be the problem).
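
To make the two chrecs concrete: assuming the usual column-major layout,
element q(l,i,j,k) of q(5,nx,ny,nz) sits at the linear offset computed
by the sketch below. Neighbouring values of l are 1 apart and each step
in i advances by 5, which is consistent with the initial values in the
dump differing by one (+6 versus +7) and with the common step of 5. If
the symbolic parts are equal, 5*x + 6 = 5*y + 7 has no integer solution,
so the two accesses can never overlap. The function name linoff is
hypothetical and only illustrates the arithmetic; it is not part of the
testcase or of GCC.

```fortran
      integer function linoff(l,i,j,k,nx,ny)
! Hypothetical helper: zero-based column-major offset of q(l,i,j,k)
! within an array declared as q(5,nx,ny,nz).
      implicit none
      integer l,i,j,k,nx,ny
      linoff = (l-1) + 5*((i-1) + nx*((j-1) + ny*(k-1)))
      return
      end
```

For example, linoff(2,i,j,k,nx,ny) - linoff(1,i,j,k,nx,ny) is always 1,
and linoff(l,i+1,j,k,nx,ny) - linoff(l,i,j,k,nx,ny) is always 5.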

The failed dependence test by itself shouldn't prevent vectorization
(and it doesn't).  But then, probably for the same reason, we get:

t.f:7: note: === vect_analyze_data_ref_accesses ===
t.f:7: note: not consecutive access D.1639_140 = *q_54[D.1638_139];
t.f:7: note: not vectorized: complicated access pattern.
t.f:7: note: bad data access.
t.f:1: note: vectorized 0 loops in function.

So in the end the unassociated constant offset _does_ seem to be the
underlying issue.

I will see what can be done here.
