http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34265

--- Comment #34 from Dominique d'Humieres <dominiq at lps dot ens.fr> 
2011-05-22 12:06:20 UTC ---
Created attachment 24325
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24325
reduced tests

The attached bzipped tar contains induct_red.f90, with all the
infrastructure needed to provide a realistic framework for running a reduced
version of the subroutine mutual_ind_quad_cir_coil contained in
induct_qc_x.F90 (reduced to only one of the critical nested loops).

When the macro XPA is defined, the original rotate code

          rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
          rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
          rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))

is unrolled as follows (using q_vector(3)==0) if the macro FLD is not defined

          rot_q_vector(1) = rotate_quad(1,1) * q_vector(1) + &
                    rotate_quad(1,2) * q_vector(2)
          rot_q_vector(2) = rotate_quad(2,1) * q_vector(1) + &
                    rotate_quad(2,2) * q_vector(2)
          rot_q_vector(3) = rotate_quad(3,1) * q_vector(1) + &
                    rotate_quad(3,2) * q_vector(2)

Otherwise (when FLD is defined) it is folded as

          rot_q_vector(:) = rotate_quad(:,1) * q_vector(1) + &
                    rotate_quad(:,2) * q_vector(2)
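
The two rewrites above drop the rotate_quad(:,3) * q_vector(3) term, so
they match the dot_product form only when q_vector(3) is zero. A minimal
sketch that checks this on made-up data follows. The program name, the kind
parameter dp, the tolerance, and all numeric values are assumptions for
illustration and are not taken from induct_red.f90.

```fortran
program check_rotate
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed working precision
  real(dp) :: rotate_quad(3,3), q_vector(3)
  real(dp) :: rot_ref(3), rot_unr(3), rot_fld(3)
  integer :: i

  ! Arbitrary test data (hypothetical); q_vector(3) must be zero.
  rotate_quad = reshape([(real(i, dp) / 7.0_dp, i = 1, 9)], [3, 3])
  q_vector = [0.25_dp, -1.5_dp, 0.0_dp]

  ! Original form: one dot_product per component (written as a loop here).
  do i = 1, 3
     rot_ref(i) = dot_product(rotate_quad(i,:), q_vector(:))
  end do

  ! Unrolled form (XPA defined, FLD not defined).
  rot_unr(1) = rotate_quad(1,1) * q_vector(1) + &
               rotate_quad(1,2) * q_vector(2)
  rot_unr(2) = rotate_quad(2,1) * q_vector(1) + &
               rotate_quad(2,2) * q_vector(2)
  rot_unr(3) = rotate_quad(3,1) * q_vector(1) + &
               rotate_quad(3,2) * q_vector(2)

  ! Folded form (XPA and FLD defined).
  rot_fld(:) = rotate_quad(:,1) * q_vector(1) + &
               rotate_quad(:,2) * q_vector(2)

  if (maxval(abs(rot_unr - rot_ref)) > 1.0e-12_dp) &
       error stop "unrolled form differs"
  if (maxval(abs(rot_fld - rot_ref)) > 1.0e-12_dp) &
       error stop "folded form differs"
  print *, "all three rotate forms agree"
end program check_rotate
```

Setting q_vector(3) to a nonzero value in this sketch should make both
checks fail, which is the reason the rewrite is only valid for this loop.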

When the macro XPB is defined, the original numerator

          numerator = w1gauss(j) * w2gauss(k) *               &
                  dot_product(coil_current_vec,current_vector)

is unrolled as

          numerator = w1gauss(j) * w2gauss(k) *               &
                 (coil_current_vec(1)*current_vector(1) + &
                  coil_current_vec(2)*current_vector(2) + &
                  coil_current_vec(3)*current_vector(3))
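
For three-element vectors the two numerator forms compute the same sum of
products, up to floating-point rounding. A small sketch that compares them
follows. The array sizes for w1gauss and w2gauss, the loop bounds, the
tolerance, and all values are assumptions chosen for illustration.

```fortran
program check_numerator
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed working precision
  real(dp) :: w1gauss(2), w2gauss(2)       ! hypothetical 2-point weights
  real(dp) :: coil_current_vec(3), current_vector(3)
  real(dp) :: num_ref, num_unr
  integer :: j, k

  ! Arbitrary test data (hypothetical).
  w1gauss = [0.5_dp, 1.5_dp]
  w2gauss = [0.75_dp, 1.25_dp]
  coil_current_vec = [1.0_dp, -2.0_dp, 0.5_dp]
  current_vector = [0.3_dp, 0.7_dp, -1.1_dp]

  do j = 1, 2
     do k = 1, 2
        ! Original form using the dot_product intrinsic.
        num_ref = w1gauss(j) * w2gauss(k) * &
                  dot_product(coil_current_vec, current_vector)
        ! Unrolled form (XPB defined).
        num_unr = w1gauss(j) * w2gauss(k) * &
                  (coil_current_vec(1)*current_vector(1) + &
                   coil_current_vec(2)*current_vector(2) + &
                   coil_current_vec(3)*current_vector(3))
        if (abs(num_unr - num_ref) > 1.0e-12_dp) &
             error stop "numerator forms differ"
     end do
  end do
  print *, "numerator forms agree"
end program check_numerator
```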

When the macro XPC is defined, the original denominator

          denominator = sqrt(dot_product(rot_c_vector-rot_q_vector, &
                         rot_c_vector-rot_q_vector))

is unrolled as

          denominator = sqrt((rot_c_vector(1)-rot_q_vector(1))**2 + &
                     (rot_c_vector(2)-rot_q_vector(2))**2 + &
                     (rot_c_vector(3)-rot_q_vector(3))**2)
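
Both denominator forms compute the Euclidean norm of
rot_c_vector-rot_q_vector; the unrolled one avoids the array temporaries
that the dot_product call may otherwise need. A minimal sketch comparing
them follows, with vector values and tolerance that are assumptions for
illustration only.

```fortran
program check_denominator
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed working precision
  real(dp) :: rot_c_vector(3), rot_q_vector(3)
  real(dp) :: den_ref, den_unr

  ! Arbitrary test data (hypothetical).
  rot_c_vector = [1.0_dp, 2.0_dp, 3.0_dp]
  rot_q_vector = [0.5_dp, -0.25_dp, 1.75_dp]

  ! Original form: dot_product of the difference vector with itself.
  den_ref = sqrt(dot_product(rot_c_vector-rot_q_vector, &
                             rot_c_vector-rot_q_vector))

  ! Unrolled form (XPC defined): explicit sum of squared differences.
  den_unr = sqrt((rot_c_vector(1)-rot_q_vector(1))**2 + &
                 (rot_c_vector(2)-rot_q_vector(2))**2 + &
                 (rot_c_vector(3)-rot_q_vector(3))**2)

  if (abs(den_unr - den_ref) > 1.0e-12_dp) &
       error stop "denominator forms differ"
  print *, "denominator forms agree"
end program check_denominator
```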


The archive also contains a script to run the twelve cases plus one case
with graphite, and the raw results for revisions 167530, 167531, and 173917.
For r173917 there are three sets of results: the original, one with r167531
reverted (173917r1), and one with /* NEXT_PASS (pass_complete_unrolli); */
(173917n), since I think this is related to revision 134730.

See also pr49006.
