------- Comment #6 from burnus at gcc dot gnu dot org 2010-08-02 12:41 ------- (In reply to comment #5) > > As pointed out by Dominique, one needs to be careful. I think one can > > optimize: > > c = matmul(b, transpose(c)) > > c = matmul(c,b) > > where c is a rank-2 matrix. I think one cannot optimize it if c is just a > > vector. (Add "conj" as you wish.) And check whether I have not confused > > columns > > and rows > > So far I have never seen a product matrix*matrix done "in place" and I > seriously doubt it can be done. > The best I can see is to have a temporary of rank 1
I fail to see why a scalar temporary is not enough: do j = 1, N do i = 1, N tmp = 0.0 do k = 1, N tmp = a(i,k)*b(k,j) end do a(i,j) = tmp end do end do Note: it won't work with a scalar if one reverses the i and the j loop. [It would work with a rank-1 "B" argument, but then "A = matmul(B,A)" won't work as the result is a rank-1 array ...] > Note also that the polyhedron test nf.f90 is compiled with unnecessary > temporaries: My feeling is that ONE, THREE (of comment 0), FIVE (of comment 1), and your example (comment 5) point all to the same dependency-analysis problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159