------- Comment #5 from dominiq at lps dot ens dot fr 2010-08-02 11:41 ------- > As pointed out by Dominique, one needs to be careful. I think one can > optimize: > > c = matmul(b, transpose(c)) > c = matmul(c,b) > where c is a rank-2 matrix. I think one cannot optimize it if c is just a > vector. (Add "conj" as you wish.) And check whether I have not confused > columns > and rows
So far I have never seen a product matrix*matrix done "in place" and I seriously doubt it can be done. The best I can see is to have a temporary of rank 1 only instead of the full matrix. In addition roughly speaking the matrix product is O(n**3) while the cost of the temporary is O(n**2), probably negligible for large matrices (for small matrices calling the present matmul is much slower than any other possible algorithm). Note also that the polyhedron test nf.f90 is compiled with unnecessary temporaries: [macbook] lin/test% gfc -Warray-temporaries nf.f90 nf.f90:293.30: if ( i>i1 ) x(i:i+nxy-1) = x(i:i+nxy-1) - au3(i-nxy:i-1)*x(i-nxy:i-1) 1 Warning: Creating array temporary at (1) nf.f90:280.21: g(i:i+nxy-1) = g(i:i+nxy-1) - au3(i-nxy:i-1)*g(i-nxy:i-1) 1 Warning: Creating array temporary at (1) nf.f90:261.29: if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1) 1 Warning: Creating array temporary at (1) nf.f90:248.20: g(i:i+nx-1) = g(i:i+nx-1) - au2(i-nx:i-1)*g(i-nx:i-1) 1 Warning: Creating array temporary at (1) i.e., disjoint sections are not spotted. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45159