I understand that comparing very small benchmarks like this can be misleading, but I believe I've looked at this enough to be confident that it demonstrates a basic truth and not a narrow performance issue.
The attached test case contains a FORTRAN program and an Ada program that are equivalent (within their matrix multiply loop). The Ada one runs about 2x slower, with about 3x the number of machine instructions in the inner loop. (Note that I am running with Ada run-time checks disabled.)

I dumped the optimized trees, because the original tree of the Ada version was difficult to read (its node types are not known to the pretty printer). The Ada tree is certainly a mess compared to the FORTRAN version.

The core of the FORTRAN code looks like:

      do I = 1,N
         do J = 1,N
            sum = 0.0
            do R = 1,N
               sum = sum + A(I,R)*B(R,J)
            end do
            C(I,J) = sum
         end do
      end do

With the resulting optimized tree fragment (of the innermost loop) being:

    <L25>:;
      sum = MEM[base: (real4 *) ivtmp.97] * MEM[base: (real4 *) pretmp.81, index: (real4 *) ivtmp.161 + (real4 *) ivtmp.94, step: 4B, offset: 4B] + sum;
      ivtmp.94 = ivtmp.94 + 1;
      ivtmp.97 = ivtmp.97 + ivtmp.157;
      if (ivtmp.94 == (<unnamed type>) D.1273) goto <L29>; else goto <L25>;

While the core of the Ada code looks like:

    for I in A'range(1) loop
       for J in A'range(2) loop
          Sum := 0.0;
          for R in A'range(2) loop
             Sum := Sum + A(I,R)*B(R,J);
          end loop;
          C(I,J) := Sum;
       end loop;
    end loop;

With the resulting optimized tree fragment of the innermost loop being:

    <L15>:;
      D.2370 = (*D.2277)[pretmp.627]{lb: tst_array__L_3__T16b___L sz: pretmp.709 * 4}[(<unnamed type>) r]{lb: tst_array__L_4__T17b___L sz: 4};

    <bb 51>:
      temp.721 = D.2344->LB0;

    <bb 52>:
      temp.720 = D.2344->UB1;

    <bb 53>:
      temp.719 = D.2344->LB1;

    <bb 54>:
      j.73 = (<unnamed type>) j;
      D.2373 = (*D.2298)[(<unnamed type>) r]{lb: temp.721 sz: MAX_EXPR <(temp.720 + 1 - temp.719) * 4, 0> + 3 & -4}[j.73]{lb: temp.719 sz: 4};

    <bb 55>:
      D.2374 = D.2370 * D.2373;

    <bb 56>:
      sum = D.2374 + sum;

    <bb 57>:
      if (r == tst_array__L_4__T17b___U) goto <L17>; else goto <L16>;

    <L16>:;
      r = r + 1;
      goto <bb 50> (<L15>);

Now, I'll be the first to admit that I know very little about the innards of compiler technology, but that tree looks like a horrible mess. It is no wonder the resulting assembly is such a mess.

I am attaching a tar file that has the complete source for the Ada and the FORTRAN versions.

-- 
Summary: Ada produces substantially slower code than FORTRAN for identical inputs - looping over double subscripted arrays
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ada
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jeff at thecreems dot com
GCC build triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29543