Hi, I am aware that gcc attempts to avoid any reordering of floating-point operations by default, as this leads to slightly different answers on different runs. There appears to be a similar problem on the x87. From my assembly-diving, I believe I've established that when a register spill is required, gcc stores only to the precision of the computation (e.g., 64 bits for double precision). On the x87 unit, this introduces an unpredictable round operation in the middle of the computation (unpredictable in the sense that the source does not have a store with its implicit round, but the executable does). This unasked-for round has exactly the same effect as reordering two fp computations (i.e., it introduces an epsilon error). This means that not only do you get differing answers where you don't expect them, but theoretically the 80-bit x87 could produce less accurate results than true 32 or 64-bit arithmetic (though this would almost never happen in practice, as it would require massive spilling).
It came to my attention because a user of my ATLAS library noted that ATLAS failed to produce a truly symmetric matrix when C = A * transpose(A) was computed. If there are no reorderings, the lower triangle of C should exactly match the upper triangle. When using gcc 4.2.0 20060807 (experimental), a register spill is introduced in the calculation of a 4x1 sub-block of C. The spill affects only C[0], so that element gets an additional round that the other elements do not, leading to a slightly non-symmetric matrix. Note that this is not a case of stores in the algorithm causing rounding (which is inevitable), but of stores unpredictably introduced into the algorithm by gcc.

A complete fix for this problem is to always do 80-bit register spills for the x87, regardless of the data type of the final calculation, and thus avoid the unpredictable round steps.

To trigger the problem, you need code that has a spill and that depends on getting the same answer from one spilled and one unspilled redundant calculation. I have a test case that does so for the above experimental gcc, but not for gcc 4.1.1 20060525 (Red Hat 4.1.1-1), since this earlier version doesn't inject a spill in the right place. I have not tried other compiler versions, because I figure this is a general policy, and if I have understood the problem correctly, you can easily confirm how many bits you spill from the x87.

If you are interested in making the x87 produce the same answer in this case, and it is helpful, I can certainly post my tester that demonstrates the problem. I don't want to go to the trouble if the answer is either "confirmed, not going to fix" or "confirmed, see how it would cause the error, will fix".
Let me know,
Clint

-- 
Summary: register spills in x87 unit need to be 80-bit, not 64
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: whaley at cs dot utsa dot edu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30255