http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60138
Bug ID: 60138
Summary: superh single/double precision fpu mode setting is backwards and unusable
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bugdal at aerifal dot cx

The sh target switches between single and double precision fpu modes (needed for operating on floats and doubles, respectively) by loading the fpscr from a global array, __fpscr_values[2]. At times in the past this array has been provided by libgcc; it seems it's currently expected that libc provides it.

Each time gcc needs to perform a floating point operation and does not know the current mode, it loads the full fpscr value, clobbering any non-default rounding mode (set by fesetround), exception flags (set by floating point operations), and other state. This makes it impossible to implement a working fenv.h API on sh; while glibc implements these functions, they don't work, because gcc clobbers the fpscr immediately before performing the next floating point operation.

It should be noted that __fpscr_values is non-const, and in fact I've found some software that writes non-default bits into this array so that it can use non-default settings (in particular, flush-denormals-to-zero mode) without them getting clobbered (since gcc will happily reload whatever you put in this array). I suspect the original intent was that the fenv.h functions could also do this. However, that's utterly broken, since __fpscr_values is global and the floating point environment is supposed to be thread-local.

To fix this, the ridiculous hack of loading the value for fpscr from __fpscr_values and writing it directly to the fpscr should be removed and replaced with a read-modify-write cycle (read the old fpscr, set or clear the precision bit, and write the result back to fpscr). Naturally this will hurt performance, but the current code is simply non-working and needs to be fixed.

One could move __fpscr_values to TLS, but given that TLS access does not seem to perform that well on sh, I suspect that would be slower than the read-modify-write cycle on fpscr.
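
To make the difference concrete, here is a rough sketch in portable C of the two switching strategies. It is not gcc's actual code generation: sim_fpscr and sim_fpscr_values are hypothetical stand-ins for the hardware register and for __fpscr_values, the function names are made up, and the bit positions (rounding mode in bits 1:0, precision bit PR at bit 19) are my assumption based on the SH-4 FPSCR layout. The order of the two table entries is illustrative only.

```c
#include <stdint.h>

/* Assumed SH-4 FPSCR layout: rounding mode in bits 1:0, PR (precision) in bit 19. */
#define FPSCR_RM_MASK UINT32_C(0x00000003)
#define FPSCR_PR      UINT32_C(0x00080000)

/* Hypothetical stand-in for the hardware fpscr, so the sketch runs on any host. */
static uint32_t sim_fpscr;

/* Hypothetical stand-in for the global __fpscr_values[2] table. */
static uint32_t sim_fpscr_values[2] = { 0, FPSCR_PR };

/* Current behaviour as described in this report: overwrite the whole
 * register from the global table, losing rounding mode, flags, etc. */
static void switch_by_table_load(int want_double)
{
    sim_fpscr = sim_fpscr_values[want_double ? 1 : 0];
}

/* Proposed behaviour: read the old value, change only the precision bit,
 * and write it back. On real hardware the read and write would be
 * fpscr access instructions rather than plain variable accesses. */
static void switch_by_rmw(int want_double)
{
    uint32_t v = sim_fpscr;          /* read old fpscr */
    if (want_double)
        v |= FPSCR_PR;               /* set precision bit */
    else
        v &= ~FPSCR_PR;              /* clear precision bit */
    sim_fpscr = v;                   /* write back */
}
```

With a non-default rounding mode stored in sim_fpscr, switch_by_table_load resets the rounding-mode bits to whatever the table holds, while switch_by_rmw leaves every bit other than PR untouched, which is what a working fenv.h implementation needs.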