http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60138

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |sh*-*-*
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |olegendo at gcc dot gnu.org
         Resolution|---                         |DUPLICATE

--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Rich Felker from comment #0)
> The sh target switches between single and double precision fpu modes (needed
> for operating on floats and doubles, respectively) via loading the fpscr
> from a global variable array __fpscr_values[2]. At times in the past this
> array has been provided by libgcc; it seems it's currently expected that
> libc provides it.

What leads you to this conclusion?  __fpscr_values is still in libgcc -- at
least last time I checked.

>
> Each time gcc needs to perform a floating point operation and does not know
> the current mode, it loads the full fpscr value, clobbering any non-default
> rounding mode (set by fesetround), exception flags (set by floating point
> operations), and other state. This makes it impossible to implement working
> fenv.h API on sh; while glibc implements these functions, they don't work,
> because gcc immediately clobbers the fpscr before performing the next
> floating point operation.
>
> It should be noted that __fpscr_values is non-const, and in fact I've found
> some software that writes non-default bits into this array so that it can
> use non-default settings (particular, flush-denormals-to-zero mode) without
> them getting clobbered (since gcc will happily reload whatever you put in
> this array). I suspect the original intent was that the fenv.h functions
> could also do this. However, that's utterly broken since __fpscr_values is
> global and floating point environment is supposed to be thread-local.
>
> To fix this, the ridiculous hack of loading the value for fpscr from
> __fpscr_values and directly writing to the fpscr should be removed, and
> replaced with a read-modify-write cycle (read old fpscr, set/mask precision
> bit, and write back to fpscr). Naturally this will hurt performance but the
> current code is simply non-working and needs to be fixed.
>
> One could move __fpscr_values to TLS, but given that TLS access does not
> seem to perform that well on sh, I suspect that's slower than the
> read-modify-write cycle on fpscr.

No need to move it to TLS, since the FPSCR register is usually already part of
an execution context (i.e. thread) and will be saved and restored accordingly.

I raised the FPSCR issue a while ago (PR 53513) and briefly discussed it with
Kaz.  The conclusion was to get rid of the __fpscr_values usage altogether, as
you suggested.

The reason why it's a constant global is/was performance.  Flipping the
FPSCR.PR bit with 'sts - xor - lds' takes way more cycles than simply reloading
the whole register with a single lds.  When this was introduced, double
precision usage was not so popular and things such as fenv.h did not exist, so
'correctness' wasn't so important.  Now of course times have changed.

To proceed with this issue, the missing bit in the mode-switching RTL pass
needs to be solved first.  See the 3rd bullet point in PR 29349.
Unfortunately I haven't had the time to do it yet.

*** This bug has been marked as a duplicate of bug 53513 ***
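
For illustration only, the difference between the two mode-switching
strategies discussed above can be sketched in plain C.  This is a simulation,
not what GCC emits: sim_fpscr and sim_fpscr_values are hypothetical stand-ins
for the hardware register and for __fpscr_values, and the bit positions (RM in
bits 1:0, PR in bit 19) are assumed from the SH-4 FPSCR layout.  On real
hardware the read and write would be 'sts fpscr' and 'lds ...,fpscr'.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SH-4 FPSCR field positions (not taken from the bug report). */
#define FPSCR_RM_MASK 0x00000003u  /* rounding mode, bits 1:0 */
#define FPSCR_PR      0x00080000u  /* precision bit 19: 1 = double */

/* Hypothetical stand-ins for the register and for __fpscr_values[2]. */
static uint32_t sim_fpscr;
static uint32_t sim_fpscr_values[2] = { 0u, FPSCR_PR };

/* Current scheme: overwrite the whole register from the global array. */
static void switch_by_reload(int want_double)
{
    sim_fpscr = sim_fpscr_values[want_double ? 1 : 0];
}

/* Proposed scheme: read the old value, change only PR, write it back. */
static void switch_by_rmw(int want_double)
{
    uint32_t v = sim_fpscr;           /* read  (sts fpscr, rN) */
    if (want_double)
        v |= FPSCR_PR;                /* set the precision bit */
    else
        v &= ~FPSCR_PR;               /* mask the precision bit */
    sim_fpscr = v;                    /* write (lds rN, fpscr) */
}

/* Start in single precision with a non-default rounding mode (RM = 01),
   switch to double precision, and return the RM field that survives. */
uint32_t rounding_mode_after_switch(int use_rmw)
{
    sim_fpscr = 0x1u;
    if (use_rmw)
        switch_by_rmw(1);
    else
        switch_by_reload(1);
    assert(sim_fpscr & FPSCR_PR);     /* both schemes do select double */
    return sim_fpscr & FPSCR_RM_MASK;
}
```

In this model rounding_mode_after_switch(0) loses the rounding mode, which is
the fesetround clobbering described in comment #0, while
rounding_mode_after_switch(1) keeps it at the cost of the extra read and
logic operation.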