On 09/30/2011 12:59 AM, David Miller wrote:
>
> I tried to add the 'siam' instruction too but that one is really
> difficult because it influences the behavior of every float operation
> and I couldn't find an easy way to express those dependencies.  I
> tried a few easy approaches but I couldn't reliably keep the compiler
> from moving 'siam' across float operations.
>
> The 'siam' (Set Interval Arithmetic Mode) instruction is a mechanism
> to override the float rounding mode on a cycle-to-cycle basis, i.e.
> without the cost of doing a write to the %fsr.

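The dependency in the quoted text can be pictured in plain C: every
float operation implicitly reads the current rounding mode, so a
mode-set instruction may not move across an operation that needs a
different mode.  Below is a minimal sketch, not GCC code.  The names
round_mode, fp_op and count_mode_sets are hypothetical, and the enum
values are not the SPARC encoding.  It tags each operation with the mode
it needs and counts the mode-sets required when a switch is emitted only
where the required mode changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rounding modes for illustration; not the SPARC encoding. */
enum round_mode { RM_NEAREST, RM_ZERO, RM_UP, RM_DOWN };

/* One float operation, tagged with the rounding mode it must run under. */
struct fp_op {
    enum round_mode needs;
};

/* Count the mode-set instructions needed for ops[0..n) when a switch is
   emitted only if the required mode differs from the mode currently in
   effect.  `entry` is the mode in effect before the first operation.  */
size_t count_mode_sets(const struct fp_op *ops, size_t n,
                       enum round_mode entry)
{
    assert(ops != NULL || n == 0);

    enum round_mode cur = entry;
    size_t sets = 0;

    for (size_t i = 0; i < n; i++) {
        if (ops[i].needs != cur) {
            cur = ops[i].needs;   /* a mode-set would be emitted here */
            sets++;
        }
    }
    return sets;
}
```

In this model, swapping two adjacent operations that need different
modes changes where the mode-sets must go, which is the ordering
constraint the compiler has to be told about.
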
I don't think I'd ever expose this via a builtin.

This seems like a feature we've talked about for a long time, but
have never done anything about.  Specifically, in-compiler support for
#pragma STDC FENV_ACCESS and the various <fenv.h> routines.

We ought to be able to track the rounding mode (and other relevant
parameters) on a per-expression basis, tagging each floating-point
operation with the parameters in effect.

At some point, at or after rtl generation time, we transform these
saved parameters into manipulations of the fpu state.  We have several
options:

  (1) Alpha-like where e.g. the rounding mode is directly encoded in
      the instruction.  No further optimization necessary, unless we
      are manipulating non-rounding parameters.

  (2) IA64-like where we have multiple fpu environments, and can
      encode which to use inside the instruction.  However, in this
      case we also need to set up these alternate environments and
      merge back the exception state when the user reads it.

  (3) Use optimize-mode-switching to minimize the number of changes
      to the global state.  This includes the use of SIAM vs %fsr,
      especially when a subroutine call could have changed the
      global rounding mode.

All of which is a lot of work.

> +(define_insn "bmask<P:mode>_vis"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> +        (plus:P (match_operand:P 1 "register_operand" "rJ")
> +                (match_operand:P 2 "register_operand" "rJ")))
> +   (clobber (reg:SI GSR_REG))]
> +  "TARGET_VIS2"
> +  "bmask\t%r1, %r2, %0"
> +  [(set_attr "type" "array")])

I think this is wrong.  I think you want to model this as

  [(set (match_operand:DI 0 "register_operand" "=r")
        (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
                 (match_operand:DI 2 "register_or_zero_operand" "rJ")))
   (set (zero_extract:DI (reg:DI GSR_REG) (const_int 32) (const_int 32))
        (plus:DI (match_dup 1) (match_dup 2)))]

(1) %gsr is really set to something, not just modified in
    uninteresting ways; we're going to use this value later.

(2) Only the top 32 bits of %gsr are changed; the low 32 bits are
    still valid.  You don't want insns that set the low 32 bits to
    be deleted as dead code, which is what would happen with a plain
    clobber of GSR_REG.

(3) I realize this version makes things difficult for 32-bit mode.
    There, I think you may have to settle for an unspec.  And perhaps
    the possible benefit of properly representing the GSR change
    isn't that helpful.  In which case:

    (set (reg:DI GSR_REG)
         (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
                    UNSPEC_BMASK))

> +(define_insn "bshuffle<V64I:mode>_vis"
> +  [(set (match_operand:V64I 0 "register_operand" "=e")
> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
> +                      (match_operand:V64I 2 "register_operand" "e")]
> +                     UNSPEC_BSHUFFLE))
> +   (use (reg:SI GSR_REG))]

Better to push the use of the GSR_REG into the unspec, and not leave
it separate in the parallel.


r~