On 09/30/2011 12:59 AM, David Miller wrote:
> 
> I tried to add the 'siam' instruction too but that one is really
> difficult because it influences the behavior of every float operation
> and I couldn't find an easy way to express those dependencies.  I
> tried a few easy approaches but I couldn't reliably keep the compiler
> from moving 'siam' across float operations.
> 
> The 'siam' (Set Interval Arithmetic Mode) instruction is a mechanism
> to override the float rounding mode on a cycle-to-cycle basis, i.e.,
> without the cost of doing a write to the %fsr.

I don't think I'd ever expose this via a builtin.  This seems like a feature
we've talked about for a long time, but have never done anything about.

Specifically, in-compiler support for #pragma STDC FENV_ACCESS and the
various <fenv.h> routines.  We ought to be able to track the rounding
mode (and other relevant parameters) on a per-expression basis, tagging
each floating-point operation with the parameters in effect.

At some point during or after RTL generation, we would transform these
saved parameters into manipulations of the FPU state.  We have several options:

  (1) Alpha-like, where e.g. the rounding mode is directly encoded in
      the instruction.  No further optimization is necessary unless we
      are manipulating non-rounding parameters.

  (2) IA64-like, where we have multiple FPU environments and can
      encode which one to use inside the instruction.  In this case,
      however, we also need to set up these alternate environments and
      merge the exception state back when the user reads it.

  (3) Use optimize-mode-switching to minimize the number of changes
      to the global state.  This includes choosing between SIAM and a
      write to %fsr, especially when a subroutine call could have
      changed the global rounding mode.

All of which is a lot of work.

> +(define_insn "bmask<P:mode>_vis"
> +  [(set (match_operand:P 0 "register_operand" "=r")
> +        (plus:P (match_operand:P 1 "register_operand" "rJ")
> +                (match_operand:P 2 "register_operand" "rJ")))
> +   (clobber (reg:SI GSR_REG))]
> +  "TARGET_VIS2"
> +  "bmask\t%r1, %r2, %0"
> +  [(set_attr "type" "array")])

I think this is wrong.  I think you want to model this as

  [(set (match_operand:DI 0 "register_operand" "=r")
        (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
                 (match_operand:DI 2 "register_or_zero_operand" "rJ")))
   (set (zero_extract:DI
          (reg:DI GSR_REG)
          (const_int 32)
          (const_int 32))
        (plus:DI (match_dup 1) (match_dup 2)))]

(1) %gsr is really set to a meaningful value, not merely clobbered;
we're going to use this value later.

(2) Only the top 32 bits of %gsr are changed; the low 32 bits are
still valid.  You don't want insns that set the low 32 bits to be
deleted as dead code, which is what would happen with the clobber.

(3) I realize this version makes things difficult for 32-bit mode.
There, I think you may have to settle for an unspec.  Perhaps the
benefit of properly representing the GSR change isn't that great
anyway, in which case:

    (set (reg:DI GSR_REG)
         (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
                    UNSPEC_BMASK))

> +(define_insn "bshuffle<V64I:mode>_vis"
> +  [(set (match_operand:V64I 0 "register_operand" "=e")
> +        (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
> +                   (match_operand:V64I 2 "register_operand" "e")]
> +                     UNSPEC_BSHUFFLE))
> +   (use (reg:SI GSR_REG))]

Better to push the use of GSR_REG into the unspec itself, rather than
leaving it as a separate (use ...) in the parallel.



r~
