https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97786

--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Surya Kumari Jangala from comment #7)
> Hi Segher,
> 
> Thanks for the pointers!
> We can optimize the code further and remove the branch completely.
> 
> For P10:
> 
>      xststdcdp 0,1,48
>      setnbc 9,32
>      setbc 3,34
>      isel 3,9,3,2
>      blr
> 
> 
> For P9:
>      xststdcdp 0,1,48
>      setb 9,0
>      mfcr 3,128
>      rlwinm 3,3,3,1
>      li 4,1
>      isel 9,9,4,0
>      isel 3,9,3,2
>      blr
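To sanity-check the P9 sequence quoted above, here is a small C model of what each instruction does to CR field 0 and the GPRs. It is a sketch under simplifying assumptions, not the ISA pseudocode. Only CR field 0 is tracked. `xststdcdp` is modelled only for DCMX = 48 (+Inf | -Inf). The instruction that sets up r4 is taken to load the constant 1, which is what the following `isel` needs. The intended result is assumed to be -1 for -inf, +1 for +inf and 0 otherwise. All function names (`isinf_sign_p9`, `isinf_sign_ref`, `check_p9`, ...) are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CR field 0 as four bits: [0]=LT, [1]=GT, [2]=EQ, [3]=SO/UN. */
typedef struct { bool bit[4]; } crfield;

/* xststdcdp 0,x,48: LT = sign bit of x, GT = 0, EQ = (x is +Inf or -Inf),
   SO/UN = 0.  Only DCMX = 48 (+Inf | -Inf) is modelled. */
static crfield xststdcdp_inf(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    crfield cr = {{ (u >> 63) != 0, false, isinf(x) != 0, false }};
    return cr;
}

/* setb: -1 if LT is set, else +1 if GT is set, else 0. */
static int64_t setb(crfield cr) { return cr.bit[0] ? -1 : cr.bit[1] ? 1 : 0; }

/* isel rt,ra,rb,bc (ra != r0): rt = CR[bc] ? ra : rb. */
static int64_t isel(crfield cr, int bc, int64_t ra, int64_t rb) {
    return cr.bit[bc] ? ra : rb;
}

static uint32_t rotl32(uint32_t v, unsigned n) {
    return (v << n) | (v >> (32 - n));  /* valid for 0 < n < 32 */
}

static int64_t isinf_sign_p9(double x) {
    crfield cr = xststdcdp_inf(x);              /* xststdcdp 0,1,48 */
    int64_t r9 = setb(cr);                      /* setb 9,0: -1 if negative, else 0 */
    /* mfcr 3,128: CR0 lands in the top nibble; other fields taken as 0
       here, and the rlwinm mask discards them anyway. */
    uint32_t crimg = ((uint32_t)cr.bit[0] << 31) | ((uint32_t)cr.bit[1] << 30)
                   | ((uint32_t)cr.bit[2] << 29) | ((uint32_t)cr.bit[3] << 28);
    int64_t r3 = rotl32(crimg, 3) & 1;          /* rlwinm 3,3,3,1: CR bit 2 as 0/1 */
    int64_t r4 = 1;                             /* r4 = 1 */
    r9 = isel(cr, 0, r9, r4);                   /* isel 9,9,4,0: -1 if negative, else 1 */
    r3 = isel(cr, 2, r9, r3);                   /* isel 3,9,3,2: r9 if inf, else r3 (0) */
    return r3;
}

/* Reference: -1 for -Inf, +1 for +Inf, 0 otherwise. */
static int64_t isinf_sign_ref(double x) {
    return isinf(x) ? (signbit(x) ? -1 : 1) : 0;
}

static void check_p9(double x) { assert(isinf_sign_p9(x) == isinf_sign_ref(x)); }
```

The model makes the cost visible: the sign and the isinf bit each need their own path into a GPR before the final `isel` can combine them.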

Ah right, the tdc insns are ISA 3.1, not 3.0 as I misremembered.  Bah.

But we can move the bit to field bit 1 (the FG bit) using some crmove or
similar, after which the setb will work fine?  Something like
  xststdcdp 0,1,48 # Set CR bit 0 to sign, and CR bit 2 to isinf
  crmove 1,2       # Set CR bit 0 to sign, and CR bit 1 to isinf
  setb 3,0

Hrm, that isn't quite it, heh.  We need bit 0 set for -inf and bit 1 for +inf
(or for +inf as well as -inf, also fine, since setb only looks at bit 1 when
bit 0 is clear).  So

  xststdcdp 0,1,48 # Set CR bit 0 to sign, and CR bit 2 to isinf
  crand 0,0,2      # Set CR bit 0 for -inf
  crmove 1,2       # Set CR bit 1 to isinf (bit 0 now holds -inf)
  setb 3,0
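One way to convince oneself that this second sequence is right, and why the first one was not, is to model the CR bits in C. This is a simplified sketch, not the ISA pseudocode. Only CR field 0 is tracked, and `xststdcdp` is modelled just for DCMX = 48. The function names (`first_attempt`, `isinf_sign_seq`, `isinf_sign_ref`, `check_against_ref`) are hypothetical. The target result is assumed to be -1 for -inf, +1 for +inf and 0 otherwise.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CR field 0 as four bits: [0]=LT, [1]=GT, [2]=EQ, [3]=SO/UN. */
typedef struct { bool bit[4]; } crfield;

/* xststdcdp 0,x,48: LT = sign bit of x, GT = 0, EQ = (x is +Inf or -Inf),
   SO/UN = 0.  Only DCMX = 48 (+Inf | -Inf) is modelled. */
static crfield xststdcdp_inf(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    crfield cr = {{ (u >> 63) != 0, false, isinf(x) != 0, false }};
    return cr;
}

/* setb: -1 if LT is set, else +1 if GT is set, else 0. */
static int64_t setb(crfield cr) { return cr.bit[0] ? -1 : cr.bit[1] ? 1 : 0; }

/* First attempt: only copy isinf into bit 1; bit 0 still holds the sign. */
static int64_t first_attempt(double x) {
    crfield cr = xststdcdp_inf(x);
    cr.bit[1] = cr.bit[2];                  /* move isinf into bit 1 */
    return setb(cr);                        /* -1 for every negative x */
}

/* Second sequence: crand 0,0,2 ; crmove 1,2 ; setb 3,0. */
static int64_t isinf_sign_seq(double x) {
    crfield cr = xststdcdp_inf(x);
    bool lt = cr.bit[0] && cr.bit[2];       /* crand 0,0,2: bit 0 = -inf  */
    bool gt = cr.bit[2];                    /* crmove 1,2:  bit 1 = isinf */
    cr.bit[0] = lt;                         /* neither insn reads the bit */
    cr.bit[1] = gt;                         /* the other one writes       */
    return setb(cr);
}

/* Reference: -1 for -Inf, +1 for +Inf, 0 otherwise. */
static int64_t isinf_sign_ref(double x) {
    return isinf(x) ? (signbit(x) ? -1 : 1) : 0;
}

static void check_against_ref(double x) {
    assert(isinf_sign_seq(x) == isinf_sign_ref(x));
}
```

In this model `first_attempt(-2.5)` returns -1: bit 0 holds the sign for every negative number, not just -inf. The `crand` restricts bit 0 to -inf. A plain copy of isinf is enough for bit 1 because `setb` only consults bit 1 when bit 0 is clear.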

(And no doubt I messed up there as well, and we probably *can* do it in just
three insns anyway.  Note that both the crlogical insns can execute
concurrently, though.)

It is fine to make this most optimal only for p10 and later, of course. "Gaze
aimed at the future" and such, and setb is a horrible insn :-)
