Re: RFC: Simplify rules for ctz/clz patterns and RTL

Joern Rennecke Wed, 15 Aug 2007 14:54:02 -0700

On Wed, Aug 15, 2007 at 11:55:02AM -0700, Zack Weinberg wrote:
> Joern Rennecke wrote:
> >The operation the nsb instruction of the SHmedia instruction set performs
> >is 'count number of sign bit copies'.
> >[...]
> 
> It sounds like the SH should probably be lumped in with the x86 as not 
> doing "canonical behavior".  Conveniently enough for my grand plan, it 
> already uses an UNSPEC for the actual instruction :-)
> 
> What is the result of the instruction for (64-bit) all-bits-zero or 
> all-bits-one?  64?


No, it is 63.  There is one essential sign bit and 63 more copies.
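
For readers without SHmedia hardware at hand, the counting rule can be modelled in portable C. This is only a sketch of the semantics described above; the name nsb64 and the self-check function are made up for illustration and are not part of GCC or of any SH header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit "count sign bit copies" operation:
   returns how many bits directly below bit 63 repeat the value of bit 63. */
int nsb64(int64_t x)
{
    uint64_t u = (uint64_t)x;
    uint64_t sign = u >> 63;        /* the one essential sign bit */
    int n = 0;

    for (int bit = 62; bit >= 0; bit--) {
        if (((u >> bit) & 1) != sign)
            break;                  /* first bit that differs from the sign */
        n++;
    }
    return n;                       /* always in the range 0..63 */
}

/* Illustrative checks for the boundary cases discussed in the thread. */
void nsb64_selfcheck(void)
{
    assert(nsb64(0) == 63);         /* all-bits-zero */
    assert(nsb64(-1) == 63);        /* all-bits-one  */
    assert(nsb64(1) == 62);
    assert(nsb64(INT64_MIN) == 0);  /* sign bit has no copies below it */
}
```

The result can never reach 64 because the sign bit itself is not counted as a copy.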

> Assuming so, it occurs to me that the result of an 
> unsigned clz() on any negative 64-bit value will be zero; thus, you 
> could get a "canonical" clz out of nsb by doing (pseudo-assembly)
> 
>       mov     result, 0
>       cmp/pz  arg
>       bf      1f
>       nsb     result, arg
> 1:

We are talking about SHmedia code here; cmp/pz and bf are not SHmedia
instructions.  Loading a zero into result would be: movi 0,result

If you want to special-case the negative input, that would be:

 shari  arg,31,tmp
 nsb    arg,result
 cmvne  tmp,tmp,result
 addi   result,1,result
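
The intent of that sequence (build a sign mask, let it override the nsb result when the input is negative, then add one) can be sketched in C as follows. The helper sign_copies64 and the function clz64_from_nsb are hypothetical names used only for this illustration, and the sign test is written as a comparison instead of a shift to stay within portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nsb: number of copies of the sign bit
   found directly below bit 63. */
int sign_copies64(int64_t x)
{
    uint64_t u = (uint64_t)x;
    uint64_t sign = u >> 63;
    int n = 0;

    for (int bit = 62; bit >= 0 && ((u >> bit) & 1) == sign; bit--)
        n++;
    return n;
}

/* Sketch of a "canonical" clz built on top of the sign-copy count. */
int clz64_from_nsb(int64_t arg)
{
    int64_t tmp = (arg < 0) ? -1 : 0;     /* sign mask: all ones if negative */
    int64_t result = sign_copies64(arg);  /* nsb */

    if (tmp != 0)                         /* conditional move on tmp != 0 */
        result = tmp;                     /* negative input: result = -1 */
    return (int)(result + 1);             /* -1 + 1 = 0, else copies + 1 */
}

/* Illustrative checks. */
void clz64_from_nsb_selfcheck(void)
{
    assert(clz64_from_nsb(-1) == 0);
    assert(clz64_from_nsb(1) == 63);
    assert(clz64_from_nsb(0) == 64);
}
```

Note that in this model a zero input yields 64, since nsb reports 63 copies for all-bits-zero.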

> Similarly, the x & (x-1) operation

It's x ^ (x-1) (xor) or x & ~(x-1) (andc).
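
To make the difference between the three expressions concrete, here is a small C sketch. The function names are invented for this example; only the expressions themselves come from the discussion.

```c
#include <assert.h>
#include <stdint.h>

/* x ^ (x-1): sets every bit from bit 0 up to and including the lowest
   set bit of x.  For x == 0 the result is all ones. */
uint64_t low_mask_xor(uint64_t x)
{
    return x ^ (x - 1);
}

/* x & ~(x-1): isolates the lowest set bit of x.  For x == 0 the result is 0. */
uint64_t lowest_bit_andc(uint64_t x)
{
    return x & ~(x - 1);
}

/* x & (x-1): clears the lowest set bit, which is not what a ctz/ffs
   setup needs. */
uint64_t clear_lowest_and(uint64_t x)
{
    return x & (x - 1);
}

/* Illustrative checks with x = 0x28 (binary 101000). */
void low_bit_masks_selfcheck(void)
{
    assert(low_mask_xor(0x28) == 0x0f);
    assert(lowest_bit_andc(0x28) == 0x08);
    assert(clear_lowest_and(0x28) == 0x20);
}
```

With either the xor or the andc form, the position of the highest set bit of the result equals the position of the lowest set bit of x, which is what lets a leading-bit count stand in for a trailing-bit count.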

> used to set up for ctz/ffs in terms 
> of clz will leave the high bit set *only* for x == 0x8000 0000 0000 
> 0000; which can be tested for as x == (x&(x-1)) and the nsb skipped.
> 
> Would these sequences be slower than the current logic?

Currently we have for ffs:

 addi   arg,-1,tmp
 xor    arg,tmp,tmp
 shlri  tmp,1,tmp
 nsb    tmp,tmp
 addi   tmp,-64,tmp
 cmveq  arg,r63,tmp
 sub    r63,tmp,result
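
A step-by-step C model of this seven-instruction sequence may help in checking the arithmetic. It is a sketch under stated assumptions: r63 is taken to read as zero, cmveq is taken to copy its second operand into the destination when its first operand is zero, and the names nsb_u64 and ffs64_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nsb, taking the raw 64-bit register value. */
int nsb_u64(uint64_t u)
{
    uint64_t sign = u >> 63;
    int n = 0;

    for (int bit = 62; bit >= 0 && ((u >> bit) & 1) == sign; bit--)
        n++;
    return n;
}

/* Model of the ffs sequence; k denotes the index of the lowest set bit. */
int ffs64_model(uint64_t arg)
{
    uint64_t tmp = arg - 1;          /* addi  arg,-1,tmp                  */
    tmp = arg ^ tmp;                 /* xor   arg,tmp,tmp  : bits 0..k    */
    tmp >>= 1;                       /* shlri tmp,1,tmp    : bits 0..k-1  */
    int64_t t = nsb_u64(tmp);        /* nsb   tmp,tmp      : 63 - k       */
    t += -64;                        /* addi  tmp,-64,tmp  : -(k + 1)     */
    if (arg == 0)                    /* cmveq arg,r63,tmp  : assumed to   */
        t = 0;                       /*   load r63 (zero) when arg == 0   */
    return (int)(0 - t);             /* sub   r63,tmp,result : k + 1      */
}

/* Illustrative checks. */
void ffs64_model_selfcheck(void)
{
    assert(ffs64_model(0) == 0);
    assert(ffs64_model(1) == 1);
    assert(ffs64_model(8) == 4);
}
```

The logical shift right by one guarantees that the value fed to nsb is non-negative, so the sign-copy count can be used directly without a special case for the top bit.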

Using the above sequence, we get the more register-hungry:

 addi   arg,-1,tmp
 xor    arg,tmp,tmp
 shari  tmp,31,tmp2
 nsb    tmp,tmp
 cmvne  tmp2,tmp2,tmp
 addi   tmp,1-64,tmp
 sub    r63,tmp,result

You propose:

 pt     after_nsb,trtmp
 addi   arg,-1,tmp
 andc   arg,tmp,tmp
 movi   -1,tmp2
 beq    arg,tmp,trtmp
 nsb    tmp,tmp2
after_nsb:
 addi   tmp2,-63,tmp
 sub    r63,tmp,result

Or is that:

 pt     after_nsb,trtmp
 addi   arg,-1,tmp
 xor    arg,tmp,tmp
 bgt    r63,tmp,trtmp
 nsb    tmp,tmp
after_nsb:
 addi   tmp,-63,tmp
 sub    r63,tmp,result
 
At any rate, the introduction of the branch makes the code worse.

But for the ARC, it would make an interesting shortcut.  Although norm can't
be conditionalized, we can use the -1 from the xor to save on a long
immediate for 32-bit ffs.

 sub_s  tmp,arg,1
 xor.f  tmp,tmp,arg ; for -Os this can be xor_s and
 norm   result,tmp  ; then norm.f produces the flag.
 mov.mi result,tmp
 rsub   result,31,result

> >The ARC700 has a NORM instruction, which again counts the number of
> >sign bit copies.  There is a variant NORM.F which sets the N flag if the
> >input is negative.
> 
> Sorry, I don't recognize the ARC700 - which GCC back end is that?

It belongs in config/arc; however, proper ARC700 support is not in the
FSF mainline yet.  We are working on it.

> It 
> might be worth teaching optabs.c about sign-bit-count operations, but 
> only if we have more than one architecture that can use it.

The NORM instruction is also available as an optional extension operation
for ARCtangent-A5 and ARC600.
