On Wed, Aug 15, 2007 at 11:55:02AM -0700, Zack Weinberg wrote:
> Joern Rennecke wrote:
> >The operation the nsb instruction of the SHmedia instruction set performs
> >is 'count number of sign bit copies'.
> >[...]
>
> It sounds like the SH should probably be lumped in with the x86 as not
> doing "canonical behavior".  Conveniently enough for my grand plan, it
> already uses an UNSPEC for the actual instruction :-)
>
> What is the result of the instruction for (64-bit) all-bits-zero or
> all-bits-one?  64?

No, it is 63.  There is one essential sign bit and 63 more copies.

> Assuming so, it occurs to me that the result of an
> unsigned clz() on any negative 64-bit value will be zero; thus, you
> could get a "canonical" clz out of nsb by doing (pseudo-assembly)
>
>     mov    result, 0
>     cmp/pz arg
>     bf     1f
>     nsb    result, arg
> 1:

We are talking about SHmedia code here.  cmp/pz and bf are not SHmedia
instructions.  Loading a zero into result would be

    movi 0,result

If you want to special-case the negative input, that would be:

    shari arg,31,tmp
    nsb arg,result
    cmvne tmp,tmp,result
    addi result,1,result

> Similarly, the x & (x-1) operation

It's x ^ (x-1) (xor) or x &~(x-1) (andc)

> used to set up for ctz/ffs in terms
> of clz will leave the high bit set *only* for x == 0x8000 0000 0000
> 0000; which can be tested for as x == (x&(x-1)) and the nsb skipped.
>
> Would these sequences be slower than the current logic?

currently we have for ffs:

    addi arg,-1,tmp
    xor arg,tmp,tmp
    shlri tmp,1,tmp
    nsb tmp,tmp
    addi tmp,-64,tmp
    cmveq arg,r63,tmp
    sub r63,tmp,result

Using the above sequence, we get the more register-hungry:

    addi arg,-1,tmp
    xor arg,tmp,tmp
    shari tmp,31,tmp2
    nsb tmp,tmp
    cmvne tmp2,tmp2,tmp
    addi tmp,1-64,tmp
    sub r63,tmp,result

you propose:

    pt after_nsb,trtmp
    addi arg,-1,tmp
    andc arg,tmp,tmp
    movi -1,tmp2
    beq arg,tmp,trtmp
    nsb tmp,tmp2
after_nsb:
    addi tmp2,-63,tmp
    sub r63,tmp,result

or is that:

    pt after_nsb,trtmp
    addi arg,-1,tmp
    xor arg,tmp,tmp
    bgt r63,tmp,trtmp
    nsb tmp,tmp
after_nsb:
    addi tmp,-63,tmp
    sub r63,tmp,result

At any rate, the introduction of the branch makes the code worse.

But for the ARC, it would make an interesting shortcut.  Although norm can't
be conditionalized, we can use the -1 from the xor to save on a long immediate
for 32 bit ffs.

    sub_s tmp,arg,1
    xor.f tmp,tmp,arg    ; for -Os this can be xor_s and
    norm result,tmp      ; then norm.f produces the flag.
    mov.mi result,tmp
    rsub result,31,result

> >The ARC700 has a NORM instruction, which again counts the number of
> >sign bit copies.  There is a variant NORM.F which sets the N flag if the
> >input is negative.
>
> Sorry, I don't recognize the ARC700 - which GCC back end is that?

It belongs in config/arc ; however, proper ARC700 support is not in the FSF
mainline yet.  We are working on it.

> It
> might be worth teaching optabs.c about sign-bit-count operations, but
> only if we have more than one architecture that can use it.

The NORM instruction is also available as an optional extension operation
for ARCtangent-A5 and ARC600.
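
For anyone who wants to check the arithmetic without an SHmedia simulator, here is a rough C model of the 64-bit case.  It is only a sketch: nsb64, clz64_from_nsb and ffs64_from_nsb are names made up for this illustration (they are not GCC or SH identifiers), nsb is modelled purely from the "count sign bit copies" description above, and the negative-input test simply looks at bit 63.  ffs64_from_nsb follows the branch-free ffs sequence quoted above step by step.

```c
#include <assert.h>   /* only needed if you add checks like those in the note below */
#include <stdint.h>

/* Model of nsb: number of bits below the sign bit that are copies of it.
   All-bits-zero and all-bits-one both give 63. */
int nsb64(uint64_t x)
{
    int n = 0;

    if (x >> 63)
        x = ~x;                 /* fold the negative case onto the positive one */
    for (int bit = 62; bit >= 0 && !((x >> bit) & 1); --bit)
        n++;
    return n;
}

/* "Canonical" unsigned clz: 0 when bit 63 is set, otherwise nsb + 1. */
int clz64_from_nsb(uint64_t x)
{
    if (x >> 63)
        return 0;
    return nsb64(x) + 1;
}

/* ffs modelled on the branch-free sequence; comments name the matching insn. */
int ffs64_from_nsb(uint64_t arg)
{
    uint64_t tmp = arg ^ (arg - 1);   /* addi arg,-1,tmp ; xor arg,tmp,tmp */
    int t;

    tmp >>= 1;                        /* shlri tmp,1,tmp   */
    t = nsb64(tmp) - 64;              /* nsb ; addi tmp,-64,tmp */
    if (arg == 0)
        t = 0;                        /* cmveq arg,r63,tmp */
    return 0 - t;                     /* sub r63,tmp,result */
}
```

With this model, nsb64(0) and nsb64 of all-ones both come out as 63, clz64_from_nsb(0) is 64, and ffs64_from_nsb gives 0 for 0, 1 for 1 and 64 for 0x8000000000000000.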