On Thu, Oct 08, 2020 at 11:22:34AM +0000, Wilco Dijkstra wrote:
> >> I think a better way forward would be to make the builtin_clz/ctz more
> >> defined.
> >> Having undefined values is a source of unnecessary bugs given practically
> >> all modern targets return the number of bits for the zero input - it is
> >> relatively easy to ensure this on the few targets that don't.
> >
> > Well, e.g. i?86/x86_64 in most commonly used CPU flags is really undefined
> > (the register is unchanged). And -1 is also quite commonly used value,
> > e.g. powerpc, gcn, xtensa.
>
> So wouldn't it be easy to initialize the register before you do the bsr to get
> the same result as with BMI? I don't think an extra mov can affect performance
> in actual code (and GCC could still optimize the zero case if the input range
> doesn't include zero).
>
> -1 is more complex, if these targets don't want to add extra instructions to
> fix it up, we could define the zero result either -1 or #bits depending on
> the target (still better than completely undefined).

Having it undefined allows optimizations, and has been that way for years.
We just should make sure that we optimize code like
x ? __builtin_c[lt]z (x) : 32; etc. properly (and I believe we do).
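
A minimal sketch of that guarded idiom, for illustration only. The helper
names clz_or_width and ctz_or_width are hypothetical (not GCC functions), and
the width is computed from unsigned int rather than hard-coded to 32. The
builtins themselves stay undefined for a zero argument; the guard is what
gives the zero case a defined result.

```c
#include <limits.h>

/* Hypothetical helpers, not part of GCC: give the zero input a defined
   result (the bit width of unsigned int) by guarding the builtin. */
#define UINT_WIDTH_BITS ((int) (sizeof (unsigned int) * CHAR_BIT))

static inline int
clz_or_width (unsigned int x)
{
  /* __builtin_clz is only evaluated when x is nonzero. */
  return x ? __builtin_clz (x) : UINT_WIDTH_BITS;
}

static inline int
ctz_or_width (unsigned int x)
{
  /* __builtin_ctz is only evaluated when x is nonzero. */
  return x ? __builtin_ctz (x) : UINT_WIDTH_BITS;
}
```

On targets whose count instruction already returns the bit width for zero,
the compiler may be able to drop the branch; whether it does depends on the
target and the GCC version.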

Jakub