andreadb added inline comments.
================
Comment at: lib/Headers/x86intrin.h:49
+static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__))
+__tzcnt_u32(unsigned int __X) { return __X ? __builtin_ctz(__X) : 32; }
+#ifdef __x86_64__
----------------
hans wrote:
> probinson wrote:
> > hans wrote:
> > > I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a
> > > faster encoding for BSF, but now we're putting a conditional in the way,
> > > so this will be slower actually. On the other hand, the alternative is
> > > weird too :-/
> > I thought there was a peephole to notice a guard like this and do the right
> > thing? In which case having the guard is fine.
> But for non-BMI targets it won't be able to remove the guard (unless it can
> prove the argument is zero).
>
> MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg
> wants.
I understand your point. However, the semantic of tzcnt_u32 (at least for BMI)
is well defined on zero input. Changing that semantic for non-BMI targets
sounds a bit odd/confusing to me.
I guess ffmpeg is reliant on the fact that other compilers would either
preserve the intrinsic call until instruction selection, or fold it to a
meaningful value (i.e. 32 in this case) when the input is known to be zero. If
we remove the zero check from tzcnt_u32 (for non BMI targets), we let the
optimizer enable rules to aggressively fold calls to tzcnt_u32 to undef when
the input is zero.
As a side note: I noticed that gcc provides intrinsic `__bsfd` in ia32intrin.h.
Maybe we should do something similar?
https://reviews.llvm.org/D26335
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits