During development of the patch I just posted for double-word clz, I went through all the back ends and audited their use of the bit-scan named patterns and RTL. It appears to me that our current handling of C[LT]Z_DEFINED_VALUE_AT_ZERO is much more complicated than it needs to be, and also that between my patch and Sandra's earlier patch for synthetic ctz/ffs, we have an opportunity to delete a bunch of code from the back ends.

In this message, I'll use the word "instruction" when I am talking about an actual hardware operation on a particular architecture; the word "pattern" when I am talking about a named define_insn or define_expand in a machine description; and the word "expression" when I am talking about RTL. The word "port" refers to the GCC back end for a particular CPU architecture.

There are eleven ports that make use of a clz instruction. That use is not necessarily in a clz pattern or with clz expressions - some only define ffs patterns, and some use UNSPECs. This is mostly irrelevant to what I want to talk about, though.

    alpha arm i386 m68k mips rs6000 s390 score sh sparc xtensa

Of these, the majority have instructions that, when the input is zero, write to the output a value equal to the number of bits in the input (i.e. GET_MODE_BITSIZE of the mode of the input). I'll refer to this as canonical behavior. Furthermore, these ports set CLZ_DEFINED_VALUE_AT_ZERO to reflect that fact.

    alpha arm m68k mips rs6000 s390 xtensa

The score, sh and sparc instructions may or may not display canonical behavior; their ports do not define CLZ_DEFINED_VALUE_AT_ZERO and I was not able to find documentation of the relevant instructions.

i386, as is well known, has a clz instruction that does not write a predictable value to the output when the input is zero, and so correctly does not define CLZ_DEFINED_VALUE_AT_ZERO. (Actually, when TARGET_ABM is true, we are using a new instruction that *does* display canonical behavior, and my aforementioned patch sets CLZ_DEFINED_VALUE_AT_ZERO to reflect that; but again this is mostly irrelevant.)

No port needs CLZ_DEFINED_VALUE_AT_ZERO to be a tristate. Either both or neither of the clz pattern and the clz expression produce a defined value at zero.

No port defines CLZ_DEFINED_VALUE_AT_ZERO to set the 'val' argument to anything other than GET_MODE_BITSIZE (mode). [Some of them hardcode the constant instead of using that expression.]
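
To make "canonical behavior" concrete, here is a minimal reference model in plain C for a 32-bit input. It is only a sketch: clz32_canonical and check_clz32_canonical are hypothetical names chosen for illustration, not GCC functions, and the fixed width of 32 stands in for GET_MODE_BITSIZE (mode).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model, not a GCC function: count the leading
   zero bits of a 32-bit value.  For a zero input it returns 32, the
   number of bits in the input, which is the "canonical" behavior
   described above. */
int clz32_canonical(uint32_t x)
{
    int n = 0;

    if (x == 0)
        return 32;              /* canonical value at zero: the bit width */
    while ((x & UINT32_C(0x80000000)) == 0) {
        x <<= 1;                /* move the next bit into the top position */
        n++;
    }
    return n;
}

/* Spot checks of the model, including the zero case. */
void check_clz32_canonical(void)
{
    assert(clz32_canonical(0) == 32);
    assert(clz32_canonical(1) == 31);
    assert(clz32_canonical(UINT32_C(0x00010000)) == 15);
    assert(clz32_canonical(UINT32_C(0x80000000)) == 0);
}
```

An instruction that leaves its output unpredictable for a zero input, as described for i386 without TARGET_ABM, would agree with this model on every input except 0.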

----

There are two ports that make use of a ctz instruction:

    alpha i386

alpha's instruction displays canonical behavior; i386's instruction does not write a predictable value to the output when the input is zero (TARGET_ABM does not help here). Both ports have correct definitions or non-definitions of CTZ_DEFINED_VALUE_AT_ZERO.

In addition, four ports define ctz patterns that expand to multi-instruction sequences.

    arm ia64 rs6000 xtensa

Of these, all except ia64 are presently redundant with the generic expander Sandra added to optabs.c. ia64 generates a different sequence involving a popcount instruction; it would be easy enough to add that to optabs.c. CTZ_DEFINED_VALUE_AT_ZERO is not defined by all four ports, but could be. Those that define it, do so correctly.

No port needs CTZ_DEFINED_VALUE_AT_ZERO to be a tristate. There are three cases:

 * both the pattern and the expression have a defined value at zero (alpha);
 * neither the pattern nor the expression has a defined value at zero (i386);
 * the pattern has a defined value at zero and the expression is never emitted, so its value at zero is moot (arm, rs6000, xtensa, ia64).

If optabs.c were taught to synthesize ctz in terms of popcount, the arm, rs6000, xtensa, and ia64 definitions of ctz patterns could all be removed. There would then be no port that defined CTZ_DEFINED_VALUE_AT_ZERO to set 'val' to anything other than GET_MODE_BITSIZE (mode). [My patch removes the arm ctz pattern. rs6000 and xtensa could be removed now.]

----

There is no port that makes use of an ffs instruction. However, there are nine architectures that define ffs patterns.

    alpha arm i386 ia64 rs6000 score sh sparc xtensa

All except ia64's are redundant with optabs.c after Sandra's patch plus my patch. ia64's would be redundant if the aforementioned popcount sequence were added to optabs.c.

There is no port that uses the ffs expression.

ffs always has a defined value at zero, so there is no FFS_DEFINED_VALUE_AT_ZERO macro nor any need for one.
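
As an illustration of synthesizing ctz from popcount, the sketch below uses the well-known identity ctz(x) = popcount((x - 1) & ~x). This is an assumption about the general shape of such a sequence; it has not been checked against what ia64.md actually emits. The names popcount32, ctz32_via_popcount and check_ctz32_via_popcount are hypothetical, and the portable popcount loop stands in for a hardware popcount instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a hardware popcount instruction: each loop
   iteration clears the lowest set bit, so the loop runs once per 1 bit. */
int popcount32(uint32_t x)
{
    int n = 0;

    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Hypothetical sketch, not GCC code: (x - 1) & ~x has 1 bits exactly in
   the positions below the lowest set bit of x, so its popcount is the
   number of trailing zeros.  For x == 0 the mask is all ones and the
   result is 32, i.e. the bit width. */
int ctz32_via_popcount(uint32_t x)
{
    return popcount32((x - 1) & ~x);
}

/* Spot checks, including the zero case. */
void check_ctz32_via_popcount(void)
{
    assert(ctz32_via_popcount(0) == 32);
    assert(ctz32_via_popcount(1) == 0);
    assert(ctz32_via_popcount(12) == 2);
    assert(ctz32_via_popcount(UINT32_C(0x80000000)) == 31);
}
```

Note that the zero case needs no special handling here: the unsigned wraparound of x - 1 produces the all-ones mask on its own.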

----

The machine-independent uses of C[LT]Z_DEFINED_VALUE_AT_ZERO are quite limited:

 * builtins.c (fold_builtin_bitop): Uses them to determine the value of __builtin_clz* and __builtin_ctz* for a zero argument. Interestingly, if the macros are false for a given mode, it folds the builtins as if they displayed canonical behavior.

 * optabs.c: Uses them in strategies for expanding ctz and ffs.

 * rtlanal.c (nonzero_bits1): Uses them to decide what bits can be nonzero in the result of a clz or ctz expression.

 * simplify-rtx.c: Uses them when propagating zeroes into clz and ctz expressions. Appears to fake the canonical behavior in all cases if the macro returns false.

----

Given all of the above, I propose that we simplify our support for bit scan instructions as follows:

 * We add the popcount expansion of ctz to optabs.c, and we remove the ctz and ffs patterns from all targets (except the ctz patterns of i386 and alpha). This IMO is a no-brainer.

 * Since no one uses it, we rip out all support for the ffs pattern and expression.

 * We redefine the clz and ctz expressions to have semantics consistent with canonical hardware behavior, unconditionally. Ports with instructions inconsistent with that rule must use UNSPECs instead. It is my hope and expectation that this only affects the i386. However, it might also affect score, sh, and sparc. I'd appreciate it if the maintainers of those ports could report the behavior of the relevant instructions.

 * We also redefine __builtin_clz to have those semantics unconditionally.

 * I would like to do the same for __builtin_ctz, but there is a catch. The synthetic ctz sequence in terms of popcount (as presently implemented by ia64.md, and potentially usable for at least i386 and rs6000 as well if moved to optabs.c) produces the canonical behavior at zero, but the synthetic sequence in terms of clz (as presently implemented by optabs.c) produces the value -1 at zero.
   I have not been able to think of any refinement to that sequence that would reliably produce GET_MODE_BITSIZE(mode) at zero in an efficient manner. Furthermore, -1 is the value most convenient for implementing ffs in terms of ctz. Opinions and/or clever bit manipulation hacks would be much appreciated.

 * The sole remaining use of C[LT]Z_DEFINED_VALUE_AT_ZERO is then in optabs.c. We replace this with two targetm booleans, defaulting to true, whose meaning is precisely 'all clz/ctz named patterns for this target display canonical hardware behavior'.

zw
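
For reference, the -1-at-zero result of a clz-based ctz, and why that value is convenient for ffs, can be sketched in plain C. The sketch assumes the usual formulation ctz(x) = (bitsize - 1) - clz(x & -x) with a canonical clz; it may not match the optabs.c expansion instruction for instruction. All function names here (clz32_model, ctz32_via_clz, ffs32_via_ctz, check_ctz32_via_clz) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a canonical clz: returns 32 for a zero input. */
int clz32_model(uint32_t x)
{
    int n = 0;

    if (x == 0)
        return 32;
    while ((x & UINT32_C(0x80000000)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz in terms of clz: x & -x isolates the lowest set bit, and its
   position is 31 minus its leading-zero count.  For x == 0 the isolated
   value is 0, clz gives 32, and the result is 31 - 32 = -1 rather than
   the canonical 32. */
int ctz32_via_clz(uint32_t x)
{
    uint32_t lowest = x & (uint32_t)(~x + 1u);   /* x & -x, unsigned */

    return 31 - clz32_model(lowest);
}

/* ffs in terms of that ctz: adding 1 maps -1 to 0, so ffs(0) == 0
   falls out without a special case. */
int ffs32_via_ctz(uint32_t x)
{
    return ctz32_via_clz(x) + 1;
}

/* Spot checks, including the zero case for both functions. */
void check_ctz32_via_clz(void)
{
    assert(ctz32_via_clz(0) == -1);
    assert(ctz32_via_clz(12) == 2);
    assert(ffs32_via_ctz(0) == 0);
    assert(ffs32_via_ctz(8) == 4);
}
```

Producing 32 instead of -1 at zero from this formulation would take an extra test or conditional move on the input, which is the efficiency concern raised above.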