RFC: Simplify rules for ctz/clz patterns and RTL

Zack Weinberg Fri, 10 Aug 2007 23:45:59 -0700

During development of the patch I just posted for double-word clz, I
went through all the back ends and audited their use of the bit-scan
named patterns and RTL.  It appears to me that our current handling of
C[LT]Z_DEFINED_VALUE_AT_ZERO is much more complicated than it needs to
be, and also that between my patch and Sandra's earlier patch for
synthetic ctz/ffs, we have an opportunity to delete a bunch of code from
the back ends.


In this message, I'll use the word "instruction" when I am talking about
an actual hardware operation on a particular architecture; the word
"pattern" when I am talking about a named define_insn or define_expand
in a machine description; and the word "expression" when I am talking
about RTL.  The word "port" refers to the GCC back-end for a particular
CPU architecture.

There are eleven ports that make use of a clz instruction.  That use is
not necessarily in a clz pattern or with clz expressions - some only
define ffs patterns, and some use UNSPECs.  This is mostly irrelevant to
what I want to talk about, though.

  alpha arm i386 m68k mips rs6000 s390 score sh sparc xtensa

Of these, the majority have instructions that, when the input is zero,
write to the output a value equal to the number of bits in the input
(i.e. GET_MODE_BITSIZE of the mode of the input).  I'll refer to this as
canonical behavior.  Furthermore, these ports set
CLZ_DEFINED_VALUE_AT_ZERO to reflect that fact.

  alpha arm m68k mips rs6000 s390 xtensa
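
To make "canonical behavior" concrete, here is a minimal portable C sketch
for a 32-bit input.  The names clz32_canonical and ctz32_canonical are
hypothetical, chosen for illustration only; they are not GCC functions, and
the loops merely model what the hardware instructions compute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a canonical clz: for a zero input the result
   is the bit width of the input (32 here).  */
static unsigned clz32_canonical(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {   /* shift until the top bit is set */
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of a canonical ctz, with the same rule at zero.  */
static unsigned ctz32_canonical(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {            /* shift until the low bit is set */
        x >>= 1;
        n++;
    }
    return n;
}

/* Small self-check of the values discussed above.  */
static void check_canonical(void)
{
    assert(clz32_canonical(0) == 32);
    assert(clz32_canonical(1) == 31);
    assert(ctz32_canonical(0) == 32);
    assert(ctz32_canonical(8) == 3);
}
```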

The score, sh and sparc instructions may or may not display canonical
behavior; their ports do not define CLZ_DEFINED_VALUE_AT_ZERO and I was
not able to find documentation of the relevant instructions.

i386, as is well known, has a clz instruction that does not write a
predictable value to the output when the input is zero, and so correctly
does not define CLZ_DEFINED_VALUE_AT_ZERO.  (Actually, when TARGET_ABM
is true, we are using a new instruction that *does* display canonical
behavior, and my aforementioned patch sets CLZ_DEFINED_VALUE_AT_ZERO to
reflect that; but again this is mostly irrelevant.)

No port needs CLZ_DEFINED_VALUE_AT_ZERO to be a tristate.  In every
port, either both the clz pattern and the clz expression produce a
defined value at zero, or neither does.

No port defines CLZ_DEFINED_VALUE_AT_ZERO to set the 'val' argument to
anything other than GET_MODE_BITSIZE (mode).  [Some of them hardcode the
constant instead of using that expression.]

----

There are two ports that make use of a ctz instruction:

  alpha i386

alpha's instruction displays canonical behavior; i386's instruction does
not write a predictable value to the output when the input is zero
(TARGET_ABM does not help here).  Both ports have correct definitions or
non-definitions of CTZ_DEFINED_VALUE_AT_ZERO.

In addition, four ports define ctz patterns that expand to
multi-instruction sequences.

  arm ia64 rs6000 xtensa

Of these, all except ia64 are presently redundant with the generic
expander Sandra added to optabs.c.  ia64 generates a different sequence
involving a popcount instruction; it would be easy enough to add that to
optabs.c.
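
For readers unfamiliar with the popcount approach, here is a minimal C
sketch of one well-known way to compute ctz from popcount.  I have not
checked that it matches ia64.md instruction for instruction, so treat it
as an illustration of the technique only.  The helper names are
hypothetical, and popcount32 is a plain software loop standing in for a
hardware popcount instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for a hardware popcount instruction.  */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;    /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* (x - 1) & ~x has ones exactly in the positions of the trailing
   zeros of x, so counting them gives ctz.  For x == 0 the mask is
   all ones, so the result is 32, i.e. the canonical value.  */
static unsigned ctz32_via_popcount(uint32_t x)
{
    return popcount32((x - 1u) & ~x);
}

/* Small self-check, including the value at zero.  */
static void check_ctz_popcount(void)
{
    assert(ctz32_via_popcount(0) == 32);
    assert(ctz32_via_popcount(1) == 0);
    assert(ctz32_via_popcount(8) == 3);
}
```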

Not all four of these ports define CTZ_DEFINED_VALUE_AT_ZERO, though
all of them could.  Those that do define it do so correctly.

No port needs CTZ_DEFINED_VALUE_AT_ZERO to be a tristate.  There are
three cases: both the pattern and the expression have a defined value at
zero (alpha); neither the pattern nor the expression has a defined value
at zero (i386); the pattern has a defined value at zero and the
expression is never emitted so its value at zero is moot (arm, rs6000,
xtensa, ia64).

If optabs.c were taught to synthesize ctz in terms of popcount, the arm,
rs6000, xtensa, and ia64 definitions of ctz patterns could all be
removed.  There would then be no port that defined
CTZ_DEFINED_VALUE_AT_ZERO to set 'val' to anything other than
GET_MODE_BITSIZE (mode).  [My patch removes the arm ctz pattern.  rs6000
and xtensa could be removed now.]

----

There is no port that makes use of an ffs instruction.  However, there
are nine architectures that define ffs patterns.

  alpha arm i386 ia64 rs6000 score sh sparc xtensa

All except ia64's are redundant with optabs.c after Sandra's patch plus
my patch.  ia64's would be redundant if the aforementioned popcount
sequence were added to optabs.c.

There is no port that uses the ffs expression.  ffs always has a defined
value at zero, so there is no FFS_DEFINED_VALUE_AT_ZERO macro nor any
need for one.

----

The machine-independent uses of C[LT]Z_DEFINED_VALUE_AT_ZERO are quite
limited:

 * builtins.c (fold_builtin_bitop): Uses them to determine the value of
 __builtin_clz* and __builtin_ctz* for a zero argument.  Interestingly,
 if the macros are false for a given mode, it folds the builtins as if
 they displayed canonical behavior.

 * optabs.c: Uses them in strategies for expanding ctz and ffs.

 * rtlanal.c (nonzero_bits1): Uses them to decide what bits can be
 nonzero in the result of a clz or ctz expression.

 * simplify-rtx.c: Uses them when propagating zeroes into clz and ctz
 expressions.  Appears to fake the canonical behavior in all
 cases if the macro returns false.

----

Given all of the above, I propose that we simplify our support for bit
scan instructions as follows:

* We add the popcount expansion of ctz to optabs.c, and we remove the
ctz and ffs patterns from all targets (except the ctz patterns of i386
and alpha).  This IMO is a no-brainer.

* Since no one uses it, we rip out all support for the ffs pattern and
expression.

* We redefine the clz and ctz expressions to have semantics consistent
with canonical hardware behavior, unconditionally.  Ports with
instructions inconsistent with that rule must use UNSPECs instead.  It
is my hope and expectation that this only affects the i386.  However, it
might also affect score, sh, and sparc.  I'd appreciate it if the
maintainers of those ports could report the behavior of the relevant
instructions.

* We also redefine __builtin_clz to have those semantics unconditionally.

* I would like to do the same for __builtin_ctz, but there is a catch.
The synthetic ctz sequence in terms of popcount (as presently
implemented by ia64.md, and potentially usable for at least i386 and
rs6000 as well if moved to optabs.c) produces the canonical behavior at
zero, but the synthetic sequence in terms of clz (as presently
implemented by optabs.c) produces the value -1 at zero.  I have not been
able to think of any refinement to that sequence that would reliably
produce GET_MODE_BITSIZE(mode) at zero in an efficient manner.
Furthermore, -1 is the value most convenient for implementing ffs in
terms of ctz.  Opinions and/or clever bit manipulation hacks would be
much appreciated.
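
To illustrate the catch, here is a minimal C sketch of the clz-based
synthesis for a 32-bit input, assuming a clz with canonical behavior at
zero.  The function names are hypothetical and the code only models the
arithmetic; it is not the optabs.c expander itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a canonical clz: returns 32 for a zero input.  */
static int clz32_canonical(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz in terms of clz: isolate the lowest set bit with x & -x, then
   convert its position.  At zero the isolated value is 0, clz gives
   32, and the result is 31 - 32 = -1 rather than 32.  */
static int ctz32_via_clz(uint32_t x)
{
    uint32_t lowest = x & (0u - x);
    return 31 - clz32_canonical(lowest);
}

/* ffs is ctz + 1, so the -1 at zero yields ffs(0) == 0 with no
   special case.  */
static int ffs32_via_ctz(uint32_t x)
{
    return ctz32_via_clz(x) + 1;
}

/* Small self-check of the values at and away from zero.  */
static void check_ctz_clz(void)
{
    assert(ctz32_via_clz(0) == -1);
    assert(ctz32_via_clz(8) == 3);
    assert(ffs32_via_ctz(0) == 0);
    assert(ffs32_via_ctz(8) == 4);
}
```

Getting 32 at zero out of this sequence would need an extra compare or
conditional move after the subtraction, which is the inefficiency I would
like to avoid.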

* The sole remaining use of C[LT]Z_DEFINED_VALUE_AT_ZERO is then in
optabs.c.  We replace this with two targetm booleans, defaulting to
true, whose meaning is precisely 'all clz/ctz named patterns for this
target display canonical hardware behavior'.

zw
