https://sourceware.org/bugzilla/show_bug.cgi?id=34104
Bug ID: 34104
Summary: Add CLZ/CLO macro support for MIPS R5900 (EE) using plzcw
Product: binutils
Version: 2.34
Status: UNCONFIRMED
Severity: enhancement
Priority: P2
Component: binutils
Assignee: unassigned at sourceware dot org
Reporter: archicharmer at mail dot ru
Target Milestone: ---
Created attachment 16697
--> https://sourceware.org/bugzilla/attachment.cgi?id=16697&action=edit
Things inside - .c test files, PDF with macro representations, a .diff
I would like to propose adding count leading/trailing zeros/ones macros for
the R5900 (Emotion Engine) target, implemented with its native plzcw
instruction.
plzcw (Parallel Leading Zero or one Count Word) is described in the manual as
follows: "The number of leading zeros or ones of the two words in GPR rs are
counted. The results of the leading counts minus one are loaded in the
corresponding words in GPR rd." Put more simply, it operates on the lower
64 bits (two 32-bit words) of the 128-bit GPR. In each word it scans from left
to right and counts the bits equal to the sign bit until the first bit of the
opposite value is encountered. Although it is an MMI opcode, it leaves the
upper 64 bits of the destination register unchanged rather than overwriting
them.
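To make the quoted description concrete, here is a minimal software sketch of the count for one word and for the low 64 bits. The names plzcw_word and plzcw_low64 are hypothetical and exist only for this illustration; the sketch follows the manual text quoted above and has not been checked against hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of plzcw for one 32-bit word: count the leading bits
   that equal the sign bit, then subtract one. */
static uint32_t plzcw_word(uint32_t x)
{
    uint32_t sign = x >> 31;
    uint32_t count = 0;

    for (int bit = 31; bit >= 0; bit--) {
        if (((x >> bit) & 1u) != sign)
            break;
        count++;
    }
    assert(count >= 1);   /* bit 31 always matches itself */
    return count - 1;
}

/* Hypothetical model of the low 64 bits of rd: the two words are handled
   independently. The upper 64 bits of the register are not modelled. */
static uint64_t plzcw_low64(uint64_t rs_low64)
{
    uint32_t w0 = plzcw_word((uint32_t)rs_low64);
    uint32_t w1 = plzcw_word((uint32_t)(rs_low64 >> 32));

    return ((uint64_t)w1 << 32) | w0;
}
```

Under this model, plzcw_word(0) and plzcw_word(0xFFFFFFFF) both give 31, and plzcw_word(1) gives 30.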
Each implementation follows these rules:
1. It must match the reference functions bit for bit, such as libgcc's
__ctzsi2 and __clzsi2 and the corresponding __builtin functions, including
edge cases (all zeros, all ones, 63rd bit set, 32nd/31st bit set, etc.).
In practice, dctz/dcto showed an issue; see below.
2. The sequence must be branchless, to prevent pipeline stalls.
3. %1 (input) is only read, while %0 (output) and $at (GPR 1) are used for
writes and intermediates, adhering to MIPS macro conventions.
4. The upper 64 bits of the output register must be the same before and after
the macro.
5. Instructions are ordered to facilitate dual-issue (superscalar) execution
on the R5900 pipeline where possible.
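As an illustration of rule 2 only, the sketch below shows one possible branchless way to turn a plzcw-style result into clz and clo values in C. It is not the instruction sequence from the attached .diff; the names plzcw_word, clz_from_plzcw and clo_from_plzcw are hypothetical. It assumes a zero input should give 32 for clz, as the MIPS32 clz instruction does, and likewise 32 for clo of all ones.

```c
#include <stdint.h>

/* Software stand-in for the plzcw result on one word. It uses a loop here;
   on the R5900 this would be the single plzcw instruction. */
static uint32_t plzcw_word(uint32_t x)
{
    uint32_t sign = x >> 31;
    uint32_t count = 0;

    for (int bit = 31; bit >= 0 && ((x >> bit) & 1u) == sign; bit--)
        count++;
    return count - 1;
}

/* clz: if the sign bit is 0 the answer is plzcw + 1, otherwise it is 0.
   The selection is done with a mask instead of a branch. */
static uint32_t clz_from_plzcw(uint32_t x)
{
    uint32_t keep = (x >> 31) - 1u;   /* all ones if sign bit is 0, else 0 */

    return (plzcw_word(x) + 1u) & keep;
}

/* clo: if the sign bit is 1 the answer is plzcw + 1, otherwise it is 0. */
static uint32_t clo_from_plzcw(uint32_t x)
{
    uint32_t keep = 0u - (x >> 31);   /* all ones if sign bit is 1, else 0 */

    return (plzcw_word(x) + 1u) & keep;
}
```

The "+ 1" undoes the "minus one" in the plzcw result, and the mask zeroes the count when the leading run consists of the wrong bit value.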
All implementations were tested and verified on real hardware (PS2) running an
O32 Linux kernel v5.4.221 in a Linux system built from scratch. The test files
I used are in the attached archive.
The PDF file in the attached archive contains visual traces for each
implementation that operates on doublewords, illustrating how each instruction
affects the W3:W2:W1:W0 words of the 128-bit register.
On dctz/dcto edge cases:
Note that the dctz/dcto implementation returns -1 in cases where the standard
__builtin_ctzll might give 31 (or an undefined result). I suppose
__builtin_ctzll returns 31 in the all-zero case because its logic is like:
  if the lower word is non-zero: ctzll = ctz(lower word)
  if the lower word is zero:     ctzll = ctz(upper word) + 32
Given that ctz returns -1 for a zero input, ctzll(0) would then evaluate as:
  ctzll = ctz(0) + 32 = -1 + 32 = 31
I used __builtin_ctzll because the toolchain, the kernel, and the Linux system
I am using are all O32 ABI, so a more appropriate doubleword count trailing
zeros/ones __builtin reference function may simply be unavailable for me to
compare against. Please tell me whether the dctz(0) = -1 and
dcto(0xFFFFFFFFFFFFFFFF) = -1 results should be adjusted.
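The value 31 discussed above can be reproduced with a small C sketch of the supposed two-word composition. The names ctz32_model and ctzll_composed are hypothetical, and the 32-bit helper is assumed to return -1 for a zero input, as described above. This is only a guess at the library logic, not its actual source; GCC itself documents the result of __builtin_ctzll(0) as undefined.

```c
#include <stdint.h>

/* Hypothetical 32-bit helper: trailing-zero count that returns -1 for a
   zero input, as assumed in the discussion above. */
static int ctz32_model(uint32_t x)
{
    int n = 0;

    if (x == 0)
        return -1;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Supposed composition of a 64-bit count from two 32-bit counts. */
static int ctzll_composed(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);

    if (lo != 0)
        return ctz32_model(lo);
    return ctz32_model(hi) + 32;   /* for x == 0 this is -1 + 32 = 31 */
}
```

With this composition a zero input falls through to the upper-word path, which is where the 31 comes from, whereas a direct dctz implementation reports -1.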
The attached archive also contains a preliminary .diff file showing my attempt
so far to integrate these macros with the existing MIPS macro framework. I
updated the opcode tables and basic structures, but I've hit a roadblock with
the macro expansion syntax in tc-mips.c.
Could a maintainer assist in adding these as macros to opcodes/mips-opc.c?