On 3/18/24 6:07 PM, Vineet Gupta wrote:
Naive question: why is define_split more natural vs. define_insn_and_split.
Is it because of the syntax/implementation or that one is more Combine
"centric" and other is more of a Split* Pass thing (roughly speaking of
course) or something else altogether ?
There are parts of combine that cost based on the number of insns. This
is primarily to keep combine from taking an insane amount of time.
So when we have a little white lie like mvconst_internal we muck up that
costing aspect of the combiner. That in turn stops the combiner from
doing certain combinations.
As a concrete example consider this:
unsigned long long w32mem(unsigned long long w32)
{
return w32 & ~(1U << 0);
}
Right now we use a bseti+addi to construct the constant, then do the
logical and. Prior to the combiner it looks like this:
(insn 7 3 8 2 (set (reg:DI 137)
(const_int 4294967294 [0xfffffffe])) "j.c":3:16 206 {*mvconst_internal}
(nil))
(insn 8 7 13 2 (set (reg:DI 136 [ _2 ])
(and:DI (reg/v:DI 135 [ w32 ])
(reg:DI 137))) "j.c":3:16 102 {*anddi3}
(expr_list:REG_DEAD (reg:DI 137)
(expr_list:REG_DEAD (reg/v:DI 135 [ w32 ])
(expr_list:REG_EQUAL (and:DI (reg/v:DI 135 [ w32 ])
(const_int 4294967294 [0xfffffffe]))
(nil)))))
The combiner will refuse to match a splitter where the number of
incoming insns matches the number of resulting insns. So to match this
we'd have to write another define_insn_and_split.
If we didn't have mvconst_internal, then we'd see something like this:
(insn 6 3 7 2 (set (reg:DI 138)
(const_int 4294967296 [0x100000000])) "j.c":3:16 -1
(nil))
(insn 7 6 8 2 (set (reg:DI 137)
(plus:DI (reg:DI 138)
(const_int -2 [0xfffffffffffffffe]))) "j.c":3:16 -1
(expr_list:REG_EQUAL (const_int 4294967294 [0xfffffffe])
(nil)))
(insn 8 7 9 2 (set (reg:DI 136 [ _2 ])
(and:DI (reg/v:DI 135 [ w32 ])
(reg:DI 137))) "j.c":3:16 -1
(nil))
Which we could match with a define_split which would generate RTL for
bclri+zext.w.
Note that I suspect there's a handful of these kinds of sequences for
logical ops where the immediate doesn't fit a simm12.
Of course the point of the white lie is to expose complex constant
synthesis in away that the combiner can use to simplify things. It's
definitely a tradeoff either way. What's been rattling around a little
bit would be to narrow the set of constants allowed by mvconst_internal
to those which require 3 or more to synthesize. THe idea being cases
like you're looking where we can use addi+add rather than lui+addi+add
would be rejected by mvconst_internal, but the more complex constants
that push combine over the 4 insn limit would be allowed by
mvconst_internal.
Would we have to revisit the new splitter (and perhaps audit the
existing define_insn_and_split patterns) if we were to go ahead with
this revert ?
I don't recall follow-ups which used the result of mvconst_internal in
subsequent combiner patterns, but it should be fairly simple to search
for them.
We'd need to look back at the original bug which resulted in creating
the mvconst_internal pattern. My recollection is it had a fairly
complex constant and we were unable to see enough insns to do the
necessary simplification to fix that case.
I bet whatever goes on inside perlbench, mcf and x264 (guessing based
on instruction counts in your message) + the original bug report will
provide reasonable testcases for evaluating reversion + adding patches
to avoid the regressions.
So why use "P" for your mode iterator? I would have expected GPR since
I think this works for both SI and DI mode as-is.
My intent at least was to have this work for either of rv64/rv32 and in
either of those environments, work for both SI or DI - certainly
rv64+{SI,DI}.
To that end I debated between GPR, X and P.
It seems GPR only supports DI if TARGET_64BIT.
But I could well be wrong about above requirements or the subtle
differences in these.
I wouldn't worry about GPR only support DI for TARGET_64BIT. I don't
think we generally expose DImode patterns for TARGET_32BIT.
For the output template "#" for alternative 0 and the add instruction
for alternative 1 is probably better. That way it's clear to everyone
that alternative 0 is always meant to be split.
OK.
Don't you need some kind of condition on the split to avoid splitting
when you've got a register for operands[2]?
Won't the predicate "match_code" guarantee that ?
(define_predicate "const_two_s12"
(match_code "const_int")
{
return SUM_OF_TWO_S12 (INTVAL (op), false);
})
You're right. Missed the match_code. Sorry for the bad steer.
Do I need to build gcc in a certain way to expose such errors - I wasn't
able to trigger such an error for the entire testsuite or SPEC build.
There's a distinct RTL checking mode. It's fairly expensive from a
compile-time standpoint though.
Jeff