On 3/18/24 6:07 PM, Vineet Gupta wrote:


Naive question: why is define_split more natural vs. define_insn_and_split.
Is it because of the syntax/implementation or that one is more Combine
"centric" and other is more of a Split* Pass thing (roughly speaking of
course) or something else altogether ?
There are parts of combine that cost based on the number of insns. This is primarily to keep combine from taking an insane amount of time.

So when we have a little white lie like mvconst_internal we muck up that costing aspect of the combiner. That in turn stops the combiner from doing certain combinations.

As a concrete example consider this:

unsigned long long w32mem(unsigned long long w32)
{
    return w32 & ~(1U << 0);
}


Right now we use a bseti+addi to construct the constant, then do the logical and. Prior to the combiner it looks like this:


(insn 7 3 8 2 (set (reg:DI 137) (const_int 4294967294 [0xfffffffe])) "j.c":3:16 206 {*mvconst_internal}
     (nil))
(insn 8 7 13 2 (set (reg:DI 136 [ _2 ])
        (and:DI (reg/v:DI 135 [ w32 ])
            (reg:DI 137))) "j.c":3:16 102 {*anddi3}
     (expr_list:REG_DEAD (reg:DI 137)
        (expr_list:REG_DEAD (reg/v:DI 135 [ w32 ])
            (expr_list:REG_EQUAL (and:DI (reg/v:DI 135 [ w32 ])
                    (const_int 4294967294 [0xfffffffe]))
                (nil)))))

The combiner will refuse to match a splitter where the number of incoming insns matches the number of resulting insns. So to match this we'd have to write another define_insn_and_split.



If we didn't have mvconst_internal, then we'd see something like this:

(insn 6 3 7 2 (set (reg:DI 138)
        (const_int 4294967296 [0x100000000])) "j.c":3:16 -1
     (nil))
(insn 7 6 8 2 (set (reg:DI 137)
        (plus:DI (reg:DI 138)
            (const_int -2 [0xfffffffffffffffe]))) "j.c":3:16 -1
     (expr_list:REG_EQUAL (const_int 4294967294 [0xfffffffe])
        (nil)))
(insn 8 7 9 2 (set (reg:DI 136 [ _2 ])
        (and:DI (reg/v:DI 135 [ w32 ])
            (reg:DI 137))) "j.c":3:16 -1
     (nil))

Which we could match with a define_split which would generate RTL for bclri+zext.w.


Note that I suspect there's a handful of these kinds of sequences for logical ops where the immediate doesn't fit a simm12.


Of course the point of the white lie is to expose complex constant synthesis in away that the combiner can use to simplify things. It's definitely a tradeoff either way. What's been rattling around a little bit would be to narrow the set of constants allowed by mvconst_internal to those which require 3 or more to synthesize. THe idea being cases like you're looking where we can use addi+add rather than lui+addi+add would be rejected by mvconst_internal, but the more complex constants that push combine over the 4 insn limit would be allowed by mvconst_internal.



Would we have to revisit the new splitter (and perhaps audit the
existing define_insn_and_split patterns) if we were to go ahead with
this revert ?
I don't recall follow-ups which used the result of mvconst_internal in subsequent combiner patterns, but it should be fairly simple to search for them.

We'd need to look back at the original bug which resulted in creating the mvconst_internal pattern. My recollection is it had a fairly complex constant and we were unable to see enough insns to do the necessary simplification to fix that case.

I bet whatever goes on inside perlbench, mcf and x264 (guessing based on instruction counts in your message) + the original bug report will provide reasonable testcases for evaluating reversion + adding patches to avoid the regressions.


So why use "P" for your mode iterator?  I would have expected GPR since
I think this works for both SI and DI mode as-is.

My intent at least was to have this work for either of rv64/rv32 and in
either of those environments, work for both SI or DI - certainly
rv64+{SI,DI}.
To that end I debated between GPR, X and P.
It seems GPR only supports DI if TARGET_64BIT.
But I could well be wrong about above requirements or the subtle
differences in these.
I wouldn't worry about GPR only support DI for TARGET_64BIT. I don't think we generally expose DImode patterns for TARGET_32BIT.


For the output template "#" for alternative 0 and the add instruction
for alternative 1 is probably better.  That way it's clear to everyone
that alternative 0 is always meant to be split.

OK.

Don't you need some kind of condition on the split to avoid splitting
when you've got a register for operands[2]?

Won't the predicate "match_code" guarantee that ?

   (define_predicate "const_two_s12"
      (match_code "const_int")
    {
      return SUM_OF_TWO_S12 (INTVAL (op), false);
    })
You're right.  Missed the match_code.  Sorry for the bad steer.


Do I need to build gcc in a certain way to expose such errors - I wasn't
able to trigger such an error for the entire testsuite or SPEC build.
There's a distinct RTL checking mode. It's fairly expensive from a compile-time standpoint though.

Jeff

Reply via email to