106265]

Jeff Law Fri, 22 Mar 2024 23:00:04 -0700



On 3/18/24 6:07 PM, Vineet Gupta wrote:


Naive question: why is define_split more natural vs. define_insn_and_split.
Is it because of the syntax/implementation or that one is more Combine
"centric" and other is more of a Split* Pass thing (roughly speaking of
course) or something else altogether ?

There are parts of combine that cost based on the number of insns. Thisis primarily to keep combine from taking an insane amount of time.

So when we have a little white lie like mvconst_internal we muck up thatcosting aspect of the combiner. That in turn stops the combiner fromdoing certain combinations.


As a concrete example consider this:

unsigned long long w32mem(unsigned long long w32)
{
    return w32 & ~(1U << 0);
}

Right now we use a bseti+addi to construct the constant, then do thelogical and. Prior to the combiner it looks like this:

(insn 7 3 8 2 (set (reg:DI 137)(const_int 4294967294 [0xfffffffe])) "j.c":3:16 206 {*mvconst_internal}

     (nil))
(insn 8 7 13 2 (set (reg:DI 136 [ _2 ])
        (and:DI (reg/v:DI 135 [ w32 ])
            (reg:DI 137))) "j.c":3:16 102 {*anddi3}
     (expr_list:REG_DEAD (reg:DI 137)
        (expr_list:REG_DEAD (reg/v:DI 135 [ w32 ])
            (expr_list:REG_EQUAL (and:DI (reg/v:DI 135 [ w32 ])
                    (const_int 4294967294 [0xfffffffe]))
                (nil)))))

The combiner will refuse to match a splitter where the number ofincoming insns matches the number of resulting insns. So to match thiswe'd have to write another define_insn_and_split.




If we didn't have mvconst_internal, then we'd see something like this:

(insn 6 3 7 2 (set (reg:DI 138)
        (const_int 4294967296 [0x100000000])) "j.c":3:16 -1
     (nil))
(insn 7 6 8 2 (set (reg:DI 137)
        (plus:DI (reg:DI 138)
            (const_int -2 [0xfffffffffffffffe]))) "j.c":3:16 -1
     (expr_list:REG_EQUAL (const_int 4294967294 [0xfffffffe])
        (nil)))
(insn 8 7 9 2 (set (reg:DI 136 [ _2 ])
        (and:DI (reg/v:DI 135 [ w32 ])
            (reg:DI 137))) "j.c":3:16 -1
     (nil))

Which we could match with a define_split which would generate RTL forbclri+zext.w.

Note that I suspect there's a handful of these kinds of sequences forlogical ops where the immediate doesn't fit a simm12.

Of course the point of the white lie is to expose complex constantsynthesis in away that the combiner can use to simplify things. It'sdefinitely a tradeoff either way. What's been rattling around a littlebit would be to narrow the set of constants allowed by mvconst_internalto those which require 3 or more to synthesize. THe idea being caseslike you're looking where we can use addi+add rather than lui+addi+addwould be rejected by mvconst_internal, but the more complex constantsthat push combine over the 4 insn limit would be allowed bymvconst_internal.


Would we have to revisit the new splitter (and perhaps audit the
existing define_insn_and_split patterns) if we were to go ahead with
this revert ?

I don't recall follow-ups which used the result of mvconst_internal insubsequent combiner patterns, but it should be fairly simple to searchfor them.

We'd need to look back at the original bug which resulted in creatingthe mvconst_internal pattern. My recollection is it had a fairlycomplex constant and we were unable to see enough insns to do thenecessary simplification to fix that case.

I bet whatever goes on inside perlbench, mcf and x264 (guessing basedon instruction counts in your message) + the original bug report willprovide reasonable testcases for evaluating reversion + adding patchesto avoid the regressions.

So why use "P" for your mode iterator?  I would have expected GPR since
I think this works for both SI and DI mode as-is.


My intent at least was to have this work for either of rv64/rv32 and in
either of those environments, work for both SI or DI - certainly
rv64+{SI,DI}.
To that end I debated between GPR, X and P.
It seems GPR only supports DI if TARGET_64BIT.
But I could well be wrong about above requirements or the subtle
differences in these.

I wouldn't worry about GPR only support DI for TARGET_64BIT. I don'tthink we generally expose DImode patterns for TARGET_32BIT.

For the output template "#" for alternative 0 and the add instruction
for alternative 1 is probably better.  That way it's clear to everyone
that alternative 0 is always meant to be split.

OK.

Don't you need some kind of condition on the split to avoid splitting
when you've got a register for operands[2]?


Won't the predicate "match_code" guarantee that ?

   (define_predicate "const_two_s12"
      (match_code "const_int")
    {
      return SUM_OF_TWO_S12 (INTVAL (op), false);
    })

You're right.  Missed the match_code.  Sorry for the bad steer.


Do I need to build gcc in a certain way to expose such errors - I wasn't
able to trigger such an error for the entire testsuite or SPEC build.

There's a distinct RTL checking mode. It's fairly expensive from acompile-time standpoint though.


Jeff

Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

Reply via email to