[Bug rtl-optimization/111267] [14 Regression] Codegen regression from i386 argument passing changes

roger at nextmovesoftware dot com via Gcc-bugs Fri, 12 Jan 2024 11:06:47 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267


--- Comment #6 from Roger Sayle <roger at nextmovesoftware dot com> ---
Sorry for the delay in replying/answering Jakub's questions/comments.  Yes,
using a define_insn_and_split in the backend fixes/works around the issue (and
I agree your implementation/refinement in comment #5 is better than mine in
comment #2), but I've a feeling that this approach isn't the ideal solution. 
Nothing about this split is specific to these x86 instructions or even to the
i386 backend.

A more generic fix might be to teach combine.cc that it can split parallels of
two independent sets, with no interdependencies, into two insns if the total
cost of the two new instructions is less than that of the original two, i.e. a
2 insn -> 2 insn combination.
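
As a rough illustration of the check described above (a toy model only, not
combine.cc code; the Set tuple, the cost numbers and both function names are
made up for this sketch), the test for splitting a two-set parallel might look
something like:

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple


class Set(NamedTuple):
    """Toy stand-in for one (set dest src): hypothetical, not a GCC type."""
    dest: str              # register written by this set
    uses: FrozenSet[str]   # registers read by the source expression
    cost: int              # assumed cost of emitting this set as its own insn


def independent(a: Set, b: Set) -> bool:
    # Neither set writes a register that the other one reads or writes.
    return a.dest != b.dest and a.dest not in b.uses and b.dest not in a.uses


def try_split_parallel(parallel: Tuple[Set, Set],
                       original_cost: int) -> Optional[List[Set]]:
    """Return the two sets as separate insns if that is strictly cheaper
    than the original pair of insns, otherwise None (keep the originals)."""
    a, b = parallel
    if independent(a, b) and a.cost + b.cost < original_cost:
        return [a, b]
    return None
```

For example, with a = Set("r1", frozenset({"r2", "r3"}), 4) and
b = Set("r4", frozenset({"r2"}), 1), try_split_parallel((a, b), 8) returns
[a, b], whereas an original cost of 5 gives None because nothing is gained.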

But then even this doesn't feel like the perfect approach... the reason combine
doesn't already support 2->2 combinations is that they're not normally
required; these types of problems are usually handled by GCSE or CSE or PRE (or
?).

The pattern is that insn1 sets REG1 to a complicated expression, and REG1 is
live in several locations, so this instruction can't be eliminated.  However, if the
definition of REG1 is provided to insn2 that sets REG2, this second instruction
can be significantly simplified.  This feels like a classic (non-)constant
propagation problem.  I'm thinking perhaps want_to_gcse_p (or somewhere
similar) could be tweaked.

For people just joining the discussion (hopefully Jeff or a Richard):

(set (REG:DI 1) (concat:DI (REG:SI 2) (REG:SI 3)))
...
(set (REG:SI 4) (low_part (REG:DI 1)))

can be simplified so that the second assignment becomes just:
(set (REG:SI 4) (REG:SI 2))
and similarly for high_part vs. low_part.  These don't even
need to be in the same basic block.

In actuality, "concat" is a large ugly expression, and high_part/low_part are
actually SUBREGs (or could be TRUNCATE or SHIFT+TRUNCATE), but the theory
should remain the same.
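
To make the intended propagation concrete, here is a minimal sketch using
nested tuples as a stand-in for RTL.  Nothing here is GCC code: the tuple
encoding, the defs map and the name simplify_part are all hypothetical, and it
assumes each pseudo has a single definition that reaches the use, with the
concat operands not modified in between.

```python
def simplify_part(expr, defs):
    """Replace low_part/high_part of a register by the matching operand of
    the concat that defines it.  `defs` maps a register tuple to the
    expression assigned to it (assumed to be its only reaching definition)."""
    if len(expr) == 2 and expr[0] in ("low_part", "high_part"):
        op, operand = expr
        definition = defs.get(operand)
        if definition is not None and definition[0] == "concat":
            # As in the example above, low_part selects the first concat
            # operand and high_part the second.
            return definition[1] if op == "low_part" else definition[2]
    return expr  # nothing known about the operand: leave it unchanged


# (set (REG:DI 1) (concat:DI (REG:SI 2) (REG:SI 3)))
defs = {("reg", "DI", 1): ("concat", ("reg", "SI", 2), ("reg", "SI", 3))}

# (set (REG:SI 4) (low_part (REG:DI 1)))  becomes  (set (REG:SI 4) (REG:SI 2))
new_src = simplify_part(("low_part", ("reg", "DI", 1)), defs)
```

The lookup goes through a definition map rather than the preceding insn
because, as noted above, the two sets need not be in the same basic block.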

I'm trying to figure out which pass (or cselib?) is normally responsible for
handling this type of pseudo-reg propagation.

But the define_insn_and_split certainly papers over the deficiency in the
middle-end's RTL optimizers and fixes this (very) specific case/regression.
