Re: [PATCH] Improve optimizer to avoid stack spill across pure function call

user202729 Wed, 17 Jul 2024 21:17:31 -0700




Sent with Proton Mail secure email.

On Wednesday, July 17th, 2024 at 11:20 PM, Jeff Law <jeffreya...@gmail.com> 
wrote:

> 
> On 7/15/24 7:53 AM, Vladimir Makarov wrote:
> 
> > On 6/14/24 07:10, user202...@protonmail.com wrote:
> > 
> > > This patch was inspired from PR 110137. It reduces the amount of stack
> > > spilling by ensuring that more values are constant across a pure
> > > function call.
> > > 
> > > It does not add any new flag; rather, it makes the optimizer generate
> > > more optimal code.
> > > 
> > > For the added test file, the change is the following. As can be seen,
> > > the number of memory operations is cut in half (almost, because rbx =
> > > rdi also need to be saved in the "after" version).
> > > 
> > > Before:
> > > 
> > > _Z2ggO7MyClass:
> > > .LFB653:
> > > .cfi_startproc
> > > sub rsp, 72
> > > .cfi_def_cfa_offset 80
> > > movdqu xmm1, XMMWORD PTR [rdi]
> > > movdqu xmm0, XMMWORD PTR [rdi+16]
> > > movaps XMMWORD PTR [rsp+16], xmm1
> > > movaps XMMWORD PTR [rsp], xmm0
> > > call _Z1fv
> > > movdqa xmm1, XMMWORD PTR [rsp+16]
> > > movdqa xmm0, XMMWORD PTR [rsp]
> > > lea rdx, [rsp+32]
> > > movaps XMMWORD PTR [rsp+32], xmm1
> > > movaps XMMWORD PTR [rsp+48], xmm0
> > > add rsp, 72
> > > .cfi_def_cfa_offset 8
> > > ret
> > > .cfi_endproc
> > > 
> > > After:
> > > 
> > > _Z2ggO7MyClass:
> > > .LFB653:
> > > .cfi_startproc
> > > push rbx
> > > .cfi_def_cfa_offset 16
> > > .cfi_offset 3, -16
> > > mov rbx, rdi
> > > sub rsp, 32
> > > .cfi_def_cfa_offset 48
> > > call _Z1fv
> > > movdqu xmm0, XMMWORD PTR [rbx]
> > > movaps XMMWORD PTR [rsp], xmm0
> > > movdqu xmm0, XMMWORD PTR [rbx+16]
> > > movaps XMMWORD PTR [rsp+16], xmm0
> > > add rsp, 32
> > > .cfi_def_cfa_offset 16
> > > pop rbx
> > > .cfi_def_cfa_offset 8
> > > ret
> > > .cfi_endproc
> > > 
> > > As explained in PR 110137, the reason I modify the RTL pass instead of
> > > the GIMPLE pass is that currently the code that handle the
> > > optimization is in the IRA.
> > > 
> > > The optimization involved is: rewrite
> > > 
> > > definition: a = something;
> > > ...
> > > use a;
> > > 
> > > to move the definition statement right before the use statement,
> > > provided none of the statements inbetween modifies "something".
> > > 
> > > The existing code only handle the case where "something" is a memory
> > > reference with a fixed address. The patch modifies the logic to also
> > > allow memory reference whose address is not changed by the statements
> > > inbetween.
> > > 
> > > In order to do that the only way I can think of is to modify
> > > "validate_equiv_mem" to also validate the equivalence of the address,
> > > which may consist of a pseudo-register.
> > > 
> > > Nevertheless, reviews and suggestions to improve the code/explain how
> > > to implement it in GIMPLE phase would be appreciated.
> > > 
> > > Bootstrapped and regression tested on x86_64-pc-linux-gnu.
> > > 
> > > I think the test passes but there are some spurious failures with some
> > > scan-assembler-* tests, looking through it it doesn't appear to
> > > pessimize the code.
> > > 
> > > I think this also fix PR 103541 as a side effect, but I'm not sure if
> > > the generated code is optimal (it loads from the global variable
> > > twice, but then there's no readily usable caller-saved register so you
> > > need an additional memory operation anyway)
> > 
> > Sorry for the big delay with the review, I missed your patch. It is
> > better to CC a patch to maintainers of the related code to avoid such
> > situation. Fortunately, Jeff Law pointed me out your patch recently. It
> > is also a good practice to ping the patch if there is no response for
> > one week. Sometimes people do several pings to get an attention for
> > their patches.
> > 
> > The patch looks ok to me. The only thing I found is that the test case
> > should be in g++.target/i386 dir, not in gcc.target/i386. I would also
> > add -std=c++11 (or something more), as the test in g++.target will be
> > run with different -std options and for -std=c++98 the test will not pass.
> > 
> > Also it is better to test such non-trivial patches on all major
> > targets. You can use compiler farm for this if you have no own
> > available machines. Otherwise, you should pay attention to new
> > testsuite regressions on other targets after submitting the patch.
> > 
> > So the patch can be committed with the test change I wrote above.
> > 
> > And thank you to find the opportunity to generate a better code and
> > implementing it.
> 
> Note this patch is causing the compiler to hang on the compile/pr43415.c
> testcase on visium-elf. So there's no way for it to go forward without
> some additional testing and bugfixing.
> 
> For the submitter. If you configure the compiler with:
> 
> <path-to-gcc-sources>/configure --target=visium-elf
> 
> 
> Then build with
> 
> make all-gcc
> 
> Then test with
> cd gcc; make check-gcc RUNTESTFLAGS=compile.exp=pr43415.c
> 
> You should see the failure (after a long wait for the dejagnu timeout to
> kick in).
> 
> jeff

Thanks for the review. I suspect an infinite loop might be caused by insn 
sequence of the form

(set (reg 1) (value))
(set (reg 2) (value))
(set (reg 3) (compare (reg 1) (reg 2)))

then reg 1 and reg 2 would be repeatedly move to right before (set reg 3) insn 
which cause an infinite loop. It might have been difficult to reproduce on x86 
since x86 is a CISC instead of RISC so the reg values would directly be merged 
into the (set reg 3) insn.

I will look more closely at the logic and determine how best to handle the 
error (and why the current code does not have this problem), then send an 
updated patch later.
Re: [PATCH] Improve optimizer to avoid stack spill across pure function call

Reply via email to