On Thu, 8 Jun 2023 at 09:58, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:

> Hi Jonathan,
>
> Interestingly, this increases the code size of -O3 code on aarch64-linux-gnu
> for SPEC CPU2017's 641.leela_s benchmark [1].
>
> In particular, FastBoard::get_nearby_enemies() grew from 1444 to 2212
> bytes.  This seems like a corner case; the rest of SPEC CPU2017 is mostly
> neutral to this patch.  Is this something you would be interested in
> investigating?  I'll be happy to assist.
>

I'd certainly like to avoid the regression, but I'm too dumb to understand
most inlining bugs myself.


>
> Looking at the assembly, one of the differences I see is that the "after"
> version has calls to realloc_insert(), while the "before" version seems to
> have them inlined [2].
>
> [1]
> https://git.linaro.org/toolchain/ci/interesting-commits.git/tree/gcc/sha1/b7b255e77a271974479c34d1db3daafc04b920bc/tcwg_bmk-code_size-cpu2017fast/status.txt
>
>
I find it annoying that adding `if (n < sz) __builtin_unreachable()` seems
to affect the size estimates for the function, and so perturbs inlining
decisions. That code shouldn't add any actual instructions, so it shouldn't
affect size estimates.
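For anyone following along without the patch in front of them, here is a
minimal sketch of the pattern. The functions headroom_checked and
headroom_hinted are hypothetical and are not the libstdc++ code. The
`if (n < sz) __builtin_unreachable()` line only gives the optimizer range
information (n >= sz); it should compile to no instructions, which is why
it shouldn't move the inliner's size estimates.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; not the code from the patch.
// Both return n - sz, clamped at zero when n < sz.

constexpr std::size_t headroom_checked(std::size_t n, std::size_t sz) {
    // Plain version: the compiler cannot drop the comparison.
    return n < sz ? 0 : n - sz;
}

constexpr std::size_t headroom_hinted(std::size_t n, std::size_t sz) {
    // Optimizer hint: reaching __builtin_unreachable() is undefined, so the
    // compiler may assume n >= sz after this point. The branch itself
    // should compile to no instructions.
    if (n < sz)
        __builtin_unreachable();
    // Given the hint, the compiler may fold this to a bare n - sz.
    return n < sz ? 0 : n - sz;
}

static_assert(headroom_checked(3, 10) == 0, "clamped at zero");
static_assert(headroom_hinted(10, 3) == 7, "n - sz when n >= sz");
```

Callers of headroom_hinted must never pass n < sz; doing so is undefined
behaviour, which is exactly the promise the hint encodes.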

I mentioned this in a meeting last week and Jason suggested checking
whether using __builtin_assume has the same undesirable consequences, so I
think I'll start by investigating that.
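A sketch of what the assume-based variant might look like. I'm assuming
the C++23 `[[assume(expr)]]` attribute (which GCC 13 implements) as the
relevant spelling here; the function headroom_assumed is hypothetical, and
the preprocessor guard falls back to the `__builtin_unreachable()` form on
compilers without the attribute. Building both variants at -O3 and
comparing the inliner dumps (e.g. -fdump-ipa-inline) would show whether
the assume form perturbs the size estimates in the same way.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only: same contract as a
// __builtin_unreachable()-hinted version (caller guarantees n >= sz),
// but expressed as an assumption where the compiler supports it.
constexpr std::size_t headroom_assumed(std::size_t n, std::size_t sz) {
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(assume)
    // C++23 assumption: the expression is never evaluated at run time;
    // it only tells the optimizer that n >= sz.
    [[assume(n >= sz)]];
#  else
    if (n < sz) __builtin_unreachable();  // fallback: no [[assume]]
#  endif
#else
    if (n < sz) __builtin_unreachable();  // fallback: no __has_cpp_attribute
#endif
    return n < sz ? 0 : n - sz;
}

static_assert(headroom_assumed(10, 3) == 7, "n - sz when n >= sz");
```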



> [2] 641.leela_s is a non-GPL/non-BSD benchmark, and I'm not sure if I can
> post its compiled and/or preprocessed code publicly.  I assume Red Hat has a
> SPEC CPU2017 license, and I can post details to you privately.
>
>
Yes, I think I can get the benchmark code from Vlad.

Thanks for bringing this to my attention.
