Ping....
On 10/06/16 18:18, Bernd Edlinger wrote: > Hi, > > I'd like to ping for this patch: > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02362.html > > Thanks, > Bernd. > > > On 09/30/16 15:33, Bernd Edlinger wrote: >> On 09/30/16 12:14, Bernd Edlinger wrote: >>> Eric Botcazou wrote: >>>>> A comment before the SETs and a testcase would be nice. IIRC >>>>> we do have stack size testcases via using -fstack-usage. >>>> >>>> Or -Wstack-usage, which might be more appropriate here. >>> >>> Yes. good idea. I was not aware that we already have that kind of >>> tests. >>> >>> When trying to write this test. I noticed, that I did not try -Os so >>> far. >>> But for -Os the stack is still the unchanged 3500 bytes. >>> >>> However for embedded targets I am often inclined to use -Os, and >>> would certainly not expect the stack to explode... >>> >>> I see in arm.md there are places like >>> >>> /* If we're optimizing for size, we prefer the libgcc calls. */ >>> if (optimize_function_for_size_p (cfun)) >>> FAIL; >>> >> >> Oh, yeah. The comment is completely misleading. >> >> If this pattern fails, expmed.c simply expands some >> less efficient rtl, which also results in two shifts >> and one or-op. No libgcc calls at all. >> >> So in simple cases without spilling the resulting >> assembler is the same, regardless if this pattern >> fails or not. But the half-defined out registers >> make a big difference when it has to be spilled. >> >>> /* Expand operation using core-registers. >>> 'FAIL' would achieve the same thing, but this is a bit >>> smarter. */ >>> scratch1 = gen_reg_rtx (SImode); >>> scratch2 = gen_reg_rtx (SImode); >>> arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], >>> operands[1], >>> operands[2], scratch1, scratch2); >>> >>> >>> .. that explains why this happens. I think it would be better to >>> use the emit_coreregs for shift count >= 32, because these are >>> effectively 32-bit shifts. >>> >>> Will try if that can be improved, and come back with the >>> results. >>> >> >> The test case with -Os has 3520 bytes stack usage. >> When only shift count >= 32 are handled we >> have still 3000 bytes stack usage. >> And when arm_emit_coreregs_64bit_shift is always >> allowed to run, we have 2360 bytes stack usage. >> >> Also for the code size it is better not to fail this >> pattern. So I propose to remove this exception in all >> three expansions. >> >> Here is an improved patch with the test case from the PR. >> And a comment on the redundant SET why it is better to clear >> the out register first. >> >> >> Bootstrap and reg-testing on arm-linux-gnueabihf. >> Is it OK for trunk? >> >> >> Thanks >> Bernd.