Hi, Sandra.

FWIW, I tried this patch on A15 Juno with CoreMark, and any difference between specifying this option and not specifying it was below 1%.
Cheers,

--
Evandro Menezes
Austin, TX

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Sandra Loosemore
> Sent: Friday, November 14, 2014 18:47
> To: GCC Patches
> Cc: Chris Jones; Joshua Conner
> Subject: [patch, arm] align saved FP regs on stack
>
> On ARM targets, the stack is aligned to an 8-byte boundary, but when
> saving/restoring the VFP coprocessor registers in the function
> prologue/epilogue, it is possible for the 8-byte values to end up at
> locations that are 4-byte aligned but not 8-byte aligned. This can result in
> a performance penalty on micro-architectures that are optimized for
> well-aligned data, especially when such a misalignment may result in cache
> line splits within a single access. This patch detects when at least one
> coprocessor register value needs to be saved and adds some additional padding
> to the stack at that point if necessary to align it to an 8-byte boundary.
> I've re-used the existing logic to try pushing a 4-byte scratch register and
> only fall back to an explicit stack adjustment if that fails.
>
> NVIDIA found that an earlier version of this patch (benchmarked with
> SPECint2k and SPECfp2k on an older version of GCC) gave measurable
> improvements on their Tegra K1 64-bit processor, aka "Denver". We aren't
> sure what other ARM processors might benefit from the extra alignment, so
> we've given it its own command-line option instead of tying it to -mtune.
>
> I did some hand-testing of this patch on small test cases to verify that the
> expected alignment was happening, but it seemed to me that the expected
> assembly-language patterns were likely too fragile to be hard-wired into a
> test case. I also ran regression tests both with and without the switch set
> so it doesn't break other things. OK to commit?
>
> -Sandra