The following sequence of patches enables generation of LDRD/STRD instructions for Cortex-A15 with -O2 and for all Cortex-A CPUs with -Os when profitable. This almost always improves code size and is expected to improve performance on Cortex-A15.
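
To illustrate the kind of code this series targets, here is a minimal sketch. It is not taken from the patches or their tests; the names `struct pair`, `sum_pair` and `set_pair` are hypothetical. Each function accesses two adjacent 32-bit words, so its pair of LDR or STR instructions is a candidate for merging into a single LDRD or STRD.

```c
#include <stdint.h>

/* Hypothetical example, not part of the patch series. */
struct pair {
    uint32_t lo;   /* offset 0 */
    uint32_t hi;   /* offset 4, adjacent to lo */
};

/* Two adjacent 32-bit loads: candidates for one LDRD. */
uint32_t sum_pair(const struct pair *p)
{
    return p->lo + p->hi;
}

/* Two adjacent 32-bit stores: candidates for one STRD. */
void set_pair(struct pair *p, uint32_t lo, uint32_t hi)
{
    p->lo = lo;
    p->hi = hi;
}
```

Compiling this with something like `arm-none-eabi-gcc -O2 -mcpu=cortex-a15 -mthumb -S` and inspecting the assembly should show whether the accesses were merged. Whether LDRD/STRD actually appears depends on the compiler version, the options used and register allocation.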

[0/6] LDRD/STRD generation - introduction (this email)
[1/6] Merge LDR/STR into LDRD/STRD with -O2 in Thumb (1-ldrdstrd.patch)
[2/6] Merge LDR/STR into LDRD/STRD with -O2 in ARM (2-output-double.patch)
[3/6] Merge LDR/STR into LDRD/STRD with -Os (3-size.patch)
[4/6] Improve peepholes for generating LDM with commutative operators (4-ldm-commute.patch)
[5/6] Generate LDRD/STRD for internal memcpy (5-internal-memcpy.patch)
[6/6] Tests for LDRD/STRD/LDM/STM generation for Cortex-A9/A15 (6-cortexa-tests.patch, creatests.py)

The patches are to be applied in the given order.

Testing and benchmarking are in progress:
* Passed all tests in check-gcc without regressions on qemu for target arm-none-eabi built with newlib and various configurations of A15/A9/Thumb/ARM (see ^note).
* Successful cross-build of arm-none-linux-gnueabi with eglibc and A15/A9/Thumb/ARM.
* Successful bootstrap on Cortex-A9 Tegra Linux Ubuntu board (A9 Thumb/ARM).
* Cross-built Spec2k using arm-none-linux-gnueabi compiler in several configurations of A9/A15/Thumb/ARM.
* Ran CINT Spec2K on Cortex-A9 VE2 Linux board. No regression in runtime between the files built with the trunk and the patched versions of the compiler (A9 Thumb/ARM). The code built with the patched compiler contains some LDRD/STRD instructions.
* A brief look at the CSiBE benchmark: the results show a small improvement in code size with a small increase in compilation time for most benchmarks.

I am working on more benchmark results and their detailed analysis.

--
Greta

(^note) The patch has accidentally fixed or masked a failure in a regression test of vector shuffle:
FAIL->PASS: gcc.dg/torture/vshuf-v8sf.c -O2 execution test
(an abort statement is executed)
The test fails when the compiler is configured with cortex-a15 thumb neon and softfp, gcc trunk r180197. After the patch is applied, the test passes.