[Bug target/54252] New: [Neon] Bad alignment code generated for Neon loads
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54252

             Bug #: 54252
           Summary: [Neon] Bad alignment code generated for Neon loads
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: eric.ba...@allegorithmic.com

Created attachment 28009
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28009
Small repro case

Using gcc trunk at rev 190381, compiled with the Android NDK r8b build-gcc.sh script (so an arm-linux-androideabi target) and the command line below, the attached repro case does not compile, and spits the following error messages:

Erics-Mac:src batut$ /Users/batut/android-ndk-r8b/toolchains/arm-linux-androideabi-4.8.0/prebuilt/darwin-x86/bin/arm-linux-androideabi-gcc -mfloat-abi=hard -mfpu=vfp -mfpu=neon -marm -O1 -c test.c
/var/folders/nr/l7qlwr295379gn7tqyv61jx8gn/T//cc9BgV0B.s: Assembler messages:
/var/folders/nr/l7qlwr295379gn7tqyv61jx8gn/T//cc9BgV0B.s:29: Error: bad alignment -- `vld1.32 {d16},[r3:128]!'
/var/folders/nr/l7qlwr295379gn7tqyv61jx8gn/T//cc9BgV0B.s:32: Error: bad alignment -- `vld1.32 {d7},[r2:128]'

The assembly code generated is:

algNEON:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr       r3, .L2
.LPIC0:
    add       r3, pc, r3
    ldr       r2, .L2+4
    ldr       r2, [r3, r2]
    mov       r3, r2
    vld1.32   {d16}, [r3:128]!    <= Offending load
    vld1.32   {d6}, [r3:64]
    add       r2, r2, #16
    vld1.32   {d7}, [r2:128]      <= Offending load
    vadd.f32  d16, d0, d16
    vmov.f32  d0, #0.0  @ v2sf
    vmla.f32  d0, d16, d6[0]
    vmls.f32  d0, d16, d6[1]
    vmla.f32  d0, d16, d7[0]
    vmls.f32  d0, d16, d7[1]
    bx        lr

This does not happen at -O0.
This also happens with gcc 4.7.1.

arm-linux-androideabi-gcc -v
Using built-in specs.
COLLECT_GCC=/Users/batut/android-ndk-r8b/toolchains/arm-linux-androideabi-4.8.0/prebuilt/darwin-x86/bin/arm-linux-androideabi-gcc
COLLECT_LTO_WRAPPER=/Users/batut/android-ndk-r8b/toolchains/arm-linux-androideabi-4.8.0/prebuilt/darwin-x86/bin/../libexec/gcc/arm-linux-androideabi/4.8.0/lto-wrapper
Target: arm-linux-androideabi
Configured with: /Users/batut/android-ndk-r8b/src/build/../gcc/gcc-4.8.0/configure --prefix=/tmp/ndk-batut/build/toolchain/prefix --target=arm-linux-androideabi --host=x86_64-apple-darwin --build=x86_64-apple-darwin --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --with-gmp=/tmp/ndk-batut/build/toolchain/temp-install --with-mpfr=/tmp/ndk-batut/build/toolchain/temp-install --with-mpc=/tmp/ndk-batut/build/toolchain/temp-install --without-ppl --without-cloog --disable-libssp --enable-threads --disable-nls --disable-libmudflap --disable-libgomp --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls --disable-libitm --with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace --enable-initfini-array --disable-nls --prefix=/tmp/ndk-batut/build/toolchain/prefix --with-sysroot=/tmp/ndk-batut/build/toolchain/prefix/sysroot --with-binutils-version=2.22 --with-mpfr-version=2.4.1 --with-mpc-version=0.8.1 --with-gmp-version=5.0.5 --with-gcc-version=4.8.0 --with-gdb-version=7.3.x --disable-bootstrap --disable-libquadmath --disable-plugin --with-arch=armv5te --program-transform-name='s&^&arm-linux-androideabi-&'
Thread model: posix
gcc version 4.8.0 20120814 (experimental) (GCC)
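For reference, here is a minimal sketch of why the assembler rejects the two loads flagged above. It assumes the alignment rules for VLD1 (multiple single elements) as I read them in the ARMv7-A architecture manual: a :64 qualifier is accepted for any register list, :128 only for lists of two or four D registers, and :256 only for lists of four. The function name vld1_align_hint_ok is hypothetical and is not part of gcc or binutils.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if an alignment qualifier (in bits)
 * is allowed on a VLD1 "multiple single elements" access that transfers
 * num_dregs D registers (1 to 4). align_bits == 0 means "no qualifier".
 * The rules encoded here are an assumption based on the ARMv7-A manual.
 */
bool vld1_align_hint_ok(int num_dregs, int align_bits)
{
    if (num_dregs < 1 || num_dregs > 4)
        return false;

    switch (align_bits) {
    case 0:   return true;                              /* no hint given */
    case 64:  return true;                              /* any list length */
    case 128: return num_dregs == 2 || num_dregs == 4;  /* {d,d} or {d,d,d,d} */
    case 256: return num_dregs == 4;                    /* {d,d,d,d} only */
    default:  return false;
    }
}
```

Under these rules, vld1_align_hint_ok(1, 128) is false, which matches the "bad alignment" errors on `vld1.32 {d16},[r3:128]!` and `vld1.32 {d7},[r2:128]`, while the accepted `vld1.32 {d6}, [r3:64]` corresponds to vld1_align_hint_ok(1, 64).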
[Bug target/54300] New: [4.7/4.8 Regression] Erroneous optimization causes wrong Neon data management
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

             Bug #: 54300
           Summary: [4.7/4.8 Regression] Erroneous optimization causes wrong Neon data management
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: eric.ba...@allegorithmic.com

Created attachment 28044
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28044
Small repro case

Using gcc trunk at rev 190381, compiled with the Android NDK r8b build-gcc.sh script (so an arm-linux-androideabi target) and the command line below, the attached repro case generates wrong code:

arm-linux-androideabi-g++ -march=armv7-a -mfloat-abi=softfp -mfpu=vfp -mfpu=neon -marm -O2 test.cpp -S -o test.s

The core loop is pasted below:

    for(unsigned int sv=0 ; sv!=dv0 ; sv=(sv+s1v)&smask_v)
    {
        int32x4_t s;
        s = vmovl_s16(vget_low_s16(_loadlo_8i16((cv8i16*)_Inp, sv )));
        c = vaddq_s32(c, s);
    }

8 bytes are fetched from "_Inp (in bytes) + sv", then sign-extended from four 16-bit values to four 32-bit values, then accumulated in "c".

The generated assembly code for the loop is:

.L3:
    add        r4, r0, ip
    vmov.i32   d18, #0  @ v4hi    <= d18 is full of 0's
    add        ip, ip, r2
    vld1.16    {d19}, [r4:64]     <= d19 holds useful data
    and        ip, ip, r5
    cmp        r3, ip
    vswp       d18, d19           <= d19 is now full of 0's
    vmovl.s16  q9, d19            <= d19 (full of 0's) gets expanded
    vadd.i32   q8, q8, q9         <= q9 is always zero when accumulated
    bne        .L3

When using "-O1" or "-O2 -fno-gcse", correct code is generated:

.L3:
    add        r4, r0, ip
    add        ip, ip, r2
    and        ip, ip, r5
    vld1.16    {d18}, [r4:64]     <= d18 holds useful data
    cmp        r3, ip
    vmovl.s16  q9, d18            <= d18 is sign-extended
    vadd.i32   q8, q8, q9         <= q9 is accumulated
    bne        .L3

This also happens with gcc 4.7.1, but not with gcc 4.6.

Also, in the loadlo_8i16 function, if we replace the call to zero_64 by the proper vdup_n_s16(0), then correct code is generated at -O2.

The (stripped down in the repro case) _v16u8_ and _v8u8_ structures are the way we implemented some kind of compiler-performed polymorphism for Neon variables, since not all ARM compilers have -flax-vector-conversions.
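To make the expected behaviour of the loop concrete, here is a scalar sketch of what it is described as computing: fetch 8 bytes at byte offset sv, sign-extend the four 16-bit lanes to 32 bits, and accumulate them. This is not the attached test case; the function name accumulate_widened and its parameter list are hypothetical, and the loop bounds are simply passed in as arguments.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical scalar model of the reported loop. The caller must pick
 * dv0, s1v and smask_v so that sv eventually equals dv0, exactly as in
 * the original loop condition.
 */
void accumulate_widened(const void *inp, unsigned dv0, unsigned s1v,
                        unsigned smask_v, int32_t c[4])
{
    for (unsigned sv = 0; sv != dv0; sv = (sv + s1v) & smask_v) {
        int16_t lanes[4];

        /* models the 64-bit vld1.16 load from "inp (in bytes) + sv" */
        memcpy(lanes, (const unsigned char *)inp + sv, sizeof lanes);

        /* models vmovl.s16 (sign extension) followed by vadd.i32 */
        for (int i = 0; i < 4; ++i)
            c[i] += (int32_t)lanes[i];
    }
}
```

With the miscompiled -O2 loop, the widened operand is always zero, so the equivalent of c would come back unchanged no matter what the input buffer holds; comparing against a scalar model like this one is one way to detect that.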
[Bug target/54300] [4.7/4.8 Regression] Erroneous optimization causes wrong Neon data management
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

--- Comment #2 from Eric Batut 2012-08-20 16:11:35 UTC ---
(In reply to comment #1)

Hi Richard

Using "-O2 -fno-strict-aliasing" generates the exact same (incorrect) code.

> Your testcase is quite convoluted but it looks you may be violating C
> type-based aliasing rules. Thus, try -fno-strict-aliasing.
[Bug target/54252] [4.7/4.8 Regression] Bad alignment code generated for Neon loads
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54252

--- Comment #8 from Eric Batut 2012-08-30 15:52:20 UTC ---
The original bug instance is fixed on trunk (rev 190803).

I had what I think is another instance of the same bug, where the error message is "alignment of array elements is greater than element size", and this is also fixed by rev 190803.

(In reply to comment #7)
> Fixed now I believe on trunk.
[Bug target/55073] New: Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

             Bug #: 55073
           Summary: Wrong Neon code generation at -O2 caused by -fschedule-insns
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: eric.ba...@allegorithmic.com

Created attachment 28528
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28528
Zipfile with repro case, build script, disassembly listings and register flow analysis

Using gcc trunk at rev 192800, compiled with the Android NDK's build-gcc.sh script (arm-linux-androideabi target).

Compiling the attached repro case at -O2 yields incorrect results. Correct results are generated for -O2 -fno-schedule-insns.

The command line to build an incorrect program is:

arm-linux-androideabi-g++ -mandroid -march=armv7-a -mfloat-abi=softfp -mfpu=vfp -mfpu=neon -fpic -marm -O2 -fno-strict-aliasing -Wall -o repro_ko repro.cpp

The command line to build a correct program is:

arm-linux-androideabi-g++ -mandroid -march=armv7-a -mfloat-abi=softfp -mfpu=vfp -mfpu=neon -fpic -marm -O2 -fno-schedule-insns -fno-strict-aliasing -Wall -o repro_ok repro.cpp

I am aware that the test case is quite convoluted but this is because we use some kind of "universal" 128b vector type that autoconverts to and from other Neon types (not all ARM compilers have -flax-vector-conversions). Still, both programs should output the same results.

The body of the failing function is pasted below (prolog and epilog omitted):

Correct code (-O2 -fno-schedule-insns):

    vmov      d19, d20  @ v8qi
    vmov      d21, d18  @ v8qi
    vmov      d20, d19  @ v8qi
    vzip.8    d19, d18
    vzip.8    d21, d20
    vswp      d18, d19
    vswp      d20, d21
    vmov      d21, d19  @ v8qi
    vmov      d19, d20  @ v8qi
    vzip.8    d21, d20
    vzip.8    d19, d18
    vswp      d20, d21
    vswp      d18, d19
    vmovl.s8  q10, d21
    vmovl.s8  q9, d19
    vsub.i16  q9, q9, q8
    vsub.i16  q8, q10, q8
    vadd.i16  q8, q9, q8
    vst1.64   {d16-d17}, [r0:128]

Incorrect code (-O2):

    vmov      d19, d20  @ v8qi
    vmov      d22, d18  @ v8qi
    vmov      d21, d20  @ v8qi
    vzip.8    d19, d18
    vzip.8    d22, d21
    vswp      d18, d19
    vmov      d20, d22  @ v8qi
    vmov      d21, d18  @ v8qi
    vzip.8    d22, d19
    vzip.8    d21, d20
    vmovl.s8  q9, d22
    vswp      d20, d21
    vsub.i16  q9, q9, q8
    vmovl.s8  q10, d21
    vsub.i16  q8, q10, q8
    vadd.i16  q8, q9, q8
    vst1.64   {d16-d17}, [r0:128]

I have attached a build.sh script that builds the two versions (OK and KO) of the output programs. These programs can be run on any Android ARMv7 target. This probably happens with Linux builds of gcc as well.

I did some register flow tracing to give formal expressions of what ends up in the return value (well, just before the vsub/vsub/vadd actually). This is in the attached bug_gcc.txt file (which should be read with hard tabs, tab length set to 30 or something in order for the formatting to work).

I don't know if this is related to bug 54300 (which by the way is still "unconfirmed" although I confirmed that it occurs even with -fno-strict-aliasing; do I need to provide more info on this one?)
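The kind of register flow tracing mentioned above can be reproduced by hand with a small software model of the permute instructions that appear in both listings. The sketch below is only an illustration and is not the content of bug_gcc.txt; the dreg type and the vzip8 and vswp helpers are hypothetical names, and the lane ordering follows my reading of the architectural definition of VZIP (lanes of the first operand land in even positions of the interleaved result).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 64-bit D register seen as 8 byte lanes. */
typedef struct { uint8_t b[8]; } dreg;

/*
 * vzip.8 Dd, Dm: interleave the lanes of d and m, then write the low
 * 8 bytes of the result back to d and the high 8 bytes back to m.
 * d and m must be distinct registers.
 */
void vzip8(dreg *d, dreg *m)
{
    uint8_t z[16];

    for (int i = 0; i < 8; ++i) {
        z[2 * i]     = d->b[i];
        z[2 * i + 1] = m->b[i];
    }
    memcpy(d->b, z, 8);
    memcpy(m->b, z + 8, 8);
}

/* vswp Dd, Dm: exchange the contents of the two registers. */
void vswp(dreg *d, dreg *m)
{
    dreg t = *d;
    *d = *m;
    *m = t;
}
```

Seeding each lane of the input registers with a distinct tag byte and replaying the two instruction sequences through these helpers shows which input lanes reach the operands of the final vmovl.s8 instructions in each build.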
[Bug target/54300] [4.7/4.8 Regression] Erroneous optimization causes wrong Neon data management
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

--- Comment #4 from Eric Batut 2012-10-25 12:56:33 UTC ---
I did the test with -fno-strict-aliasing and the exact same problem occurs.

Do I need to provide more information on this issue for it to move to the "Confirmed" state?

Best Regards,
Eric
[Bug target/54300] [4.7 Regression] Erroneous optimization causes wrong Neon data management
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

Eric Batut changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.8.0                       |
      Known to fail|                            |4.8.0

--- Comment #8 from Eric Batut ---
This still happens with the gcc 4.8 that was released in the Android NDK r9.

I moved 4.8.0 from the "Known to work" field to the "Known to fail" field.
[Bug target/51968] New: gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

             Bug #: 51968
           Summary: gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: eric.ba...@allegorithmic.com

Created attachment 26433
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26433
Preprocessed file that triggers the ICE

Trunk gcc compiled with Android's build-gcc.sh ICEs on the attached preprocessed file.

The actual error message is:

../../../engine/src/filters/cpu/shader/jpeg_simd.cpp:753:1: error: could not split insn
(insn 2104 4054 4050 (set (reg:V16QI 103 d20 [orig:846 D.35902 ] [846])
        (vec_concat:V16QI (reg:V8QI 103 d20 [2317])
            (reg:V8QI 105 d21 [2318]))) /home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include/arm_neon.h:5694 1348 {neon_vcombinev8qi}
     (nil))
../../../engine/src/filters/cpu/shader/jpeg_simd.cpp:753:1: internal compiler error: in final_scan_insn, at final.c:2716

This is a regression caused by commit 183051, and breaks a lot of Neon code in our codebase :)

cc1plus command:

cc1plus -fpreprocessed jpeg_simd.ii -quiet -dumpbase jpeg_simd.cpp -mandroid -mbionic -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=vfp -mfpu=neon -marm -mtls-dialect=gnu -auxbase-strip tmp/obj/android-g++/release/cpushader_neon/jpeg_simd.o -O2 -O2 -Wno-unused-function -Wno-psabi -Werror=implicit-function-declaration -Wall -Wextra -Wno-strict-aliasing -Wno-unused -Wno-switch -Wno-comment -version -fpic -flax-vector-conversions -fdata-sections -ffunction-sections -fno-short-enums -fno-exceptions -fno-rtti -fvisibility=hidden -fvisibility-inlines-hidden -fno-strict-aliasing -fPIC -o jpeg_simd.s

gcc -v -save-temps yields:

Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++ COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper Target: arm-linux-androideabi Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure --prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86 --target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --with-gmp=/tmp/ndk-eb/build/toolchain/temp-install --with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install --with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp --enable-threads --disable-nls --disable-libmudflap --disable-libgomp --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls --with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace --enable-initfini-array --disable-nls --prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86 --with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot --with-binutils-version=2.21.53 --with-mpfr-version=3.0.1 --with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6 --with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3 --program-transform-name='s,^,arm-linux-androideabi-,' Thread model: posix gcc version 4.7.0 20120123 (experimental) (GCC) COLLECT_GCC_OPTIONS='-v' '-save-temps' '-c' '-mandroid' '-mbionic' '-Wno-unused-function' '-Wno-psabi' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=softfp' '-mfpu=vfp' '-fpic' '-Werror=implicit-function-declaration' '-flax-vector-conversions' '-fdata-sections' '-ffunction-sections' '-fno-short-enums' '-D' 'ANDROID' '-D' '__ARM_ARCH_5__' '-D' '__ARM_ARCH_5T__' '-D' '__ARM_ARCH_5E__' '-D' '__ARM_ARCH_5TE__' '-D' '__ARM_NEON__' '-D' 
'S_IWRITE=0200' '-D' 'HAVE_USR_INCLUDE_MALLOC_H' '-D' 'MADV_FREE=5' '-D' 'LINUX' '-D' 'PAGE_SIZE=0x400' '-D' 'HAVE_PTHREAD_MUTEX_TIMEDLOCK' '-D' 'HAVE_SYS_SEM_H' '-D' 'WEBPLUG=0' '-D' 'GAMERELEASE=1' '-D' 'DEBUGMODE=0' '-D' 'UNITY_RELEASE=1' '-D' 'ENABLE_PROFILER=0' '-D' 'ANDROID' '-D' 'OS_ANDROID' '-D' '_STLP_HAS_WCHAR_T' '-D' 'BOOST_NO_CWCHAR' '-D' 'ALG_DEBUG_SIMPLEOUTPUT' '-D' 'QT_NO_QWS_TRANSFORMED' '-fno-exceptions' '-fno-rtti' '-fvisibility=hidden' '-fvisibility-inlines-hidden' '-mfpu=neon' '-O2' '-marm' '-O2' '-fno-strict-aliasing' '-fPIC' '-D' '_REENTRANT' '-Wall' '-Wextra' '-Wno-strict-aliasing' '-Wno-unused' '-Wno-switch' '-Wno-comment' '-D' 'ALG_MAIN_CPU_ON' '-D' 'FX_PRJ_RDXM' '-D' 'FX_PRJ_DXTENC' '-D' 'FX_PRJ_PVRENC' '-D' 'FX_PRJ_ETC' '-D' 'emit=' '-D' 'NDEBUG' '-D' 'ALG_ISA_NEON' '-D' 'FX_ARCH_CURRENT_NEON' '-D' 'FX_ARCH_ON_NEON' '-D' 'ALG_DEBUG_SIMPLEOUTPUT' '-I' '/usr/lib/qt4/mkspecs/android-g++' '-I'
[Bug target/51968] gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968 --- Comment #1 from Eric Batut 2012-01-23 17:36:50 UTC --- Adding Richard Henderson, who committed rev 183051.
[Bug target/51968] gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

--- Comment #5 from Eric Batut 2012-01-24 11:12:23 UTC ---
(In reply to comment #3)
> Created attachment 26436 [details]
> proposed patch
>
> I'll run this through a cross-build first, but I expect this will fix it.

This patch makes gcc trunk no longer crash with our Neon files. Building right now to test for code correctness, although judging from the patch the functionality of the code should not be impacted.

Thanks Richard !
[Bug target/51968] gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

--- Comment #6 from Eric Batut 2012-01-24 11:17:33 UTC ---
Our Neon codebase (lots of image processing filters) produces correct results with the patch applied to the latest trunk rev.
[Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

             Bug #: 51980
           Summary: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: eric.ba...@allegorithmic.com

Created attachment 26442
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26442
Minimal repro case (C file)

When using UZP/ZIP/TRN Neon intrinsics, gcc-trunk generates a whole lot of stack operations (and associated stack alignment operations) even if everything can purely be done using Neon registers.

Compiler used is GCC trunk, rev 183468, compiled with Android's build-gcc.sh (arm-linux-androideabi).

Command line is:

arm-linux-androideabi-g++ -c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfp -flax-vector-conversions -mfpu=neon -O2 -o test.s test.c -S

Generated assembly code for attached C file is:

_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8     q1, q0, q1
    stmfd       sp!, {r4, fp}     <= Unnecessary
    add         fp, sp, #4        <= Unnecessary
    sub         sp, sp, #48       <= Unnecessary
    add         r3, sp, #15       <= Unnecessary
    vmull.u8    q0, d2, d2
    bic         r3, r3, #15       <= Unnecessary
    vmull.u8    q8, d3, d3
    vuzp.32     q0, q8
    vstmia      r3, {d0-d1}       <= Unnecessary, caused by vuzp.32
    vstr        d16, [r3, #16]    <= Unnecessary, caused by vuzp.32
    vstr        d17, [r3, #24]    <= Unnecessary, caused by vuzp.32
    vpaddl.u16  q0, q0
    vpadal.u16  q0, q8
    sub         sp, fp, #4        <= Unnecessary
    ldmfd       sp!, {r4, fp}     <= Unnecessary
    bx          lr

As no stack operation is needed in this function, ideally the following should be generated instead:

_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8     q1, q0, q1
    vmull.u8    q0, d2, d2
    vmull.u8    q8, d3, d3
    vuzp.32     q0, q8
    vpaddl.u16  q0, q0
    vpadal.u16  q0, q8
    bx          lr

This makes even tight Neon functions written with intrinsics much larger and slower than necessary, and makes it very hard to write performance-oriented code with intrinsics in arm-gcc.

gcc -v yields:

Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++ COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper Target: arm-linux-androideabi Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure --prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86 --target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --with-gmp=/tmp/ndk-eb/build/toolchain/temp-install --with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install --with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp --enable-threads --disable-nls --disable-libmudflap --disable-libgomp --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls --with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace --enable-initfini-array --disable-nls --prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86 --with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot --with-binutils-version=2.21.53 --with-mpfr-version=3.0.1 --with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6 --with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3 --program-transform-name='s,^,arm-linux-androideabi-,' Thread model: posix gcc version 4.7.0 20120124 (experimental) (GCC) COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard' '-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S' '-v' '-mtls-dialect=gnu' /home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/cc1plus -quiet -v -imultilib armv7-a -D_GNU_SOURCE test.c -mbionic -fPIC -quiet -dumpbase test.c -march=armv7-a -mcpu=cortex-a9 
-mfloat-abi=hard -mfpu=vfp -mfpu=neon -mtls-dialect=gnu -auxbase-strip test.s -O2 -version -flax-vector-conversions -o test.s -fno-exceptions -fno-rtti GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi) compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version 5.0.2, MPFR version 3.0.1, MPC version 0.9 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 ignoring nonexistent directory "/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0" ignorin
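For readers without the attachment at hand, here is a scalar sketch of what the seven-instruction ideal sequence in this report appears to compute, as inferred from the listing alone: for each group of four bytes, the sum of squared absolute differences between the two inputs. The function name sqrlen4d_16u8_ref and its signature are hypothetical and are not taken from the attached C file.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical scalar reference, inferred from the instruction sequence:
 *   vabd.u8                absolute difference of the 16 byte lanes
 *   vmull.u8 (twice)       square each difference into a 16-bit lane
 *   vuzp.32 + vpaddl.u16
 *           + vpadal.u16   add the four squares of each 4-byte group
 * out[v] receives the result for bytes 4*v .. 4*v+3.
 */
void sqrlen4d_16u8_ref(const uint8_t a[16], const uint8_t b[16],
                       uint32_t out[4])
{
    for (int v = 0; v < 4; ++v) {
        uint32_t sum = 0;

        for (int k = 0; k < 4; ++k) {
            int diff = abs((int)a[4 * v + k] - (int)b[4 * v + k]);
            sum += (uint32_t)(diff * diff);
        }
        out[v] = sum;
    }
}
```

Nothing in this computation needs memory beyond the two inputs and the result, which is consistent with the report's point that the stack traffic around vuzp.32 is unnecessary.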
[Bug target/51968] gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

--- Comment #9 from Eric Batut 2012-01-25 09:43:01 UTC ---
(In reply to comment #8)
> Fixed.

Great, many thanks !
[Bug target/48941] [arm gcc] NEON: Stack pointer operations performed even tho stack is not accessed at all in function.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48941

Eric Batut changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eric.batut at allegorithmic
                   |                            |dot com, Greta.Yorsh at arm
                   |                            |dot com

--- Comment #8 from Eric Batut 2012-01-27 14:11:34 UTC ---
Any chance of seeing the work on this restart ?

I found this bug while looking for something that would help (I raised bug 51980 for the same kind of issue, still seen on trunk), but the patch attached to this bug does not solve the issue for code that is rich with zip/uzp/trn intrinsics.

This is a major limitation of arm-gcc with respect to performance-critical Neon code in my opinion.
[Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

Eric Batut changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #2 from Eric Batut 2012-01-27 14:13:08 UTC ---
Adding the usual suspects for ARM-related bugs.
[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073 --- Comment #3 from Eric Batut 2012-11-30 09:52:19 UTC --- Hello Richard I updated my working copy of gcc to rev 193943, rebuilt the compiler, rebuilt the testcase I originally attached to this bug report, and I am still getting different results depending on whether the -fno-schedule-insns option is used or not. Furthermore, neither of the two sets of return values I get match the ones you use in your test case for the failure detection. On what HW and with which compile options did you test this and come to these values? I'd be glad to run more tests if you need me to. Shall I reopen this bug? Best Regards, Eric
[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #5 from Eric Batut 2012-11-30 10:14:00 UTC ---
Since this comes from several hours of stripping down a texture generation engine to the single function that provided different results, I must admit I have no idea what the correct return values are.

What worries me more is that I still get two different sets of values on a Tegra3 (Cortex-A9) after rebuilding pr55073.C with the build.sh script in the attached zipfile (and replacing the if-abort by printfs):

root@android:/data # ./repro_ko
./repro_ko
[0] = 0002
[1] = 0002
[2] = FFFBFFFB
[3] = FFFBFFFB
root@android:/data # ./repro_ok
./repro_ok
[0] = 00030003
[1] = 00030003
[2] = FFFAFFFA
[3] = FFFAFFFA

Were you directly targeting A15 when building the testcase? Can this enable/disable some optimization codepaths that would explain why we have different results ?
[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #6 from Eric Batut 2012-11-30 11:05:18 UTC ---
Building the test case at -O1 (which I tend to trust slightly more than -O2 in the present case) gives the same set of values as the previous "OK" case:

root@android:/data # ./repro_O1
./repro_O1
[0] = 00030003
[1] = 00030003
[2] = FFFAFFFA
[3] = FFFAFFFA

I hereby declare these values to be the reference values.
[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #7 from Eric Batut 2012-11-30 13:21:13 UTC ---
Richard,

I apologize, building at -O0 (and handrolling an assembly routine to do the same computation) proves me wrong : your values are the correct ones, and -O1 is also broken. The reference values are indeed

[0] =
[1] =
[2] = FFFCFFFC
[3] = FFFCFFFC

And I still have no idea why my build of your patch does not produce these results on my HW.

Could you please attach a binary build of the repro case so that I can test it on my HW? In the meantime I'll keep looking.

Best Regards,
Eric
[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #9 from Eric Batut 2012-11-30 14:29:11 UTC ---
Richard,

I double-checked (update + rebuild), the end of my assembly files correctly states :

    .ident "GCC: (GNU) 4.8.0 20121130 (experimental)"

Since -O1 is also broken on my end, I tried to isolate the option that would fix -O1. It turns out that "-O1" and "-O1 -fno-dse" give identical function bodies, only the epilog differs:

- "-O1" gives

    vmovl.s8  q9, d19     <= d19 (wrong)
    vsub.i16  q9, q9, q8
    vmovl.s8  q10, d21    <= d21 (wrong)
    vsub.i16  q8, q10, q8
    vadd.i16  q8, q9, q8
    vst1.64   {d16-d17}, [r0:128]

- "-O1 -fno-dse" gives

    vmovl.s8  q9, d18     <= d18 (correct) instead of d19 (wrong)
    vsub.i16  q9, q9, q8
    vmovl.s8  q10, d20    <= d20 (correct) instead of d21 (wrong)
    vsub.i16  q8, q10, q8
    vadd.i16  q8, q9, q8
    vst1.64   {d16-d17}, [r0:128]

The function body above the previous code snippets is the same for both builds. The only difference is the widening of d19 and d21 in the wrong case, and of d18 and d20 in the correct case.

The compiler I am using to build arm-linux-androideabi-gcc is an Apple build of gcc 4.2.1 :

~/android-ndk-r8b: gcc -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.1~22/src/configure --disable-checking --enable-werror --prefix=/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.1~22/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)

Do you think rebuilding arm-linux-androideabi-gcc on Linux to check if the generated code is the same is worth the time or is there no chance whatsoever that it can make a difference ?
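As background for the d18/d19 and d20/d21 difference discussed in this comment: on Neon, each Q register aliases a pair of consecutive D registers, so q9 covers d18 (low half) and d19 (high half), and q10 covers d20 and d21. The sketch below models that mapping and the widening done by vmovl.s8. The helper names are hypothetical and only illustrate the register layout; they say nothing about where in gcc the wrong half gets selected.

```c
#include <stdint.h>

/* Index of the D register holding the low 64 bits of Qn. */
int q_low_dreg(int q)  { return 2 * q; }

/* Index of the D register holding the high 64 bits of Qn. */
int q_high_dreg(int q) { return 2 * q + 1; }

/*
 * Hypothetical model of vmovl.s8 Qd, Dm: sign-extend the eight 8-bit
 * lanes of the source D register into eight 16-bit lanes.
 */
void vmovl_s8_model(const int8_t src[8], int16_t dst[8])
{
    for (int i = 0; i < 8; ++i)
        dst[i] = (int16_t)src[i];
}
```

In these terms, the "-O1" build widens the high halves of q9 and q10 (d19 and d21), whereas the "-O1 -fno-dse" build widens the low halves (d18 and d20), which is why the two builds can produce different values from otherwise identical function bodies.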
[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #12 from Eric Batut 2012-11-30 15:16:47 UTC ---
(In reply to comment #11)
> Something else to check is that you are using the version of arm_neon.h that
> comes with gcc-4.8. This file has to match the version of GCC it was designed
> for.

The arm_neon.h file is properly copied to the right place by the build script, and inserting a #error in there did cause my build to fail, so I think I have the right one.

I am setting up my Linux VM to rebuild arm-linux-androideabi-gcc to check if it behaves the same as the Mac-built version does.

Thanks a lot for your help in sorting this out.

Best Regards,
Eric
[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #13 from Eric Batut 2012-11-30 16:16:36 UTC ---
Richard,

After a clean checkout of gcc's trunk and of the Android NDK r8b package and tools, I rebuilt arm-linux-androideabi-gcc on an Ubuntu VM using gcc 4.5.1.

I then rebuilt my testcase with "-O1" and "-O1 -fno-dse", and the same difference is there: d19 and d21 are used as sources for the two vmovl.s8 instead of d18 and d20.

I attach a new tarball with the (very slightly) modified source I am using, the two assembly files that are generated, and the two binary files (they should run on any Android device, no fancy stuff here).

Could you please use your local build of gcc to generate the same assembly files so that we can compare the function bodies?

Best Regards,
Eric
[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073 --- Comment #14 from Eric Batut 2012-11-30 16:20:10 UTC --- Created attachment 28840 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28840 Second repro case with source code, build script, assembly files and binary files
[Bug target/54300] [4.7 Regression] Erroneous optimization causes wrong Neon data management
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

--- Comment #6 from Eric Batut 2013-01-11 10:42:04 UTC ---
The patch by Christophe Lyon in the linked email was applied on trunk by Ramana at rev 188951 (June 25th 2012), but gcc-trunk still fails as of today (rev 195102). The vswp instruction that causes d19 to be 0 before being used afterwards is still generated. Don't know about 4.7.x, though.

So unless my test is wrong (same command line and same test case as in the original bug report), 4.8.0 should not be in the "Known to work" field.

Did you try with trunk ?

(In reply to comment #5)
> I could not reproduce this in a modified 4.7.0 which has patches from the
> trunk.
> I think it was fixed by
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01732.html
> .