[Bug target/54252] New: [Neon] Bad alignment code generated for Neon loads

2012-08-14 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54252

 Bug #: 54252
   Summary: [Neon] Bad alignment code generated for Neon loads
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: eric.ba...@allegorithmic.com


Created attachment 28009
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28009
Small repro case

Using gcc trunk at rev 190381, compiled with the Android NDK r8b build-gcc.sh
script (so an arm-linux-androideabi target) and the command line below, the
attached repro case fails to build: the assembler rejects the generated code
with the following error messages:

Erics-Mac:src batut$
/Users/batut/android-ndk-r8b/toolchains/arm-linux-androideabi-4.8.0/prebuilt/darwin-x86/bin/arm-linux-androideabi-gcc
-mfloat-abi=hard -mfpu=vfp -mfpu=neon -marm -O1 -c test.c
/var/folders/nr/l7qlwr295379gn7tqyv61jx8gn/T//cc9BgV0B.s: Assembler
messages:
/var/folders/nr/l7qlwr295379gn7tqyv61jx8gn/T//cc9BgV0B.s:29: Error: bad
alignment -- `vld1.32 {d16},[r3:128]!'
/var/folders/nr/l7qlwr295379gn7tqyv61jx8gn/T//cc9BgV0B.s:32: Error: bad
alignment -- `vld1.32 {d7},[r2:128]'

The assembly code generated is:
algNEON:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r3, .L2
.LPIC0:
    add r3, pc, r3
    ldr r2, .L2+4
    ldr r2, [r3, r2]
    mov r3, r2
    vld1.32 {d16}, [r3:128]!    <= Offending load
    vld1.32 {d6}, [r3:64]
    add r2, r2, #16
    vld1.32 {d7}, [r2:128]      <= Offending load
    vadd.f32 d16, d0, d16
    vmov.f32 d0, #0.0  @ v2sf
    vmla.f32 d0, d16, d6[0]
    vmls.f32 d0, d16, d6[1]
    vmla.f32 d0, d16, d7[0]
    vmls.f32 d0, d16, d7[1]
    bx  lr

This does not happen at -O0.
This also happens with gcc 4.7.1.
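
For context, both rejected loads transfer a single D register (64 bits) but carry a 128-bit alignment qualifier. As far as the ARM architecture manual describes VLD1 (multiple single elements), `:128` is only accepted with a two- or four-register list and `:256` only with a four-register list. The sketch below encodes that rule; `vld1_align_ok` is a hypothetical helper written for illustration, not a function from GCC or binutils.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (illustration only): returns true when an alignment
 * qualifier, given in bits, is acceptable on a VLD1 "multiple single
 * elements" access that transfers num_dregs consecutive D registers.
 * align_bits == 0 means "no qualifier". */
bool vld1_align_ok(int num_dregs, int align_bits)
{
    assert(num_dregs >= 1 && num_dregs <= 4);
    if (align_bits == 0)
        return true;                      /* unqualified access is always fine */
    switch (num_dregs) {
    case 1:                               /* e.g. vld1.32 {d16}, [r3:64] */
        return align_bits == 64;
    case 2:                               /* e.g. vld1.32 {d16-d17}, [r3:128] */
        return align_bits == 64 || align_bits == 128;
    case 3:
        return align_bits == 64;
    default:                              /* four registers */
        return align_bits == 64 || align_bits == 128 || align_bits == 256;
    }
}
```

Under this rule, `vld1_align_ok(1, 128)` is false, which matches the "bad alignment" diagnostics above for `{d16}` and `{d7}`.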


arm-linux-androideabi-gcc -v
Using built-in specs.
COLLECT_GCC=/Users/batut/android-ndk-r8b/toolchains/arm-linux-androideabi-4.8.0/prebuilt/darwin-x86/bin/arm-linux-androideabi-gcc
COLLECT_LTO_WRAPPER=/Users/batut/android-ndk-r8b/toolchains/arm-linux-androideabi-4.8.0/prebuilt/darwin-x86/bin/../libexec/gcc/arm-linux-androideabi/4.8.0/lto-wrapper
Target: arm-linux-androideabi
Configured with:
/Users/batut/android-ndk-r8b/src/build/../gcc/gcc-4.8.0/configure
--prefix=/tmp/ndk-batut/build/toolchain/prefix --target=arm-linux-androideabi
--host=x86_64-apple-darwin --build=x86_64-apple-darwin --with-gnu-as
--with-gnu-ld --enable-languages=c,c++
--with-gmp=/tmp/ndk-batut/build/toolchain/temp-install
--with-mpfr=/tmp/ndk-batut/build/toolchain/temp-install
--with-mpc=/tmp/ndk-batut/build/toolchain/temp-install --without-ppl
--without-cloog --disable-libssp --enable-threads --disable-nls
--disable-libmudflap --disable-libgomp --disable-libstdc__-v3
--disable-sjlj-exceptions --disable-shared --disable-tls --disable-libitm
--with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace
--enable-initfini-array --disable-nls
--prefix=/tmp/ndk-batut/build/toolchain/prefix
--with-sysroot=/tmp/ndk-batut/build/toolchain/prefix/sysroot
--with-binutils-version=2.22 --with-mpfr-version=2.4.1 --with-mpc-version=0.8.1
--with-gmp-version=5.0.5 --with-gcc-version=4.8.0 --with-gdb-version=7.3.x
--disable-bootstrap --disable-libquadmath --disable-plugin --with-arch=armv5te
--program-transform-name='s&^&arm-linux-androideabi-&'
Thread model: posix
gcc version 4.8.0 20120814 (experimental) (GCC)


[Bug target/54300] New: [4.7/4.8 Regression] Erroneous optimization causes wrong Neon data management

2012-08-17 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

 Bug #: 54300
   Summary: [4.7/4.8 Regression] Erroneous optimization causes
wrong Neon data management
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: eric.ba...@allegorithmic.com


Created attachment 28044
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28044
Small repro case

Using gcc trunk at rev 190381, compiled with the Android NDK r8b build-gcc.sh
script (so an arm-linux-androideabi target) and the command line below, the
attached repro case generates wrong code:

arm-linux-androideabi-g++ -march=armv7-a -mfloat-abi=softfp -mfpu=vfp
-mfpu=neon -marm -O2 test.cpp -S -o test.s

The core loop is pasted below:
for(unsigned int sv=0 ; sv!=dv0 ; sv=(sv+s1v)&smask_v)
{
int32x4_t s;
s = vmovl_s16(vget_low_s16(_loadlo_8i16((cv8i16*)_Inp, sv )));
c = vaddq_s32(c, s);
}

8 bytes are fetched from "_Inp (in bytes) + sv", then sign-extended from four
16-bit values to four 32-bit values, then accumulated in "c".
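
A scalar reference for this loop may help when checking the vector output. This is a minimal sketch based only on the description above: the variable names mirror the snippet, while the function name `accumulate_ref` and its signature are assumptions, and the repro case's own types (`cv8i16`, `_loadlo_8i16`) are not used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the core loop (hypothetical name and signature).
 * inp is a byte pointer; sv is a byte offset into it. */
void accumulate_ref(const unsigned char *inp, unsigned int dv0,
                    unsigned int s1v, unsigned int smask_v, int32_t c[4])
{
    assert(inp != NULL && c != NULL);
    for (unsigned int sv = 0; sv != dv0; sv = (sv + s1v) & smask_v) {
        int16_t lanes[4];
        memcpy(lanes, inp + sv, sizeof lanes);  /* fetch 8 bytes at offset sv */
        for (int i = 0; i < 4; ++i)
            c[i] += (int32_t)lanes[i];          /* sign-extend, then accumulate */
    }
}
```

As in the original loop, termination depends on the caller choosing dv0, s1v and smask_v so that sv eventually equals dv0.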

The generated assembly code for the loop is:
.L3:
    add r4, r0, ip
    vmov.i32 d18, #0  @ v4hi    <= d18 is full of 0's
    add ip, ip, r2
    vld1.16 {d19}, [r4:64]      <= d19 holds useful data
    and ip, ip, r5
    cmp r3, ip
    vswp d18, d19               <= d19 is now full of 0's
    vmovl.s16 q9, d19           <= d19 (full of 0's) gets expanded
    vadd.i32 q8, q8, q9         <= q9 is always zero when accumulated
    bne .L3

When using "-O1" or "-O2 -fno-gcse", correct code is generated:
.L3:
    add r4, r0, ip
    add ip, ip, r2
    and ip, ip, r5
    vld1.16 {d18}, [r4:64]      <= d18 holds useful data
    cmp r3, ip
    vmovl.s16 q9, d18           <= d18 is sign-extended
    vadd.i32 q8, q8, q9         <= q9 is accumulated
    bne .L3
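
The difference between the two loop bodies can be traced with a small register model. The sketch below is an illustration only: `dreg`, `iteration_O2` and `iteration_O1` are hypothetical names, and each function replays just the Neon instructions of one listing (address arithmetic and the branch are left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t h[4]; } dreg;   /* one D register viewed as 4 x s16 */

/* vswp Da, Db: exchange the contents of two D registers. */
static void vswp(dreg *a, dreg *b) { dreg t = *a; *a = *b; *b = t; }

/* vmovl.s16 Qd, Dm: sign-extend four 16-bit lanes to 32 bits. */
static void vmovl_s16(int32_t q[4], const dreg *m)
{
    for (int i = 0; i < 4; ++i)
        q[i] = m->h[i];
}

/* One iteration of the -O2 listing: the loaded data is swapped away. */
void iteration_O2(const int16_t loaded[4], int32_t acc[4])
{
    assert(loaded != NULL && acc != NULL);
    dreg d18 = {{0, 0, 0, 0}};                 /* vmov.i32 d18, #0       */
    dreg d19;
    for (int i = 0; i < 4; ++i)
        d19.h[i] = loaded[i];                  /* vld1.16 {d19}, [r4:64] */
    vswp(&d18, &d19);                          /* d19 now holds the zeros */
    int32_t q9[4];
    vmovl_s16(q9, &d19);                       /* widens zeros, not data  */
    for (int i = 0; i < 4; ++i)
        acc[i] += q9[i];                       /* vadd.i32 q8, q8, q9    */
}

/* One iteration of the -O1 listing: the loaded data is widened directly. */
void iteration_O1(const int16_t loaded[4], int32_t acc[4])
{
    assert(loaded != NULL && acc != NULL);
    dreg d18;
    for (int i = 0; i < 4; ++i)
        d18.h[i] = loaded[i];                  /* vld1.16 {d18}, [r4:64] */
    int32_t q9[4];
    vmovl_s16(q9, &d18);                       /* vmovl.s16 q9, d18      */
    for (int i = 0; i < 4; ++i)
        acc[i] += q9[i];                       /* vadd.i32 q8, q8, q9    */
}
```

In this model the -O2 variant leaves the accumulator unchanged whatever was loaded, which is the behaviour annotated in the first listing.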

This also happens with gcc 4.7.1, but not with gcc 4.6.

Also, in the loadlo_8i16 function, if we replace the call to zero_64 by the
proper vdup_n_s16(0), then correct code is generated at -O2.

The (stripped down in the repro case) _v16u8_ and _v8u8_ structures are the way
we implemented some kind of compiler-performed polymorphism for Neon variables,
since not all ARM compilers have -flax-vector-conversions.


[Bug target/54300] [4.7/4.8 Regression] Erroneous optimization causes wrong Neon data management

2012-08-20 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

--- Comment #2 from Eric Batut  2012-08-20 
16:11:35 UTC ---
(In reply to comment #1)

Hi Richard

Using "-O2 -fno-strict-aliasing" generates the exact same (incorrect) code.

> Your testcase is quite convoluted but it looks you may be violating C
> type-based aliasing rules.  Thus, try -fno-strict-aliasing.


[Bug target/54252] [4.7/4.8 Regression] Bad alignment code generated for Neon loads

2012-08-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54252

--- Comment #8 from Eric Batut  2012-08-30 
15:52:20 UTC ---
The original bug instance is fixed on trunk (rev 190803).
I had what I think is another instance of the same bug, where the error message
is "alignment of array elements is greater than element size", and this is also
fixed by rev 190803.

(In reply to comment #7)
> Fixed now I believe on trunk.


[Bug target/55073] New: Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-10-25 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

 Bug #: 55073
   Summary: Wrong Neon code generation at -O2 caused by
-fschedule-insns
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: eric.ba...@allegorithmic.com


Created attachment 28528
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28528
Zipfile with repro case, build script, disassembly listings and register flow
analysis

Using gcc trunk at rev 192800, compiled with the Android NDK's build-gcc.sh
script (arm-linux-androideabi target).

Compiling the attached repro case at -O2 yields incorrect results. Correct
results are generated for -O2 -fno-schedule-insns.

The command line to build an incorrect program is:
arm-linux-androideabi-g++ -mandroid -march=armv7-a -mfloat-abi=softfp -mfpu=vfp
-mfpu=neon -fpic -marm -O2 -fno-strict-aliasing -Wall -o repro_ko repro.cpp

The command line to build a correct program is:
arm-linux-androideabi-g++ -mandroid -march=armv7-a -mfloat-abi=softfp -mfpu=vfp
-mfpu=neon -fpic -marm -O2 -fno-schedule-insns -fno-strict-aliasing -Wall -o
repro_ok repro.cpp

I am aware that the test case is quite convoluted but this is because we use
some kind of "universal" 128b vector type that autoconverts to and from other
Neon types (not all ARM compilers have -flax-vector-conversions). Still, both
programs should output the same results.



The body of the failing function is pasted below (prolog and epilog omitted):
Correct code (-O2 -fno-schedule-insns):
    vmov d19, d20  @ v8qi
    vmov d21, d18  @ v8qi
    vmov d20, d19  @ v8qi
    vzip.8 d19, d18
    vzip.8 d21, d20
    vswp d18, d19
    vswp d20, d21
    vmov d21, d19  @ v8qi
    vmov d19, d20  @ v8qi
    vzip.8 d21, d20
    vzip.8 d19, d18
    vswp d20, d21
    vswp d18, d19
    vmovl.s8 q10, d21
    vmovl.s8 q9, d19
    vsub.i16 q9, q9, q8
    vsub.i16 q8, q10, q8
    vadd.i16 q8, q9, q8
    vst1.64 {d16-d17}, [r0:128]

Incorrect code (-O2):
    vmov d19, d20  @ v8qi
    vmov d22, d18  @ v8qi
    vmov d21, d20  @ v8qi
    vzip.8 d19, d18
    vzip.8 d22, d21
    vswp d18, d19
    vmov d20, d22  @ v8qi
    vmov d21, d18  @ v8qi
    vzip.8 d22, d19
    vzip.8 d21, d20
    vmovl.s8 q9, d22
    vswp d20, d21
    vsub.i16 q9, q9, q8
    vmovl.s8 q10, d21
    vsub.i16 q8, q10, q8
    vadd.i16 q8, q9, q8
    vst1.64 {d16-d17}, [r0:128]



I have attached a build.sh script that builds the two versions (OK and KO) of
the output programs. These programs can be run on any Android ARMv7 target.
This probably happens with Linux builds of gcc as well.

I did some register flow tracing to give formal expressions of what ends up in
the return value (well, just before the vsub/vsub/vadd actually). This is in
the attached bug_gcc.txt file (which should be read with hard tabs, tab length
set to 30 or something in order for the formatting to work).
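
For readers who want to redo this kind of register flow tracing without the attachment, the three data-movement instructions used in the listings can be modelled in a few lines of C. This is a sketch under my reading of the instruction semantics, not the contents of bug_gcc.txt; the type `dreg8` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { int8_t b[8]; } dreg8;    /* one D register viewed as 8 x s8 */

/* vzip.8 Dd, Dm: interleave the lanes of both registers in place.
 * Dd receives d0,m0,d1,m1,d2,m2,d3,m3 and Dm receives d4,m4,...,d7,m7. */
void vzip8(dreg8 *d, dreg8 *m)
{
    assert(d != NULL && m != NULL && d != m);
    int8_t z[16];
    for (int i = 0; i < 8; ++i) {
        z[2 * i]     = d->b[i];
        z[2 * i + 1] = m->b[i];
    }
    memcpy(d->b, z, 8);
    memcpy(m->b, z + 8, 8);
}

/* vswp Da, Db: exchange the contents of two D registers. */
void vswp8(dreg8 *a, dreg8 *b)
{
    assert(a != NULL && b != NULL);
    dreg8 t = *a;
    *a = *b;
    *b = t;
}

/* vmovl.s8 Qd, Dm: sign-extend eight 8-bit lanes to 16 bits. */
void vmovl_s8(int16_t q[8], const dreg8 *m)
{
    assert(q != NULL && m != NULL);
    for (int i = 0; i < 8; ++i)
        q[i] = m->b[i];
}
```

Replaying each listing instruction by instruction with these helpers, starting from distinct lane values in d18 and d20, shows which input lanes reach the final vmovl.s8 in each build.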



I don't know if this is related to bug 54300 (which by the way is still
"unconfirmed" although I confirmed it occurring even with -fno-strict-aliasing,
do I need to provide more info on this one?)


[Bug target/54300] [4.7/4.8 Regression] Erroneous optimization causes wrong Neon data management

2012-10-25 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

--- Comment #4 from Eric Batut  2012-10-25
12:56:33 UTC ---
I did the test with -fno-strict-aliasing and the exact same problem occurs.
Do I need to provide more information on this issue for it to move to the
"Confirmed" state?

Best Regards,
Eric


[Bug target/54300] [4.7 Regression] Erroneous optimization causes wrong Neon data management

2013-08-20 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

Eric Batut  changed:

           What    |Removed                     |Added

      Known to work|4.8.0                       |
      Known to fail|                            |4.8.0

--- Comment #8 from Eric Batut  ---
This still happens with the gcc 4.8 that was released in the Android NDK r9.
I moved 4.8.0 from the "Known to work" field to the "Known to fail" field.


[Bug target/51968] New: gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg

2012-01-23 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

 Bug #: 51968
   Summary: gcc trunk (ARM) ICEs in final_scan_insn in
final.c:2716, with "could not split insn" error msg
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: eric.ba...@allegorithmic.com


Created attachment 26433
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26433
Preprocessed file that triggers the ICE

Trunk gcc compiled with Android's build-gcc.sh ICEs on the attached
preprocessed file.

The actual error message is:
../../../engine/src/filters/cpu/shader/jpeg_simd.cpp:753:1: error: could not
split insn
(insn 2104 4054 4050 (set (reg:V16QI 103 d20 [orig:846 D.35902 ] [846])
(vec_concat:V16QI (reg:V8QI 103 d20 [2317])
(reg:V8QI 105 d21 [2318])))
/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/include/arm_neon.h:5694
1348 {neon_vcombinev8qi}
 (nil))
../../../engine/src/filters/cpu/shader/jpeg_simd.cpp:753:1: internal compiler
error: in final_scan_insn, at final.c:2716


This is a regression caused by commit 183051, and breaks a lot of Neon code in
our codebase :)


cc1plus command:

cc1plus -fpreprocessed jpeg_simd.ii -quiet -dumpbase jpeg_simd.cpp -mandroid
-mbionic -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=vfp -mfpu=neon
-marm -mtls-dialect=gnu -auxbase-strip
tmp/obj/android-g++/release/cpushader_neon/jpeg_simd.o -O2 -O2
-Wno-unused-function -Wno-psabi -Werror=implicit-function-declaration -Wall
-Wextra -Wno-strict-aliasing -Wno-unused -Wno-switch -Wno-comment -version
-fpic -flax-vector-conversions -fdata-sections -ffunction-sections
-fno-short-enums -fno-exceptions -fno-rtti -fvisibility=hidden
-fvisibility-inlines-hidden -fno-strict-aliasing -fPIC -o jpeg_simd.s


gcc -v -save-temps yields:

Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++
COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper
Target: arm-linux-androideabi
Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu
--with-gnu-as --with-gnu-ld --enable-languages=c,c++
--with-gmp=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp
--enable-threads --disable-nls --disable-libmudflap --disable-libgomp
--disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls
--with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace
--enable-initfini-array --disable-nls
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot
--with-binutils-version=2.21.53 --with-mpfr-version=3.0.1
--with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6
--with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3
--program-transform-name='s,^,arm-linux-androideabi-,'
Thread model: posix
gcc version 4.7.0 20120123 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-c' '-mandroid' '-mbionic'
'-Wno-unused-function' '-Wno-psabi' '-march=armv7-a' '-mcpu=cortex-a9'
'-mfloat-abi=softfp' '-mfpu=vfp' '-fpic'
'-Werror=implicit-function-declaration' '-flax-vector-conversions'
'-fdata-sections' '-ffunction-sections' '-fno-short-enums' '-D' 'ANDROID' '-D'
'__ARM_ARCH_5__' '-D' '__ARM_ARCH_5T__' '-D' '__ARM_ARCH_5E__' '-D'
'__ARM_ARCH_5TE__' '-D' '__ARM_NEON__' '-D' 'S_IWRITE=0200' '-D'
'HAVE_USR_INCLUDE_MALLOC_H' '-D' 'MADV_FREE=5' '-D' 'LINUX' '-D'
'PAGE_SIZE=0x400' '-D' 'HAVE_PTHREAD_MUTEX_TIMEDLOCK' '-D' 'HAVE_SYS_SEM_H'
'-D' 'WEBPLUG=0' '-D' 'GAMERELEASE=1' '-D' 'DEBUGMODE=0' '-D' 'UNITY_RELEASE=1'
'-D' 'ENABLE_PROFILER=0' '-D' 'ANDROID' '-D' 'OS_ANDROID' '-D'
'_STLP_HAS_WCHAR_T' '-D' 'BOOST_NO_CWCHAR' '-D' 'ALG_DEBUG_SIMPLEOUTPUT' '-D'
'QT_NO_QWS_TRANSFORMED' '-fno-exceptions' '-fno-rtti' '-fvisibility=hidden'
'-fvisibility-inlines-hidden' '-mfpu=neon' '-O2' '-marm' '-O2'
'-fno-strict-aliasing' '-fPIC' '-D' '_REENTRANT' '-Wall' '-Wextra'
'-Wno-strict-aliasing' '-Wno-unused' '-Wno-switch' '-Wno-comment' '-D'
'ALG_MAIN_CPU_ON' '-D' 'FX_PRJ_RDXM' '-D' 'FX_PRJ_DXTENC' '-D' 'FX_PRJ_PVRENC'
'-D' 'FX_PRJ_ETC' '-D' 'emit=' '-D' 'NDEBUG' '-D' 'ALG_ISA_NEON' '-D'
'FX_ARCH_CURRENT_NEON' '-D' 'FX_ARCH_ON_NEON' '-D' 'ALG_DEBUG_SIMPLEOUTPUT'
'-I' '/usr/lib/qt4/mkspecs/android-g++' '-I' 

[Bug target/51968] gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg

2012-01-23 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

--- Comment #1 from Eric Batut  2012-01-23 
17:36:50 UTC ---
Adding Richard Henderson, who committed rev 183051.


[Bug target/51968] gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg

2012-01-24 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

--- Comment #5 from Eric Batut  2012-01-24 
11:12:23 UTC ---
(In reply to comment #3)
> Created attachment 26436 [details]
> proposed patch
> 
> I'll run this through a cross-build first, but I expect this will fix it.

This patch makes gcc trunk no longer crash with our Neon files. Building right
now to test for code correctness, although judging from the patch the
functionality of the code should not be impacted.

Thanks Richard !


[Bug target/51968] gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg

2012-01-24 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

--- Comment #6 from Eric Batut  2012-01-24 
11:17:33 UTC ---
Our Neon codebase (lots of image processing filters) produces correct results
with the patch applied to the latest trunk rev.


[Bug target/51980] New: ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq

2012-01-24 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

 Bug #: 51980
   Summary: ARM - Neon code polluted by useless stores to the
stack with vuzpq / vzipq / vtrnq
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: eric.ba...@allegorithmic.com


Created attachment 26442
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26442
Minimal repro case (C file)

When using UZP/ZIP/TRN Neon intrinsics, gcc-trunk generates a whole lot of
stack operations (and associated stack alignment operations) even if everything
can purely be done using Neon registers. 

Compiler used is GCC trunk, rev 183468, compiled with Android's build-gcc.sh
(arm-linux-androideabi).

Command line is:
arm-linux-androideabi-g++ -c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard
-mfpu=vfp -flax-vector-conversions -mfpu=neon -O2 -o test.s test.c -S

Generated assembly code for attached C file is:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8 q1, q0, q1
    stmfd sp!, {r4, fp}         <= Unnecessary
    add fp, sp, #4              <= Unnecessary
    sub sp, sp, #48             <= Unnecessary
    add r3, sp, #15             <= Unnecessary
    vmull.u8 q0, d2, d2
    bic r3, r3, #15             <= Unnecessary
    vmull.u8 q8, d3, d3
    vuzp.32 q0, q8
    vstmia r3, {d0-d1}          <= Unnecessary, caused by vuzp.32
    vstr d16, [r3, #16]         <= Unnecessary, caused by vuzp.32
    vstr d17, [r3, #24]         <= Unnecessary, caused by vuzp.32
    vpaddl.u16 q0, q0
    vpadal.u16 q0, q8
    sub sp, fp, #4              <= Unnecessary
    ldmfd sp!, {r4, fp}         <= Unnecessary
    bx lr

As no stack operation is needed in this function, ideally the following should
be generated instead:
_Z13sqrlen4D_16u817__simd128_uint8_tS_:
    vabd.u8 q1, q0, q1
    vmull.u8 q0, d2, d2
    vmull.u8 q8, d3, d3
    vuzp.32 q0, q8
    vpaddl.u16 q0, q0
    vpadal.u16 q0, q8
    bx lr
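
To make clear that nothing in this computation needs memory, here is a scalar model of what the seven-instruction listing computes, as read from the assembly alone (the attached C file is not reproduced here, so this is an interpretation). Each of the four 32-bit result lanes is the sum of the squared absolute differences of four consecutive bytes; the name `sqrlen4d_ref` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of the ideal listing (hypothetical name):
 *   vabd.u8            |a[i] - b[i]| per byte
 *   vmull.u8 (x2)      squares widened to 16 bits (255*255 fits in u16)
 *   vuzp.32            regroups the squares so that the two pairwise adds
 *                      below land in the same 32-bit lane
 *   vpaddl/vpadal.u16  sums four consecutive squares into each u32 lane */
void sqrlen4d_ref(const uint8_t a[16], const uint8_t b[16], uint32_t out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (int v = 0; v < 4; ++v) {
        uint32_t sum = 0;
        for (int k = 0; k < 4; ++k) {
            uint32_t d = (uint32_t)abs((int)a[4 * v + k] - (int)b[4 * v + k]);
            sum += d * d;
        }
        out[v] = sum;
    }
}
```

Every intermediate value here fits in the two input registers plus q8, which is consistent with the claim that the stack traffic in the generated code is avoidable.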

This makes even tight Neon functions written with intrinsics much larger and
slower than necessary, and makes it very hard to write performance-oriented
code with intrinsics in arm-gcc.

gcc -v yields:
Using built-in specs.
COLLECT_GCC=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/bin/arm-linux-androideabi-g++
COLLECT_LTO_WRAPPER=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/lto-wrapper
Target: arm-linux-androideabi
Configured with: /home/eb/android-ndk-r6/src/build/../gcc/gcc-4.7.0/configure
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--target=arm-linux-androideabi --host=i386-linux-gnu --build=i386-linux-gnu
--with-gnu-as --with-gnu-ld --enable-languages=c,c++
--with-gmp=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpfr=/tmp/ndk-eb/build/toolchain/temp-install
--with-mpc=/tmp/ndk-eb/build/toolchain/temp-install --disable-libssp
--enable-threads --disable-nls --disable-libmudflap --disable-libgomp
--disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls
--with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace
--enable-initfini-array --disable-nls
--prefix=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86
--with-sysroot=/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/sysroot
--with-binutils-version=2.21.53 --with-mpfr-version=3.0.1
--with-gmp-version=5.0.2 --with-gcc-version=4.7.0 --with-gdb-version=6.6
--with-mpc-version=0.9 --with-arch=armv5te --enable-libstdc__-v3
--program-transform-name='s,^,arm-linux-androideabi-,'
Thread model: posix
gcc version 4.7.0 20120124 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-c' '-march=armv7-a' '-mcpu=cortex-a9' '-mfloat-abi=hard'
'-mfpu=vfp' '-flax-vector-conversions' '-mfpu=neon' '-O2' '-o' 'test.s' '-S'
'-v' '-mtls-dialect=gnu'

/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/libexec/gcc/arm-linux-androideabi/4.7.0/cc1plus
-quiet -v -imultilib armv7-a -D_GNU_SOURCE test.c -mbionic -fPIC -quiet
-dumpbase test.c -march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard -mfpu=vfp
-mfpu=neon -mtls-dialect=gnu -auxbase-strip test.s -O2 -version
-flax-vector-conversions -o test.s -fno-exceptions -fno-rtti
GNU C++ (GCC) version 4.7.0 20120124 (experimental) (arm-linux-androideabi)
compiled by GNU C version 4.6.0 20110603 (Red Hat 4.6.0-10), GMP version
5.0.2, MPFR version 3.0.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
ignoring nonexistent directory
"/home/eb/android-ndk-r6/toolchains/arm-linux-androideabi-4.7.0/prebuilt/linux-x86/lib/gcc/arm-linux-androideabi/4.7.0/../../../../arm-linux-androideabi/include/c++/4.7.0"
ignorin

[Bug target/51968] gcc trunk (ARM) ICEs in final_scan_insn in final.c:2716, with "could not split insn" error msg

2012-01-25 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51968

--- Comment #9 from Eric Batut  2012-01-25 
09:43:01 UTC ---
(In reply to comment #8)
> Fixed.

Great, many thanks !


[Bug target/48941] [arm gcc] NEON: Stack pointer operations performed even tho stack is not accessed at all in function.

2012-01-27 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48941

Eric Batut  changed:

           What    |Removed                     |Added

                 CC|                            |eric.batut at allegorithmic
                   |                            |dot com, Greta.Yorsh at arm
                   |                            |dot com

--- Comment #8 from Eric Batut  2012-01-27 
14:11:34 UTC ---
Any chance of seeing the work on this restart ?

I found this bug while looking for something that would help (I raised bug
51980 for the same kind of issue, still seen on trunk), but the patch attached
to this bug does not solve the issue for code that is rich with zip/uzp/trn
intrinsics.

This is a major limitation of arm-gcc with respect to performance-critical Neon
code in my opinion.


[Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq

2012-01-27 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

Eric Batut  changed:

           What    |Removed                     |Added

                 CC|                            |ramana at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #2 from Eric Batut  2012-01-27 
14:13:08 UTC ---
Adding the usual suspects for ARM-related bugs.


[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-11-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #3 from Eric Batut  2012-11-30
09:52:19 UTC ---
Hello Richard

I updated my working copy of gcc to rev 193943, rebuilt the compiler, rebuilt
the testcase I originally attached to this bug report, and I am still getting
different results depending on whether the -fno-schedule-insns option is used
or not. Furthermore, neither of the two sets of return values I get match the
ones you use in your test case for the failure detection. On what HW and with
which compile options did you test this and come to these values?

I'd be glad to run more tests if you need me to.

Shall I reopen this bug?

Best Regards,
Eric


[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-11-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #5 from Eric Batut  2012-11-30
10:14:00 UTC ---
Since this comes from several hours of stripping down a texture generation
engine to the single function that provided different results, I must admit I
have no idea what the correct return values are.

What worries me more is that I still get two different sets of values on a
Tegra3 (Cortex-A9) after rebuilding pr55073.C with the build.sh script in the
attached zipfile (and replacing the if-abort by printfs):

root@android:/data # ./repro_ko
./repro_ko
[0] = 0002
[1] = 0002
[2] = FFFBFFFB
[3] = FFFBFFFB
root@android:/data # ./repro_ok
./repro_ok
[0] = 00030003
[1] = 00030003
[2] = FFFAFFFA
[3] = FFFAFFFA

Were you directly targeting A15 when building the testcase? Can this
enable/disable some optimization codepaths that would explain why we have
different results?


[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-11-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #6 from Eric Batut  2012-11-30
11:05:18 UTC ---
Building the test case at -O1 (which I tend to trust slightly more than -O2 in
the present case) gives the same set of values as the previous "OK" case:

root@android:/data # ./repro_O1
./repro_O1
[0] = 00030003
[1] = 00030003
[2] = FFFAFFFA
[3] = FFFAFFFA

I hereby declare these values to be the reference values.


[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-11-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #7 from Eric Batut  2012-11-30
13:21:13 UTC ---
Richard,

I apologize, building at -O0 (and handrolling an assembly routine to do the
same computation) proves me wrong: your values are the correct ones, and -O1
is also broken.

The reference values are indeed
[0] = 
[1] = 
[2] = FFFCFFFC
[3] = FFFCFFFC

And I still have no idea why my build of your patch does not produce these
results on my HW. Could you please attach a binary build of the repro case so
that I can test it on my HW? In the meantime I'll keep looking.

Best Regards,
Eric


[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-11-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #9 from Eric Batut  2012-11-30
14:29:11 UTC ---
Richard,

I double-checked (update + rebuild), the end of my assembly files correctly
states:
    .ident "GCC: (GNU) 4.8.0 20121130 (experimental)"

Since -O1 is also broken on my end, I tried to isolate the option that would
fix -O1. It turns out that "-O1" and "-O1 -fno-dse" give identical function
bodies, only the epilog differs:
 - "-O1" gives
    vmovl.s8 q9, d19            <= d19 (wrong)
    vsub.i16 q9, q9, q8
    vmovl.s8 q10, d21           <= d21 (wrong)
    vsub.i16 q8, q10, q8
    vadd.i16 q8, q9, q8
    vst1.64 {d16-d17}, [r0:128]
 - "-O1 -fno-dse" gives
    vmovl.s8 q9, d18            <= d18 (correct) instead of d19 (wrong)
    vsub.i16 q9, q9, q8
    vmovl.s8 q10, d20           <= d20 (correct) instead of d21 (wrong)
    vsub.i16 q8, q10, q8
    vadd.i16 q8, q9, q8
    vst1.64 {d16-d17}, [r0:128]

The function body above the previous code snippets is the same for both builds.
The only difference is the widening of d19 and d21 in the wrong case, and of
d18 and d20 in the correct case.

The compiler I am using to build arm-linux-androideabi-gcc is an Apple build of
gcc 4.2.1:

~/android-ndk-r8b: gcc -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.1~22/src/configure
--disable-checking --enable-werror --prefix=/Developer/usr/llvm-gcc-4.2
--mandir=/share/man --enable-languages=c,objc,c++,obj-c++
--program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/
--with-slibdir=/usr/lib --build=i686-apple-darwin11
--enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.1~22/dst-llvmCore/Developer/usr/local
--program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11
--target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)

Do you think rebuilding arm-linux-androideabi-gcc on Linux to check if the
generated code is the same is worth the time or is there no chance whatsoever
that it can make a difference?


[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-11-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #12 from Eric Batut
2012-11-30 15:16:47 UTC ---
(In reply to comment #11)
> Something else to check is that you are using the version of arm_neon.h that
> comes with gcc-4.8.  This file has to match the version of GCC it was designed
> for.

The arm_neon.h file is properly copied to the right place by the build script,
and inserting a #error in there did cause my build to fail, so I think I have
the right one.

I am setting up my Linux VM to rebuild arm-linux-androideabi-gcc to check if it
behaves the same as the Mac-built version does.

Thanks a lot for your help in sorting this out.

Best Regards,
Eric


[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-11-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #13 from Eric Batut
2012-11-30 16:16:36 UTC ---
Richard,

After a clean checkout of gcc's trunk and of the Android NDK r8b package and
tools, I rebuilt arm-linux-androideabi-gcc on a Ubuntu VM using gcc 4.5.1. I
then rebuilt my testcase with "-O1" and "-O1 -fno-dse", and the same difference
is there: d19 and d21 are used as sources for the two vmovl.s8 instead of d18
and d20.

I attach a new tarball with the (very slightly) modified source I am using, the
two assembly files that are generated, and the two binary files (they should
run on any Android device, no fancy stuff here). Could you please use your
local build of gcc to generate the same assembly files so that we can compare
the function bodies?

Best Regards,
Eric


[Bug target/55073] Wrong Neon code generation at -O2 caused by -fschedule-insns

2012-11-30 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55073

--- Comment #14 from Eric Batut
2012-11-30 16:20:10 UTC ---
Created attachment 28840
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28840
Second repro case with source code, build script, assembly files and binary
files


[Bug target/54300] [4.7 Regression] Erroneous optimization causes wrong Neon data management

2013-01-11 Thread eric.batut at allegorithmic dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300

--- Comment #6 from Eric Batut  2013-01-11
10:42:04 UTC ---
The patch by Christophe Lyon in the linked email was applied on trunk by Ramana
at rev 188951 (June 25th 2012), but gcc-trunk still fails as of today (rev
195102). The vswp instruction that zeroes d19 before it is used is still
generated.

I don't know about 4.7.x, though.

So unless my test is wrong (same command line and same test case as in the
original bug report), 4.8.0 should not be in the "Known to work" field. Did you
try with trunk?

(In reply to comment #5)
> I could not reproduce this in a modified 4.7.0 which has patches from the
> trunk.
> I think it was fixed by
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01732.html
> .