http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60960
Bug ID: 60960
Summary: Wrong result when a vector variable is divided by a literal constant
Product: gcc
Version: 4.8.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: uranus at tinlans dot org

> gcc -v
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.2/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.8.2/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.8.2/work/gcc-4.8.2/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.2 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.2/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.2 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.2/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.2/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.2/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --without-cloog --disable-lto --enable-nls --without-included-gettext --with-system-zlib --enable-obsolete --disable-werror --enable-secureplt --enable-multilib --with-multilib-list=m32,m64 --enable-libmudflap --disable-libssp --enable-libgomp --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.8.2/python --enable-checking=release --enable-java-awt=gtk --enable-libstdcxx-time --enable-objc-gc --enable-languages=c,c++,java,objc,obj-c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-targets=all --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.8.2 p1.0, pie-0.5.8'
Thread model: posix
gcc version 4.8.2 (Gentoo 4.8.2 p1.0, pie-0.5.8)

---------------

Example code (two files, so that automatic inlining does not interfere):

/* vec.c */
typedef unsigned char v4qi __attribute__ ((vector_size (4)));

v4qi f1 (v4qi v);
v4qi f2 (v4qi v);
v4qi f3 (v4qi x, v4qi y);
void print (v4qi v);

int main ()
{
  v4qi x = { 5, 5, 5, 5 };
  v4qi y = { 2, 2, 2, 2 };
  v4qi z;

  z = f1 (x);
  print (z);

  z = f2 (x);
  print (z);

  z = f3 (x, y);
  print (z);

  return 0;
}

/* vec-impl.c */
#include <stdio.h>

typedef unsigned char v4qi __attribute__ ((vector_size (4)));

v4qi f1 (v4qi v)
{
  return v / 2;
}

v4qi f2 (v4qi v)
{
  return v / (v4qi) { 2, 2, 2, 2 };
}

v4qi f3 (v4qi x, v4qi y)
{
  return x / y;
}

void print (v4qi v)
{
  printf ("%d %d %d %d\n", v[3], v[2], v[1], v[0]);
}

---------------

Command line:

> gcc -O3 -c vec.c
> gcc -O3 -c vec-impl.c
> gcc -O3 vec.o vec-impl.o -o test
> ./test

Output:

2 130 130 130
2 130 130 130
2 2 2 2

---------------

Although the target doesn't support this operation, I remember that GCC is
able to expand it into proper scalar operations. I expected all three outputs
to be identical, but the results returned by f1 () and f2 () are wrong. In
f1 () and f2 () the whole vector is treated as a single integer variable and
right-shifted by 1.

Using the command "gcc -O3 -fdump-tree-all -da -S vec-impl.c", we can see
that the assembly code of f1 () and f2 () is wrong:

f1:
.LFB24:
	.cfi_startproc
	movl	%edi, %eax
	shrl	%eax
	ret
	.cfi_endproc
.LFE24:
	.size	f1, .-f1
	.p2align 4,,15
	.globl	f2
	.type	f2, @function

I don't show the assembly code of f2 () because it is the same.
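The reported output "2 130 130 130" is what a single 32-bit logical shift produces on the packed vector. The following is only a sketch of that arithmetic in plain C, not GCC code. It assumes the little-endian lane layout of x86_64 (element 0 in the low byte), and the helper names pack_lanes, lane and whole_word_shift are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: pack four 8-bit lanes into one 32-bit word,
   lane 0 in the lowest byte (assumed little-endian layout). */
static uint32_t pack_lanes(uint8_t l0, uint8_t l1, uint8_t l2, uint8_t l3)
{
    return (uint32_t)l0
         | ((uint32_t)l1 << 8)
         | ((uint32_t)l2 << 16)
         | ((uint32_t)l3 << 24);
}

/* Hypothetical helper: extract lane i (0..3) from the packed word. */
static uint8_t lane(uint32_t word, unsigned i)
{
    return (uint8_t)(word >> (8 * i));
}

/* Mirrors the "shrl %eax" in the generated code: one shift over the
   whole word, so the low bit of each lane leaks into the lane below. */
static uint32_t whole_word_shift(uint32_t word)
{
    return word >> 1;
}
```

For the input { 5, 5, 5, 5 } the packed word is 0x05050505 and the shifted word is 0x02828282: lane 3 becomes 0x02 (2), while lanes 2, 1 and 0 become 0x82 (130) because each receives the low bit of the lane above it.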
Here is the RTL expansion result of the function f1 () in the file
vec-impl.c.166r.expand:

(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:SI 61 [ v ])
        (reg:SI 5 di [ v ])) vec-impl.c:7 -1
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:SI 62 [ D.2425 ])
                (lshiftrt:SI (reg/v:SI 61 [ v ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) vec-impl.c:8 -1
     (nil))
(insn 7 6 11 2 (set (reg:SI 60 [ <retval> ])
        (reg:SI 62 [ D.2425 ])) vec-impl.c:8 -1
     (nil))
(insn 11 7 14 2 (set (reg/i:SI 0 ax)
        (reg:SI 60 [ <retval> ])) vec-impl.c:9 -1
     (nil))
(insn 14 11 0 2 (use (reg/i:SI 0 ax)) vec-impl.c:9 -1
     (nil))

And here is its corresponding GIMPLE:

f1 (v4qi v)
{
  vector(4) unsigned char _4;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _4 = v_1(D) >> 1;
  return _4;
;;    succ:       EXIT

}

I'm not sure whether this GIMPLE is correct, but I can confirm that the
transformation was done by the veclower pass. We can see that it was still a
vector division in the file vec-impl.c.121t.loopdone:

;; Function f1 (f1, funcdef_no=24, decl_uid=2380, cgraph_uid=24)

f1 (v4qi v)
{
  v4qi _2;

  <bb 2>:
  _2 = v_1(D) / { 2, 2, 2, 2 };
  return _2;

}

And it was altered in the file vec-impl.c.122t.veclower21:

;; Function f1 (f1, funcdef_no=24, decl_uid=2380, cgraph_uid=24)

f1 (v4qi v)
{
  v4qi _2;
  vector(4) unsigned char _4;

  <bb 2>:
  _4 = v_1(D) >> 1;
  _2 = _4;
  return _2;

}

In contrast, the transformed GIMPLE of f3 () is what I expected:

;; Function f3 (f3, funcdef_no=26, decl_uid=2388, cgraph_uid=26)

f3 (v4qi x, v4qi y)
{
  v4qi _3;
  unsigned char _5;
  unsigned char _6;
  unsigned char _7;
  unsigned char _8;
  unsigned char _9;
  unsigned char _10;
  unsigned char _11;
  unsigned char _12;
  unsigned char _13;
  unsigned char _14;
  unsigned char _15;
  unsigned char _16;

  <bb 2>:
  _5 = BIT_FIELD_REF <x_1(D), 8, 0>;
  _6 = BIT_FIELD_REF <y_2(D), 8, 0>;
  _7 = _5 / _6;
  _8 = BIT_FIELD_REF <x_1(D), 8, 8>;
  _9 = BIT_FIELD_REF <y_2(D), 8, 8>;
  _10 = _8 / _9;
  _11 = BIT_FIELD_REF <x_1(D), 8, 16>;
  _12 = BIT_FIELD_REF <y_2(D), 8, 16>;
  _13 = _11 / _12;
  _14 = BIT_FIELD_REF <x_1(D), 8, 24>;
  _15 = BIT_FIELD_REF <y_2(D), 8, 24>;
  _16 = _14 / _15;
  _3 = {_7, _10, _13, _16};
  return _3;

}
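For comparison, the element-wise lowering shown for f3 () can be written as scalar C on a packed 32-bit word. This is a rough sketch only: the names div_per_lane and shift_per_lane are hypothetical, the lane layout (8 bits per lane at bit offsets 0, 8, 16, 24) is taken from the BIT_FIELD_REF operands above, and the masked shift is merely one way to get a lane-wise result in a single word operation. It is not claimed to be what GCC does or should emit.

```c
#include <stdint.h>

/* Scalar equivalent of the f3 () lowering: extract each 8-bit lane,
   divide, and reassemble. The caller must ensure that no lane of y is
   zero (assumption; division by zero is undefined). */
static uint32_t div_per_lane(uint32_t x, uint32_t y)
{
    uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; bit += 8) {
        uint8_t a = (uint8_t)(x >> bit);   /* like BIT_FIELD_REF <x, 8, bit> */
        uint8_t b = (uint8_t)(y >> bit);   /* like BIT_FIELD_REF <y, 8, bit> */
        result |= (uint32_t)(uint8_t)(a / b) << bit;
    }
    return result;
}

/* Lane-wise logical shift right by 1 on the packed word: shift the
   whole word, then clear the top bit of every lane so that nothing
   leaks in from the neighbouring lane. */
static uint32_t shift_per_lane(uint32_t x)
{
    return (x >> 1) & 0x7f7f7f7fu;
}
```

With x = 0x05050505 and y = 0x02020202, both functions return 0x02020202, which corresponds to the "2 2 2 2" printed for f3 ().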