[Bug c/47617] New: SSE rounding mode works -g, not -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47617

Summary: SSE rounding mode works -g, not -O3
Product: gcc
Version: 4.3.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cck0...@yahoo.com

Created attachment 23252
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23252
generated .i file

Hi folks,

I'm working with SSE intrinsics and think I have a rounding problem. When I change modes with _MM_SET_ROUNDING_MODE, the results change as expected when compiled with "-g", but not with "-O3". What am I missing?

Thanks!

Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-cpu=generic --build=i386-redhat-linux
Thread model: posix
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)
COLLECT_GCC_OPTIONS='-O3' '-Wall' '-o' 'round' '-msse' '-mmmx' '-save-temps' '-v' '-mtune=generic'
 /usr/libexec/gcc/i386-redhat-linux/4.3.2/cc1 -E -quiet -v round.c -msse -mmmx -mtune=generic -Wall -O3 -fpch-preprocess -o round.i
ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/4.3.2/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/i386-redhat-linux/4.3.2/../../../../i386-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc/i386-redhat-linux/4.3.2/include
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-O3' '-Wall' '-o' 'round' '-msse' '-mmmx' '-save-temps' '-v' '-mtune=generic'
 /usr/libexec/gcc/i386-redhat-linux/4.3.2/cc1 -fpreprocessed round.i -quiet -dumpbase round.c -msse -mmmx -mtune=generic -auxbase round -O3 -Wall -version -o round.s
GNU C (GCC) version 4.3.2 20081105 (Red Hat 4.3.2-7) (i386-redhat-linux)
compiled by GNU C version 4.3.2 20081105 (Red Hat 4.3.2-7), GMP version 4.2.2, MPFR version 2.3.2.
GGC heuristics: --param ggc-min-expand=55 --param ggc-min-heapsize=48000
Compiler executable checksum: 3bee52601079f736b7b63b762646f4ba
round.c: In function ‘test_sse1_feature’:
round.c:150: warning: unused variable ‘sig’
round.c:150: warning: unused variable ‘extensions’
round.c:149: warning: ‘edx’ may be used uninitialized in this function
COLLECT_GCC_OPTIONS='-O3' '-Wall' '-o' 'round' '-msse' '-mmmx' '-save-temps' '-v' '-mtune=generic'
 as -V -Qy -o round.o round.s
GNU assembler version 2.18.50.0.9 (i386-redhat-linux) using BFD version version 2.18.50.0.9-8.fc10 20080822
COMPILER_PATH=/usr/libexec/gcc/i386-redhat-linux/4.3.2/:/usr/libexec/gcc/i386-redhat-linux/4.3.2/:/usr/libexec/gcc/i386-redhat-linux/:/usr/lib/gcc/i386-redhat-linux/4.3.2/:/usr/lib/gcc/i386-redhat-linux/:/usr/libexec/gcc/i386-redhat-linux/4.3.2/:/usr/libexec/gcc/i386-redhat-linux/:/usr/lib/gcc/i386-redhat-linux/4.3.2/:/usr/lib/gcc/i386-redhat-linux/
LIBRARY_PATH=/usr/lib/gcc/i386-redhat-linux/4.3.2/:/usr/lib/gcc/i386-redhat-linux/4.3.2/:/usr/lib/gcc/i386-redhat-linux/4.3.2/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O3' '-Wall' '-o' 'round' '-msse' '-mmmx' '-save-temps' '-v' '-mtune=generic'
 /usr/libexec/gcc/i386-redhat-linux/4.3.2/collect2 --eh-frame-hdr --build-id -m elf_i386 --hash-style=gnu -dynamic-linker /lib/ld-linux.so.2 -o round /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../crt1.o /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../crti.o /usr/lib/gcc/i386-redhat-linux/4.3.2/crtbegin.o -L/usr/lib/gcc/i386-redhat-linux/4.3.2 -L/usr/lib/gcc/i386-redhat-linux/4.3.2 -L/usr/lib/gcc/i386-redhat-linux/4.3.2/../../.. round.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/i386-redhat-linux/4.3.2/crtend.o /usr/lib/gcc/i386-redhat-linux/4.3.2/../../../crtn.o
[Bug c/47617] SSE rounding mode works -g, not -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47617

--- Comment #2 from cck0011 at yahoo dot com 2011-02-06 16:25:55 UTC ---
(In reply to comment #1)
> I think you need to use -frounding-math. GCC assumes by default the rounding
> mode is round-to-nearest. See
> http://gcc.gnu.org/onlinedocs/gcc-4.5.2/gcc/Optimize-Options.html#index-frounding_002dmath-819
> .

Hi Andrew,

thanks for writing. I tried -frounding-math and the result is still the same. Adding/removing -mfpmath=sse doesn't change it either. Is there any additional information I can provide?

Thanks!
[Bug target/47617] SSE rounding mode works -g, not -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47617

--- Comment #4 from cck0011 at yahoo dot com 2011-02-08 01:37:58 UTC ---
Created attachment 23273
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23273
source file

Here's the source code. Rename to round.c.
[Bug target/47617] SSE rounding mode works -g, not -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47617

--- Comment #5 from cck0011 at yahoo dot com 2011-02-08 01:46:18 UTC ---
(In reply to comment #4)
> Created attachment 23273 [details]
> source file
>
> Here's the source code. Rename to round.c.

Hi Richard,

here's the source code. Rename to round.c.

I think I must be doing something wrong here. Someone would have noticed if results from _mm_cvtps_pi16 weren't changing when _MM_SET_ROUNDING_MODE() was called. :-)

I'm puzzled by it working with -g, but not with -O3. Any additional information I can provide?

Thanks!
[Bug target/47617] SSE rounding mode works -g, not -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47617

--- Comment #8 from cck0011 at yahoo dot com 2011-02-09 02:08:23 UTC ---
Hi folks,

First, thanks for working on this. Second, I read the link and I _think_ I understand it. Let me paraphrase it back to you and you can tell me if I've got the point:

There is an optimizer that extracts common expressions and evaluates them once instead of every time they occur. (What's the name of that so I can call it by the right name?) In my code it finds the expression:

dest = _mm_cvtps_pi16(source);

several times. Since it doesn't see source changing, this expression only gets evaluated once. Now, the change to rounding mode that happens with _MM_SET_ROUNDING_MODE(...) isn't detected as something that would change the value of the _mm_cvtps_pi16(...) expression, so the optimization is still applied. Recognizing that change to rounding mode and reacting to it is what's at the heart of bug 34678, and that's why this is a duplicate.

The work-arounds are:

1) Insert 'asm("":"+X"(source));' before changing rounding mode to make the compiler re-evaluate expressions that use source.

2) Do _MM_SET_ROUNDING_MODE(...) before any divisions or integer conversions that might get optimized out. The scope of the optimization is a function body and any inlined code, so do _MM_SET_ROUNDING_MODE early within that scope.

Is my understanding correct?

A few more questions:

Will this bug exist on non-X86 processors?
What does the 'asm("":"+X"(source));' expression do?
Will this syntax work for non-X86 processors?
To be correct, should I compile with -frounding-math?

Thanks!
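
The pattern paraphrased above (the same conversion repeated under different rounding modes) can be sketched roughly as follows. This is an illustration only, not the attached round.c: the helper name convert_up_and_down is hypothetical, the scalar _mm_cvtss_si32 stands in for _mm_cvtps_pi16 to keep the example short, and volatile objects are used instead of the asm statement from the thread so that each conversion is a separate load, convert, and store that the compiler may not merge with the other one.

```c
#include <xmmintrin.h>

/* Hypothetical helper: convert `value` to int once with the SSE rounding
   mode set to round-up and once with it set to round-down. */
void convert_up_and_down(float value, int *up, int *down)
{
    /* Every read of a volatile object is a fresh load, so the second
       conversion cannot simply reuse the result of the first. */
    volatile float source = value;
    /* Storing each result to a volatile object keeps the conversion
       ahead of the next rounding-mode change. */
    volatile int result;
    unsigned int saved = _MM_GET_ROUNDING_MODE();

    _MM_SET_ROUNDING_MODE(_MM_ROUND_UP);
    result = _mm_cvtss_si32(_mm_set_ss(source));
    *up = result;

    _MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN);
    result = _mm_cvtss_si32(_mm_set_ss(source));
    *down = result;

    /* Put back whatever mode the caller was using. */
    _MM_SET_ROUNDING_MODE(saved);
}
```

Called with 2.5f, this should give 3 for the round-up result and 2 for the round-down result. The volatile qualifier is ISO C, so that part carries over to other targets; the intrinsics themselves are x86-only.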
[Bug target/47617] SSE rounding mode works -g, not -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47617

--- Comment #9 from cck0011 at yahoo dot com 2011-02-12 18:20:03 UTC ---
Hi folks,

I tried the asm("":"+X"(source)); as shown. I get an error:

inconsistent operand constraints in an ‘asm’

The info pages make it look like this should work, but the Inline Assembly Howto doesn't mention the X constraint. If the compiler should agree with the info pages, I'm doing something wrong. What am I missing?

Thanks
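
For comparison, here is a sketch of the same empty-asm idea with a different constraint. It assumes GCC-style extended asm on x86, where "x" names any SSE register and "r" any general register; choosing "+x" in place of "+X" is an assumption made for this example, not something confirmed in the thread. The helper name convert_with_barrier is hypothetical, and whether this is sufficient at every optimization level and compiler version is not guaranteed.

```c
#include <xmmintrin.h>

/* Hypothetical helper: convert `value` to int under the given SSE rounding
   mode (_MM_ROUND_NEAREST, _MM_ROUND_UP, ...), then restore the old mode. */
int convert_with_barrier(float value, unsigned int mode)
{
    __m128 source = _mm_set_ss(value);
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    int result;

    _MM_SET_ROUNDING_MODE(mode);

    /* Empty asm that claims to read and modify `source` in an SSE register.
       The compiler must treat `source` as a new value from here on, so it
       cannot reuse a conversion computed before the mode change. */
    __asm__ __volatile__("" : "+x"(source));

    result = _mm_cvtss_si32(source);

    /* Same trick on the result: it has to be computed before this point,
       which sits before the rounding mode is restored. */
    __asm__ __volatile__("" : "+r"(result));

    _MM_SET_ROUNDING_MODE(saved);
    return result;
}
```

With this helper, convert_with_barrier(2.5f, _MM_ROUND_UP) should return 3 and convert_with_barrier(2.5f, _MM_ROUND_DOWN) should return 2. The "x" constraint is x86-specific, so this form would not carry over to other processors unchanged.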
[Bug c/47825] New: SSE bitwise operations on floats work -g, fail -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47825

Summary: SSE bitwise operations on floats work -g, fail -O3
Product: gcc
Version: 4.3.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cck0...@yahoo.com

Created attachment 23413
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23413
.i file

Hi folks,

I'm trying the vec_sel() example using SSE (bitwise and, or, andnot) to select bits from two vectors to build a third. It works under -g, but fails under -O3. What am I missing?

Thanks.

gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-cpu=generic --build=i386-redhat-linux
Thread model: posix
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)

works under:
cc -g -o setel setel.c -msse -mmmx -pedantic -Wall

fails under:
cc -O3 -o setel setel.c -msse -mmmx -pedantic -Wall

.i file attached
[Bug c/47825] SSE bitwise operations on floats work -g, fail -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47825

--- Comment #2 from cck0011 at yahoo dot com 2011-02-20 17:35:56 UTC ---
Created attachment 23414
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23414
source file with _may_alias_ attribute added

Hi Richard,

thanks for replying. I added _may_alias_; no change in results. What am I missing? Source attached.

Thanks!
[Bug c/47825] SSE bitwise operations on floats work -g, fail -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47825

--- Comment #4 from cck0011 at yahoo dot com 2011-02-20 18:22:46 UTC ---
(In reply to comment #3)
> Probably the same on maskarray. You should also try to update GCC to
> 4.3.5.

Hi Richard,

added to maskarray as well: no change in results. Adding option -fno-strict-aliasing fixes it as well.

I'm confused: since _mm_store_ps explicitly copies into floatarray, why is aliasing an issue?

Thanks!
[Bug target/47825] SSE bitwise operations on floats work -g, fail -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47825

--- Comment #7 from cck0011 at yahoo dot com 2011-02-22 00:54:17 UTC ---
(In reply to comment #5)
> The issue is that maskarray is initialized as array of ints but then you
> load it as array of floats, the scheduler re-orders those so you see
> a load from uninitialized stack:
>

[looks puzzled; reads info page for gcc -fstrict-aliasing for the (N + 1)th time; sudden look of comprehension dawns...]

"taking a pointer and casting gives undefined result..."

Thank you Richard. That makes sense now given the context... :-)

Thanks folks!
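
One way to avoid the int-initialized, float-loaded array described in the quoted reply is to copy the mask's bit pattern into a real float array with memcpy before loading it. The sketch below is an assumption about how that could look, not the attached setel.c: the names select_bits and select_lanes are hypothetical, and unsigned int is assumed to be 32 bits wide, as it is on the x86 targets in this report.

```c
#include <string.h>
#include <xmmintrin.h>

/* Bitwise select: where a mask bit is 1 take the bit from b, else from a. */
static __m128 select_bits(__m128 a, __m128 b, __m128 mask)
{
    /* _mm_andnot_ps(mask, a) computes (~mask) & a. */
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

/* Hypothetical wrapper: the mask arrives as integers and the data as floats. */
void select_lanes(const float a[4], const float b[4],
                  const unsigned int mask_bits[4], float out[4])
{
    float mask_as_float[4];
    __m128 result;

    /* memcpy moves the raw bits into a float array, so the later vector
       load reads float objects and never accesses an int object through
       a float pointer. */
    memcpy(mask_as_float, mask_bits, sizeof mask_as_float);

    result = select_bits(_mm_loadu_ps(a), _mm_loadu_ps(b),
                         _mm_loadu_ps(mask_as_float));
    _mm_storeu_ps(out, result);
}
```

With a = {1, 2, 3, 4}, b = {10, 20, 30, 40} and a mask of all-ones in lanes 0 and 2, the output should be {10, 2, 30, 4}. The unaligned load and store intrinsics are used here only so the sketch does not depend on how the caller aligned its arrays.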
[Bug c/48036] New: unexpected byte swap in sse _mm_cvtpu16_ps in 64-bit 4.5.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48036

Summary: unexpected byte swap in sse _mm_cvtpu16_ps in 64-bit 4.5.1
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cck0...@yahoo.com

Created attachment 23586
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23586
.i file

Hi folks,

I'm getting an unexpected byte swap in _mm_cvtpu16_ps. The 32-bit 4.3.2 version has the behavior I was expecting.

I'm posting a program that takes a vector of chars, widens them to vectors of shorts, and then converts the shorts to vectors of floats. I was expecting the order of the floats in the vectors to match the order of the chars in the starting vector.

Either my understanding is wrong or a bug crept in after 4.3.2. What am I missing?

Thanks!

Using built-in specs.
COLLECT_GCC=/usr/bin/cc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.5.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-g' '-o' 'char_to_float' '-I.' '-msse' '-mmmx' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/4.5.1/cc1 -E -quiet -v -I. char_to_float.c -msse -mmmx -mtune=generic -march=x86-64 -g -fworking-directory -fpch-preprocess -o char_to_float.i
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.5.1/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 .
 /usr/local/include
 /usr/lib/gcc/x86_64-redhat-linux/4.5.1/include
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-g' '-o' 'char_to_float' '-I.' '-msse' '-mmmx' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/4.5.1/cc1 -fpreprocessed char_to_float.i -quiet -dumpbase char_to_float.c -msse -mmmx -mtune=generic -march=x86-64 -auxbase char_to_float -g -version -o char_to_float.s
GNU C (GCC) version 4.5.1 20100924 (Red Hat 4.5.1-4) (x86_64-redhat-linux)
compiled by GNU C version 4.5.1 20100924 (Red Hat 4.5.1-4), GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C (GCC) version 4.5.1 20100924 (Red Hat 4.5.1-4) (x86_64-redhat-linux)
compiled by GNU C version 4.5.1 20100924 (Red Hat 4.5.1-4), GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: ea394b69293dd698607206e8e43d607e
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-g' '-o' 'char_to_float' '-I.' '-msse' '-mmmx' '-mtune=generic' '-march=x86-64'
 as -V -Qy --64 -o char_to_float.o char_to_float.s
GNU assembler version 2.20.51.0.7 (x86_64-redhat-linux) using BFD version version 2.20.51.0.7-5.fc14 20100318
COMPILER_PATH=/usr/libexec/gcc/x86_64-redhat-linux/4.5.1/:/usr/libexec/gcc/x86_64-redhat-linux/4.5.1/:/usr/libexec/gcc/x86_64-redhat-linux/:/usr/lib/gcc/x86_64-redhat-linux/4.5.1/:/usr/lib/gcc/x86_64-redhat-linux/
LIBRARY_PATH=/usr/lib/gcc/x86_64-redhat-linux/4.5.1/:/usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-g' '-o' 'char_to_float' '-I.' '-msse' '-mmmx' '-mtune=generic' '-march=x86-64'
 /usr/libexec/gcc/x86_64-redhat-linux/4.5.1/collect2 --build-id --no-add-needed --eh-frame-hdr -m elf_x86_64 --hash-style=gnu -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o char_to_float /usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../../lib64/crt1.o /usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../../lib64/crti.o /usr/lib/gcc/x86_64-redhat-linux/4.5.1/crtbegin.o -L/usr/lib/gcc/x86_64-redhat-linux/4.5.1 -L/usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../.. char_to_float.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x8