[Bug tree-optimization/57642] New: vectorizer not working with function templates
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57642

            Bug ID: 57642
           Summary: vectorizer not working with function templates
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yzhang1985 at gmail dot com

Hi, the following simple loop doesn't vectorize in GCC 4.8.1, but does with
4.3.2. It does vectorize if I make DoIt a regular function instead of a
templated function.

#include
#include
#include
#include
#include
#include

class SqrtFunc
{
public:
    float operator()(float x)
    {
        return (((3.02f * x) + 1.5f) * x - 2.1f) * x + 1.5f;
    }
};

template <class Functor>
void DoIt(float *data, int size, Functor functor)
{
    for (int i = 0; i < size; ++i)
    {
        data[i] = functor(data[i]);
    }
}

int main()
{
    float data[2048];
    SqrtFunc functor;
    DoIt(data, sizeof(data), functor);
    return 0;
}
[Bug tree-optimization/57642] vectorizer not working with function templates
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57642 --- Comment #1 from Yale Zhang --- I would like to know if there's an easy work around for this.
[Bug tree-optimization/57642] vectorizer not working with function templates
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57642

Yale Zhang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Yale Zhang ---
Sorry, please close this. My loop was eliminated as dead code, thus no
vectorization. I saw the message "not enough data-refs for auto-vectorization",
which made me think it wasn't being vectorized, but that's probably from
somewhere else.
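The dead-code explanation in comment #2 can be illustrated with a variant of the test case whose result is actually used, so the optimizer cannot discard the loop. This is only a sketch: the names PolyFunc and RunAndSum are hypothetical, the input values are made up, and the element count is passed instead of the byte size used in the original report.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical functor with the same polynomial as the report's SqrtFunc.
struct PolyFunc {
    float operator()(float x) const {
        return (((3.02f * x) + 1.5f) * x - 2.1f) * x + 1.5f;
    }
};

// Same shape as the templated DoIt from the report.
template <typename Functor>
void DoIt(float *data, std::size_t count, Functor functor) {
    assert(data != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = functor(data[i]);
    }
}

// Initializes the input and returns a value derived from the output.
// Because the sum is returned to the caller, the loop in DoIt is live.
float RunAndSum() {
    float data[2048];
    const std::size_t count = sizeof(data) / sizeof(data[0]);
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = 0.001f * static_cast<float>(i);
    }
    DoIt(data, count, PolyFunc());
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Printing or otherwise consuming the value returned by RunAndSum() keeps the computation observable. Whether the loop is then vectorized still depends on the compiler version and flags; GCC's -fopt-info-vec option is one way to check.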
[Bug java/83647] add x86_64 Windows support to GCJ
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83647

--- Comment #2 from Yale Zhang ---
(In reply to Andrew Pinski from comment #1)
> GCC 6 is in regression only fixes due to it being a release branch.
>
> Won't fix as Java was removed from GCC 7. There are other open source Java
> implementations including but not limited to OpenJDK.

I was afraid of that, but I want to compile to native code and AFAIK, GCJ is
the only one or, if not, the only robust one.

I think this change can be useful to others and shouldn't be lost just because
GCC 6 is limited to regression fixes only. Is there a non-release branch that
this can be checked into?
[Bug java/83647] New: add x86_64 Windows support to GCJ
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83647

            Bug ID: 83647
           Summary: add x86_64 Windows support to GCJ
           Product: gcc
           Version: 6.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: java
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yzhang1985 at gmail dot com
  Target Milestone: ---

Created attachment 43002
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43002&action=edit
The changes to natVMConsole.cc are only for MingW. Probably these changes don't
need to be in GCC and can be downstream

GCJ currently doesn't support x86_64 Windows, specifically x86_64-w64-mingw32.
I have made a patch that supports it. I know GCJ has been removed from GCC 7,
but I'm hoping this can still make it into GCC 6.

The changes were mostly to change 32-bit ints to pointer-sized ints.
Specifically,

unsigned long -> uintptr_t
jint -> jlong

Changing jint -> jlong is probably not right because that would change 32-bit
builds. I wanted to change those jint to jsize and change jsize from int to
intptr_t, but wasn't sure of the effects.

The other big change was I had to replace boehm-gc with a newer version (7.2e,
7.2g crashes). GCC seems to have an out of date, custom version that doesn't
support x86_64 Windows, and that only builds a static lib. I didn't include
that in the patch, but you can simply plant the newer version into the source
code.
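The "unsigned long -> uintptr_t" change above is needed because 64-bit Windows keeps long at 32 bits while pointers are 64 bits wide. A minimal sketch of the difference follows; the helper names are hypothetical and are not taken from the attached patch.

```cpp
#include <cassert>
#include <cstdint>

// True on LP64 targets such as x86_64 Linux, false on 64-bit Windows (LLP64),
// where unsigned long is 32 bits and pointers are 64 bits.
bool UnsignedLongHoldsPointer() {
    return sizeof(unsigned long) >= sizeof(void *);
}

// Round-trips a pointer through std::uintptr_t, which is wide enough to hold
// an object pointer on every target that provides it.
const void *RoundTripThroughUintptr(const void *p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    const void *back = reinterpret_cast<const void *>(bits);
    assert(back == p);
    return back;
}
```

Code that stores pointers in unsigned long silently truncates them where UnsignedLongHoldsPointer() is false, which is why the pointer-sized type is the safer choice.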
[Bug tree-optimization/80647] New: vectorized loop crashes from wrongly assuming 16 byte alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80647

            Bug ID: 80647
           Summary: vectorized loop crashes from wrongly assuming 16 byte
                    alignment
           Product: gcc
           Version: 6.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yzhang1985 at gmail dot com
  Target Milestone: ---

Created attachment 41328
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41328&action=edit
compiling with -O3 will reproduce the crash

I'm getting a crash for a function that extracts a sub region of an image
in-place. I compile with gcc -O3, which vectorizes the innermost loop,

while (twd--)
{
    *pintdest++ = *pintsrc++;
}

--- assembly ---
movdqa (%r10,%rax,1),%xmm0
add    $0x1,%ecx
movups %xmm0,(%rdx,%rax,1)

It crashes on movdqa because the address isn't aligned. It should be using
unaligned vector loads like movdqu or lddqu instead.

I tested it with GCC 4.8 which did vectorize the loop correctly.

Starting with Nehalem, there is no penalty for using unaligned loads/stores if
the vector doesn't span 2 cache lines, so why not always generate unaligned
loads/stores? It used to be that the other advantage to exploit for aligned
data was to fuse the vector load/store with another instruction, reducing
machine code size. But even that alignment restriction for memory operands was
relaxed starting with SandyBridge's VEX instructions.
[Bug tree-optimization/80647] vectorized loop crashes from wrongly assuming 16 byte alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80647

--- Comment #2 from Yale Zhang ---
Very interesting case. First, I didn't know unaligned loads were undefined
behavior on x86.

ICC 17 doesn't vectorize the loop, probably because the destination and source
of the memmove() alias. But apparently GCC knows how to vectorize memmove().
In this function, the destination always comes before the source, so it's
trivial to vectorize. Vectorizing the case where destination > source is
harder, and I wonder if GCC can do that.

This is some legacy code from > 10 years ago. Manually vectorizing the
memmove() was too smart for modern compilers. But the solution is simple. I'll
just use the other simple, fallback implementation used on unknown platforms.
It's still vectorizable though.

Thanks, Andrew.
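One way to copy 32-bit words between addresses that may not be 4-byte aligned, without dereferencing a misaligned int pointer, is to go through std::memcpy. The sketch below is an assumption about what such a fallback could look like, not the code from the attachment; CopyWordsForward is a hypothetical name, and like the loop in the report it copies forward, so an overlapping destination must start before the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `words` 32-bit words from src to dest. Neither pointer needs to be
// aligned, because each word is moved with memcpy instead of an int access.
// The regions may overlap as long as dest starts before src.
void CopyWordsForward(unsigned char *dest, const unsigned char *src,
                      std::size_t words) {
    assert(dest != nullptr && src != nullptr);
    for (std::size_t i = 0; i < words; ++i) {
        std::uint32_t w;
        std::memcpy(&w, src + 4 * i, sizeof(w));   // unaligned-safe load
        std::memcpy(dest + 4 * i, &w, sizeof(w));  // unaligned-safe store
    }
}
```

Compilers typically turn these fixed-size memcpy calls into plain loads and stores, and because the code makes no alignment promise, a vectorized version has no basis for using aligned instructions such as movdqa.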
[Bug inline-asm/77756] New: cpuid
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77756

            Bug ID: 77756
           Summary: cpuid
           Product: gcc
           Version: 6.2.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: inline-asm
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yzhang1985 at gmail dot com
  Target Milestone: ---

Created attachment 39696
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39696&action=edit
should print 99 when run on an AVX2 capable processor

I've found a bug in __get_cpuid() in the compiler internal header, cpuid.h.

I wish to detect if the CPU supports AVX2, but when I call __get_cpuid(7, ...),
EBX is all zeros. The problem is that for level 7, ECX must be set to 0 before
calling cpuid.

As a workaround, I've added "xor %%ecx, %%ecx" to __cpuid() and that took care
of the problem:

#define __cpuid(level, a, b, c, d)                  \
  __asm__("xor %%ecx, %%ecx\n"                      \
          "cpuid\n"                                 \
          : "=a"(a), "=b"(b), "=c"(c), "=d"(d)      \
          : "0"(level))

It looks like Intel only started requiring this for level 7. Who knows if
they'll require setting ECX to other values for future levels, but for now, it
seems always setting ECX to 0 is OK.

One mystery is why GCC's builtin AVX2 auto detection for function
multiversioning, which uses __get_cpuid(), works (see multiversioning.cpp).
However, I can't use multiversioning because ifunc hasn't been ported to
Windows, so I have to do manual detection.

I don't think attaching a preprocessed C file is necessary to reproduce this.

Here's the output of gcc -v:

Using built-in specs.
COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 6.2.0-4' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 6.2.0 20160914 (Debian 6.2.0-4)
[Bug other/77769] New: function generated for OpenMP region uses wrong instruction set
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77769

            Bug ID: 77769
           Summary: function generated for OpenMP region uses wrong
                    instruction set
           Product: gcc
           Version: 6.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yzhang1985 at gmail dot com
  Target Milestone: ---

Created attachment 39707
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39707&action=edit
compilation will fail with "target specific option mismatch"

Greetings, I'm trying to write AVX2 SIMD intrinsics code that will be
dynamically dispatched at runtime through a function pointer. The code has to
work for vanilla x86_64 processors, so I can't use -mavx2. Instead, I use
#pragma GCC target("avx2") to target AVX2 for selected functions.

The bug is that whenever I call AVX or AVX2 intrinsics inside an OpenMP region,
I get the error "target specific option mismatch".

If I move the intrinsics code to another function, it can compile, but if I
mark that function with __attribute__((always_inline)), the compilation fails
with the same error.

So, my conclusion is that the OpenMP code generator is still targeting vanilla
x86_64, instead of AVX2.

I'd appreciate it if someone could work on fixing this.

command line:
g++ -O3 -fopenmp openmp_wrong_target_isa.cpp

Sorry, I couldn't include the preprocessed file as requested - exceeds upload
limit.

gcc -v output:

Using built-in specs.
COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 6.2.0-4' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 6.2.0 20160914 (Debian 6.2.0-4)
[Bug middle-end/77769] function generated for OpenMP region uses wrong instruction set
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77769

--- Comment #3 from Yale Zhang ---
(In reply to Richard Biener from comment #2)
> The testcase you attached can't work because we can't inline an avx2
> function into a function not having avx2 enabled.

Right, but main() and the OpenMP function should have AVX2 enabled because they
come after #pragma GCC target("avx2"), which is still in effect.

If the target("avx2") was surrounded by #pragma GCC push_options/pop_options,
then main would not have AVX2 enabled.
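The scoping that comment #3 describes can be sketched as follows: only the function defined between push_options and pop_options is compiled for AVX2, and everything after pop_options goes back to the command-line target. The function names are hypothetical, OpenMP is left out, and the AVX2 path is only compiled when building with GCC for x86_64 (other compilers or targets get the scalar fallback). The runtime check uses GCC's __builtin_cpu_supports.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) && defined(__GNUC__) && !defined(__clang__)
#include <immintrin.h>
#define HAVE_AVX2_PATH 1

#pragma GCC push_options
#pragma GCC target("avx2")
// Compiled with AVX2 enabled; must only be called on CPUs that support it.
static void AddOneAvx2(int *data, std::size_t n) {
    const __m256i one = _mm256_set1_epi32(1);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256(
            reinterpret_cast<const __m256i *>(data + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i *>(data + i),
                            _mm256_add_epi32(v, one));
    }
    for (; i < n; ++i) {
        data[i] += 1;  // scalar tail for the remaining elements
    }
}
#pragma GCC pop_options
#endif

// Compiled for the baseline target given on the command line.
static void AddOneScalar(int *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] += 1;
    }
}

typedef void (*AddOneFn)(int *, std::size_t);

// Picks the implementation at run time and returns it as a function pointer.
AddOneFn SelectAddOne() {
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        return AddOneAvx2;
    }
#endif
    return AddOneScalar;
}
```

Because the AVX2 function is reached only through a pointer, the baseline caller never needs to inline it, which sidesteps the "target specific option mismatch" error that inlining across different targets produces.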
[Bug target/77756] __get_cpuid() returns wrong values for level 7 (extended features)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77756

--- Comment #2 from Yale Zhang ---
(In reply to Uroš Bizjak from comment #1)
> Created attachment 39711 [details]
> Patch that fixes __get_cpuid
>
> Can you please check if the attached patch fixes your problem?

Great, your patch works. Thanks for taking care of it so quickly.

I see you made it flexible by setting ECX to 0 only for certain levels, without
increasing machine code size since __get_cpuid() is inlined and most of the
unused cases will get thrown away as dead code. But does level 13 really exist?
I don't see any documentation for it.

Also, any idea why the AVX2 auto detection used for function multiversioning
was working earlier, which used __get_cpuid()? Was it just by chance?
[Bug target/77756] __get_cpuid() returns wrong values for level 7 (extended features)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77756

--- Comment #12 from Yale Zhang ---
What's the purpose of subleaf? Is it to distinguish the capabilities of
different cores in a heterogeneous chip (e.g. ARM big-little)? Then I would be
fine with making this an extra parameter to __get_cpuid().

Microsoft has a __cpuidex() function that also takes subleaf, but for the
regular __cpuid(), the subleaf defaults to 0. Should __get_cpuid() default to 0
as well?

(In reply to uros from comment #11)
> Author: uros
> Date: Thu Sep 29 18:44:32 2016
> New Revision: 240629
>
> URL: https://gcc.gnu.org/viewcvs?rev=240629&root=gcc&view=rev
> Log:
> PR target/77756
> * config/i386/cpuid.h (__get_cpuid_count): New.
> (__get_cpuid): Rename __level to __leaf.
>
> testsuite/ChangeLog:
>
> PR target/77756
> * gcc.target/i386/pr77756.c: New test.
>
>
> Modified:
> trunk/gcc/ChangeLog
> trunk/gcc/config/i386/cpuid.h
> trunk/gcc/testsuite/ChangeLog
> trunk/gcc/testsuite/gcc.target/i386/pr77756.c
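With the __get_cpuid_count() function added by the commit quoted above, the AVX2 check from the original report could be written roughly as below. This is a sketch that assumes a cpuid.h new enough to provide __get_cpuid_count; CpuReportsAvx2 is a hypothetical name, and on non-x86 targets the function simply returns false.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Returns true if CPUID leaf 7, subleaf 0 reports the AVX2 feature flag
// (EBX bit 5). This looks only at the CPU flag; it does not check whether
// the operating system saves and restores the YMM registers.
bool CpuReportsAvx2() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // The second argument is the subleaf, which is loaded into ECX before
    // the cpuid instruction executes. A return value of 0 means the leaf
    // is not supported by this CPU.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    // A successful query implies the highest basic leaf is at least 7.
    assert(__get_cpuid_max(0, nullptr) >= 7);
    return (ebx & (1u << 5)) != 0;
#else
    return false;
#endif
}
```

Passing the subleaf explicitly avoids depending on whatever value happened to be in ECX, which is the failure mode described in the original report.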
[Bug other/61417] New: can't use intrinsic function as argument to function template
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61417

            Bug ID: 61417
           Summary: can't use intrinsic function as argument to function
                    template
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yzhang1985 at gmail dot com

This program isn't compiling. It works in GCC 4.3, 4.8, and in the Intel
compiler. The problem is that GCC fails to inline the _mm_cmpgt_epi8 function
(which is not compiled into its own symbol), thinking it's defined externally.

#include
#include

#define FORCE_INLINE __attribute__ ((always_inline))

__m128i g_results;

typedef __m128i TwoOperandVectorFunction(__m128i, __m128i);

FORCE_INLINE void IntrinsicBench(TwoOperandVectorFunction f)
{
    __m128i r0, r1, r2;
    for (int i = 0; i < 20; i += 16)
    {
        r0 = f(r1, r2);
    }
    g_results = r0;
}

int main(int argc, char **argv)
{
    IntrinsicBench(_mm_cmpgt_epi8);
    return 0;
}

I'm using GCC 4.9 (x86_64) configured with

./configure --prefix=/opt/gcc4.9 --with-gmp-include=/home/yale/gmp-5.1.2
--with-gmp-lib=/home/yale/gmp-5.1.2/.libs
--with-mpfr-include=/home/yale/mpfr-3.1.2/src
--with-mpfr-lib=/home/yale/mpfr-3.1.2/src/.libs
--with-mpc-include=/home/yale/mpc-1.0.1/src
--with-mpc-lib=/home/yale/mpc-1.0.1/src/.libs --enable-languages=c,c++,java
--with-multilib-list=m32,m64 --enable-libgcj --enable-libgcj-multifile
--enable-static-libjava --disable-java-awt --disable-libgcj-debug
--disable-jvmpi --disable-bootstrap --disable-nls --disable-multilib
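A possible way to avoid taking the address of the intrinsic is to wrap it in a small function object and pass that to a template, so the call to _mm_cmpgt_epi8 is written directly in code the compiler can inline. This is a sketch of that idea and not something proposed in the report; it assumes an x86 target with SSE2, and the names CmpGtEpi8, ApplyTwoOperand, and FirstByteOfCompare are hypothetical.

```cpp
#include <cassert>
#include <emmintrin.h>

// Wraps the intrinsic so that no pointer to _mm_cmpgt_epi8 is ever formed.
struct CmpGtEpi8 {
    __m128i operator()(__m128i a, __m128i b) const {
        return _mm_cmpgt_epi8(a, b);
    }
};

// Generic two-operand benchmark hook; the functor type is a template
// parameter, so the wrapped intrinsic call is visible at the call site.
template <typename Functor>
__m128i ApplyTwoOperand(Functor f, __m128i a, __m128i b) {
    return f(a, b);
}

// Broadcasts x and y to all 16 lanes, compares them, and returns the first
// result byte: 0xFF when x > y (signed), otherwise 0.
int FirstByteOfCompare(signed char x, signed char y) {
    __m128i r = ApplyTwoOperand(CmpGtEpi8(), _mm_set1_epi8(x),
                                _mm_set1_epi8(y));
    unsigned char bytes[16];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(bytes), r);
    for (int i = 1; i < 16; ++i) {
        assert(bytes[i] == bytes[0]);  // every lane saw the same inputs
    }
    return bytes[0];
}
```

The trade-off is one wrapper type per intrinsic, but each wrapper is a one-liner and the inputs are initialized, unlike r1 and r2 in the original test case.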