[Bug rtl-optimization/43147] New: SSE shuffle merge
I've noticed that GCC (my current version is 4.4.1) doesn't merge consecutive SSE shuffles, as seen in this example:

    #include <xmmintrin.h>

    extern void printv(__m128 m);

    int main()
    {
        m = _mm_shuffle_ps(m, m, 0xC9); // Those two shuffles together swap pairs
        m = _mm_shuffle_ps(m, m, 0x2D); // And could be optimized to 0x4E
        printv(m);

        return 0;
    }

This code generates the following assembly:

    movaps  .LC1, %xmm1
    shufps  $201, %xmm1, %xmm1
    shufps  $45, %xmm1, %xmm1    ; <-- Both should merge to 78
    movaps  %xmm1, %xmm0
    movaps  %xmm1, -24(%ebp)

    .LC0:
        .long 1065353216 ; 1.0f
        .long 1073741824 ; 2.0f
        .long 1077936128 ; 3.0f
        .long 1082130432 ; 4.0f

It would be nice to see this as an enhancement!

--
Summary: SSE shuffle merge
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: liranuna at gmail dot com
GCC build triplet: x86_64-linux-gnu
GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147
[Bug rtl-optimization/43147] SSE shuffle merge
--- Comment #1 from liranuna at gmail dot com 2010-02-23 01:37 ---

It appears I left a line out of the code I posted:

    #include <xmmintrin.h>

    extern void printv(__m128 m);

    int main()
    {
        __m128 m = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
        m = _mm_shuffle_ps(m, m, 0xC9); // Those two shuffles together swap pairs
        m = _mm_shuffle_ps(m, m, 0x2D); // And could be optimized to 0x4E
        printv(m);

        return 0;
    }

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147
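The merge requested in this report can be checked by hand. The following is only a sketch of the arithmetic, not GCC code: `compose_shuffle_imm` is a hypothetical helper name, and it assumes both shuffles use the same register for both operands, as in `_mm_shuffle_ps(m, m, imm)`, where result lane i takes source lane `(imm >> (2 * i)) & 3`.

```cpp
#include <cstdint>

// Hypothetical helper (not part of GCC or any intrinsics header).
// Models two back-to-back _mm_shuffle_ps(v, v, imm) calls and returns the
// single immediate that produces the same lane order.
std::uint8_t compose_shuffle_imm(std::uint8_t first, std::uint8_t second)
{
    std::uint8_t merged = 0;
    for (int lane = 0; lane < 4; ++lane) {
        // Lane of the intermediate vector that `second` reads for this output lane.
        unsigned mid = (second >> (2 * lane)) & 3u;
        // Lane of the original vector that `first` moved into that position.
        unsigned src = (first >> (2 * mid)) & 3u;
        merged |= static_cast<std::uint8_t>(src << (2 * lane));
    }
    return merged;
}
```

With the immediates from the report, `compose_shuffle_imm(0xC9, 0x2D)` evaluates to 0x4E (decimal 78), which matches the single shuffle the reporter expects.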
[Bug target/43722] New: ICE when passing NEON registers using const references
Giving GCC 4.4.3 the following code with the arguments "-O1 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp":

    #include <arm_neon.h>
    #include <stdio.h>

    void printv_f32(const float32x4_t &v)
    {
        printf("%f\n", vgetq_lane_f32(v, 0));
    }

    int main()
    {
        float32x4_t v = {0.0, 1.0f, 2.0f, 3.0f};
        printv_f32(v);

        return 0;
    }

results in an ICE:

    /home/liranuna/Projects/mathlib_md/source/main.cpp: In function 'int main()':
    /home/liranuna/Projects/mathlib_md/source/main.cpp:21: error: insn does not satisfy its constraints:
    (insn 25 5 7 2 /home/liranuna/Projects/mathlib_md/source/main.cpp:11 (set (mem/c/i:V4SF (pre_dec:SI (reg/f:SI 0 r0 [134])) [0 v+0 S16 A64])
            (reg:V4SF 95 d16)) 710 {*neon_movv4sf} (expr_list:REG_INC (reg/f:SI 0 r0 [134])
            (nil)))
    /home/liranuna/Projects/mathlib_md/source/main.cpp:21: internal compiler error: in reload_cse_simplify_operands, at postreload.c:396
    Please submit a full bug report, with preprocessed source if appropriate.
    See <http://gcc.gnu.org/bugs.html> for instructions.

--
Summary: ICE when passing NEON registers using const references
Product: gcc
Version: 4.4.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: liranuna at gmail dot com
GCC host triplet: x86_64-linux-gnu
GCC target triplet: arm-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43722
[Bug target/43722] ICE when passing NEON registers using const references
--- Comment #1 from liranuna at gmail dot com 2010-04-12 03:24 ---

I would like to add that changing

    void printv_f32(const float32x4_t &v)

into:

    void printv_f32(float32x4_t v)

makes the problem go away, but the generated code is suboptimal.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43722
[Bug target/43724] New: GCC produces suboptimal ARM NEON code for zero vector assignment
When an intrinsic from the vdupq_n_XXX family is called with an argument of 0, the generated code is:

    mov     r0, #0
    vdup.32 q8, r0

instead of the faster

    veor.32 q8, q8, q8

Note that GCC does use xorps on x86[_64] for SSE when given _mm_setzero_ps() or _mm_set1_ps(0).

--
Summary: GCC produces suboptimal ARM NEON code for zero vector assignment
Product: gcc
Version: 4.4.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: liranuna at gmail dot com
GCC build triplet: x86_64-linux-gnu
GCC host triplet: x86_64-linux-gnu
GCC target triplet: arm-linux-gnueabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43724
[Bug target/43722] ICE when passing NEON registers using const references
--- Comment #7 from liranuna at gmail dot com 2010-04-13 07:43 ---

Mikael's patch seems to do the trick, and it also produces very nice assembly.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43722
[Bug 45775] (c++) New: Private templated classes/structs inside a class.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45775

Summary: Private templated classes/structs inside a class.
Product: gcc
Version: 4.4.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: liran...@gmail.com

Created attachment 21874
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=21874
Failing code

Using GCC 4.4.3 and the attached source code, GCC does not report an error for the illegal access to A::B.

According to the C++ spec:

    11.8 Nested classes [class.access.nest]
    1 A nested class is a member and as such has the same access rights as any other member. The members of an enclosing class have no special access to members of a nested class; the usual access rules (Clause 11) shall be obeyed.

Note that the access to A::C is correctly rejected.

--
Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.
[Bug 45775] (c++) Private templated classes/structs inside a class.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45775

--- Comment #1 from Liran Nuna 2010-09-24 06:50:29 UTC ---

Accidentally attached the wrong source file:

    #include <stdio.h>

    class A
    {
    private:
        template <int T> struct B { };
        struct C { };

    public:
        template <int T> B<T> getAb() { return B<T>(); }
        C getAc() { return C(); }
    };

    template <int T>
    void print_private_template(const A::B<T> &ab)
    {
        printf("%d\n", T);
    }

    void print_private_class(const A::C &ac)
    {
        printf("something\n");
    }

    int main(int, char**)
    {
        A a;
        print_private_template(a.getAb<42>());
        print_private_class(a.getAc());

        return 0;
    }
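For contrast, here is a minimal sketch of a form that should stay well-formed once the access check is enforced. It assumes B is parameterized by an int, as the call getAb<42>() suggests. The `value` member and the helper `private_template_value` are hypothetical additions for illustration and are not part of the attached test case. Access control applies to names, so a function that deduces its parameter type never has to name the private A::B.

```cpp
// Same shape as the class in the report; `value` is added here only so the
// example has something observable to return.
class A
{
private:
    template <int N> struct B { static const int value = N; };

public:
    template <int N> B<N> getAb() { return B<N>(); }
};

// Hypothetical helper: the parameter type is deduced, so the private name
// A::B never appears outside of A and no access check is triggered.
template <typename T>
int private_template_value(const T &)
{
    return T::value;
}
```

Calling `private_template_value(a.getAb<42>())` on an `A a;` compiles without naming `A::B`, whereas the `const A::B<T> &` parameter in the report spells out the private name and is the part a conforming compiler should reject.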