https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271
            Bug ID: 63271
           Summary: Should commute arithmetic with vector load
           Product: gcc
           Version: 4.9.1
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zackw at panix dot com

Consider

    #include <emmintrin.h>

    __m128i foo(char C)
    {
      return _mm_set_epi8(    0,    C,  2*C,  3*C,
                            4*C,  5*C,  6*C,  7*C,
                            8*C,  9*C, 10*C, 11*C,
                           12*C, 13*C, 14*C, 15*C);
    }

    __m128i bar(char C)
    {
      __m128i v = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                               8, 9,10,11,12,13,14,15);
      v *= C;
      return v;
    }

I *believe* these functions compute the same value, and should therefore
generate identical code, but with gcc 4.9 foo() generates considerably
larger and slower code.

The test case is expressed in terms of x86 <emmintrin.h>, but I have no
reason to believe it isn't a generic missed optimization in the tree-level
vectorizer.
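For anyone who wants to check the claimed equivalence independently of code generation, here is a minimal sketch, not part of the original report. It assumes GCC or Clang vector extensions and uses a hypothetical byte-vector type `v16qu` plus hypothetical helpers `foo_bytes`, `bar_bytes` and `same_bytes`. It uses unsigned bytes so that wraparound is well defined. Note that GCC's <emmintrin.h> declares __m128i as a vector of two long long, so `v *= C` in bar() as written operates on 64-bit lanes; the sketch spells out the per-byte multiply that the comparison with foo() appears to intend.

```c
#include <string.h>

/* Hypothetical byte-vector type (GCC/Clang vector extension); the report
   itself uses __m128i from <emmintrin.h>. */
typedef unsigned char v16qu __attribute__((vector_size(16)));

/* Analogue of foo(): compute every lane separately as (15-k)*C mod 256,
   matching _mm_set_epi8's highest-lane-first argument order. */
v16qu foo_bytes(unsigned char C)
{
    v16qu r = { 0 };
    for (int k = 0; k < 16; k++)
        r[k] = (unsigned char)((15 - k) * C);
    return r;
}

/* Analogue of bar(): load a constant vector, then do one lane-wise multiply. */
v16qu bar_bytes(unsigned char C)
{
    v16qu v = { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };
    v16qu c = { 0 };
    for (int k = 0; k < 16; k++)
        c[k] = C;       /* explicit splat of the scalar into all lanes */
    return v * c;       /* 16 independent 8-bit multiplies, wrapping mod 256 */
}

/* Returns 1 when both forms yield the same 16 bytes for this C. */
int same_bytes(unsigned char C)
{
    v16qu a = foo_bytes(C), b = bar_bytes(C);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Calling same_bytes() for every C from 0 to 255 exercises all inputs. With byte lanes the two forms should agree, which is the property the requested optimization would rely on.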