[Bug target/60778] New: shift not folded into shift on x86-64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60778

            Bug ID: 60778
           Summary: shift not folded into shift on x86-64
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sunfish at mozilla dot com

On this C code:

double mem[4096];

double foo(long x) {
  return mem[x>>3];
}

GCC emits this x86-64 code:

        sarq    $3, %rdi
        movsd   mem(,%rdi,8), %xmm0

The following x86-64 code would be preferable:

        andq    $-8, %rdi
        movsd   mem(%rdi), %xmm0

since it has smaller code size and avoids using a scaled index, which costs an
extra micro-op on some microarchitectures. The same situation arises on 32-bit
x86 as well.

This was observed on all GCC versions currently on the GCC Explorer website
[0], with the latest at this time being 4.9.0 20130909.

[0] http://gcc.godbolt.org/
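For reference, the rewrite is valid because, for a signed x shifted right
arithmetically, scaling the shifted value back up by the element size is the
same as clearing the low bits: (x >> 3) * 8 == x & -8. A minimal check of that
identity (not part of the original report; the helper names are only for
illustration, and an arithmetic right shift of negative values is assumed, as
on x86-64):

#include <assert.h>

/* Byte offset computed by the emitted code: shift, then scale by 8. */
static long offset_shift_scale(long x) { return (x >> 3) * 8; }

/* Byte offset computed by the suggested code: mask off the low 3 bits. */
static long offset_mask(long x) { return x & -8L; }

int main(void) {
  long tests[] = { 0, 1, 7, 8, 9, 1000, -1, -7, -8, -9, -1000 };
  for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++)
    assert(offset_shift_scale(tests[i]) == offset_mask(tests[i]));
  return 0;
}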
[Bug target/60826] New: inefficient code for vector xor on SSE2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826

            Bug ID: 60826
           Summary: inefficient code for vector xor on SSE2
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sunfish at mozilla dot com

On the following C testcase:

#include <stdint.h>

typedef double v2f64 __attribute__((__vector_size__(16), may_alias));
typedef int64_t v2i64 __attribute__((__vector_size__(16), may_alias));

static inline v2f64 f_and(v2f64 l, v2f64 r) {
  return (v2f64)((v2i64)l & (v2i64)r);
}

static inline v2f64 f_xor(v2f64 l, v2f64 r) {
  return (v2f64)((v2i64)l ^ (v2i64)r);
}

static inline double vector_to_scalar(v2f64 v) {
  return v[0];
}

double test(v2f64 w, v2f64 x, v2f64 z) {
  v2f64 y = f_and(w, x);
  return vector_to_scalar(f_xor(z, y));
}

GCC emits this code:

        andpd   %xmm1, %xmm0
        movdqa  %xmm0, %xmm3
        pxor    %xmm2, %xmm3
        movdqa  %xmm3, -24(%rsp)
        movsd   -24(%rsp), %xmm0
        ret

GCC should move the result of the xor to the return register directly instead
of spilling it. It should also avoid the first movdqa, which is an unnecessary
copy. And ideally it should use xorpd instead of pxor, to avoid a
domain-crossing penalty on Nehalem and other microarchitectures (or xorps if
domain crossing doesn't matter, since it's smaller).
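For comparison, a rough intrinsics-based sketch of the same computation (not
from the report; assumes SSE2 and <emmintrin.h>). Writing the and/xor with the
double-precision intrinsics typically maps to andpd/xorpd and keeps the whole
sequence in the floating-point domain:

#include <emmintrin.h>

double test_intrin(__m128d w, __m128d x, __m128d z) {
  __m128d y = _mm_and_pd(w, x);   /* andpd */
  __m128d r = _mm_xor_pd(z, y);   /* xorpd */
  return _mm_cvtsd_f64(r);        /* return element 0 in %xmm0 */
}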
[Bug target/60826] inefficient code for vector xor on SSE2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826

--- Comment #2 from Dan Gohman ---
A little more detail: I think I have seen GCC use a spill + movsd reload as a
method of zeroing the non-zero-index vector elements of an xmm register;
however, either that's not what's happening here, or it's happening when it
isn't needed. I think the x86-64 ABI doesn't require the unused parts of an
xmm return register to be zeroed, but even if it does, I can also reproduce
the unnecessary spill and reload when I modify the test function above to
this:

void test(v2f64 w, v2f64 x, v2f64 z, double *p) {
  v2f64 y = f_and(w, x);
  *p = vector_to_scalar(f_xor(z, y));
}
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955

--- Comment #13 from Dan Gohman ---
(In reply to Paul Eggert from comment #12)
> (In reply to Rich Felker from comment #10)
> > This assumption only aids optimization in the case where a pointer
> > residing in the obtained memory is used (e.g. dereferenced or compared
> > with another pointer) before anything is stored to it.
>
> No, it also aids optimization because GCC can infer lack of aliasing
> elsewhere, even if no pointer in the newly allocated memory is
> used-before-set. Consider the contrived example am.c (which I've added as
> an attachment to this report). It has two functions f and g that differ
> only in that f calls m which has __attribute__ ((malloc)) whereas g calls n
> which does not. With the weaker assumption you're suggesting, GCC could not
> optimize away the reload from a->next in f, because of the intervening
> assignment '*p = q'.

Actually, GCC and Clang both eliminate the reload of a->next in f (and not in
g). The weaker assumption is sufficient for that: *p can't alias a or b
without violating the weaker assumption.

What GCC is additionally doing in f is deleting the stores to a->next and
b->next as dead stores. That's really clever. However, the weaker assumption
is actually sufficient for that too: first, forward b to eliminate the load of
a->next. Then it can be proved that a doesn't escape and is defined by an
attribute malloc function, so the stores through it can't be loaded anywhere
else, so they're dead. Then it can be proved that b doesn't escape either, and
is also defined by an attribute malloc function, so the stores through it are
dead too.

Consequently, the weaker assumption is still fairly strong. Further, the
weaker assumption would be usable by a much broader set of functions, so it
may even provide overall stronger alias information in practice.
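For readers without the attachment, the pattern under discussion is roughly
the following (a reconstruction for illustration only; the actual am.c
attachment surely differs in its details, and it shows only the reload
elimination, not the dead-store elimination):

struct node { struct node *next; };

struct node *m(void) __attribute__((malloc)); /* allocator with the attribute */
struct node *n(void);                         /* allocator without it */

struct node *f(struct node **p, struct node *q)
{
  struct node *a = m();
  struct node *b = m();
  a->next = b;
  *p = q;          /* with attribute malloc, *p cannot point into a or b */
  return a->next;  /* so this load can be forwarded from the earlier store */
}

struct node *g(struct node **p, struct node *q)
{
  struct node *a = n();
  struct node *b = n();
  a->next = b;
  *p = q;          /* without the attribute, p might point at a->next */
  return a->next;  /* so the reload has to stay */
}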
[Bug other/56955] documentation for attribute malloc contradicts itself
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955

--- Comment #17 from Dan Gohman ---
(In reply to Richard Biener from comment #16)
> One reason for why realloc is "hard" is that there is no language that says
> it is undefined to access the object via the old pointer, but there is only
> language that says the old and the new pointer values may be equal.

C89 was unclear, but C99 and now C11 7.22.3.5 say realloc deallocates the old
pointer, and there is no mention of the case where the pointers happen to be
equal. The interpretation of this to mean that the old and new pointers don't
alias, even when they compare equal, has a serious following.
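A small illustration of the practical consequence of that interpretation (an
example constructed here, not taken from the thread):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *p = malloc(sizeof *p);
  if (!p) return 1;
  int *q = realloc(p, sizeof *q);
  if (!q) { free(p); return 1; }
  if (q == p) {
    /* The pointers compare equal, but under the reading above the old
       object was deallocated, so this store through p is to a deallocated
       object, and an optimizer adopting that reading may treat p and q as
       non-aliasing. */
    *p = 1;
    *q = 2;
    printf("%d %d\n", *p, *q);  /* some compilers have printed "1 2" here */
  }
  free(q);
  return 0;
}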