[Bug target/60778] New: shift not folded into shift on x86-64

2014-04-07 Thread sunfish at mozilla dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60778

Bug ID: 60778
   Summary: shift not folded into shift on x86-64
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sunfish at mozilla dot com

On this C code:

double mem[4096];
double foo(long x) {
  return mem[x>>3];
}

GCC emits this x86-64 code:

sarq    $3, %rdi
movsd   mem(,%rdi,8), %xmm0

The following x86-64 code would be preferable:

andq    $-8, %rdi
movsd   mem(%rdi), %xmm0

since it has smaller code size and avoids using a scaled index, which costs an
extra micro-op on some microarchitectures.
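
For illustration only (not part of the original report), the equivalence behind
the suggested code can be written out in C. GCC implements >> on signed
integers as an arithmetic shift on this target, so (x >> 3) * 8 equals x & -8,
and the two addressing forms below compute the same address. The helper names
are made up for this sketch:

/* Sketch: both functions return the same address under the assumptions above. */
double *addr_shift(long x) { return &mem[x >> 3]; }
double *addr_and(long x)   { return (double *)((char *)mem + (x & -8L)); }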

The same situation also arises on 32-bit x86.

This was observed on all GCC versions currently on the GCC Explorer website
[0], with the latest at this time being 4.9.0 20130909.

[0] http://gcc.godbolt.org/


[Bug target/60826] New: inefficient code for vector xor on SSE2

2014-04-11 Thread sunfish at mozilla dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826

Bug ID: 60826
   Summary: inefficient code for vector xor on SSE2
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sunfish at mozilla dot com

On the following C testcase:

#include <stdint.h>

typedef double v2f64 __attribute__((__vector_size__(16), may_alias));
typedef int64_t v2i64 __attribute__((__vector_size__(16), may_alias));

static inline v2f64 f_and   (v2f64 l, v2f64 r) { return (v2f64)((v2i64)l & (v2i64)r); }
static inline v2f64 f_xor   (v2f64 l, v2f64 r) { return (v2f64)((v2i64)l ^ (v2i64)r); }
static inline double vector_to_scalar(v2f64 v) { return v[0]; }

double test(v2f64 w, v2f64 x, v2f64 z)
{
  v2f64 y = f_and(w, x);

  return vector_to_scalar(f_xor(z, y));
}

GCC emits this code:

andpd   %xmm1, %xmm0
movdqa  %xmm0, %xmm3
pxor    %xmm2, %xmm3
movdqa  %xmm3, -24(%rsp)
movsd   -24(%rsp), %xmm0
ret

GCC should move the result of the xor to the return register directly instead
of spilling it. Also, it should avoid the first movdqa, which is an unnecessary
copy.

Also, this should ideally use xorpd instead of pxor, to avoid a domain-crossing
penalty on Nehalem and other microarchitectures (or xorps, if domain crossing
doesn't matter, since it's smaller).
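
For reference, a minimal sketch (not compiler output) of code along the lines
suggested above, assuming the SysV calling convention (w in %xmm0, x in %xmm1,
z in %xmm2) and that the unused lanes of the return register can be left
unspecified:

andpd   %xmm1, %xmm0
xorpd   %xmm2, %xmm0
ret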


[Bug target/60826] inefficient code for vector xor on SSE2

2014-04-14 Thread sunfish at mozilla dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826

--- Comment #2 from Dan Gohman  ---
A little more detail: I think I have seen GCC use a spill + movsd reload as a
way of zeroing the non-zero-index vector elements of an xmm register; however,
that's either not what's happening here, or it's happening when it isn't
needed.

I think the x86-64 ABI doesn't require the unused parts of an xmm return
register to be zeroed, but even if it does, I can also reproduce the
unnecessary spill and reload when I modify the test function above to this:

void test(v2f64 w, v2f64 x, v2f64 z, double *p)
{
  v2f64 y = f_and(w, x);

  *p = vector_to_scalar(f_xor(z, y));
}
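
For comparison, a sketch (not compiler output) of spill-free code for this
variant, assuming p arrives in %rdi:

andpd   %xmm1, %xmm0
xorpd   %xmm2, %xmm0
movsd   %xmm0, (%rdi)
ret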


[Bug other/56955] documentation for attribute malloc contradicts itself

2014-05-20 Thread sunfish at mozilla dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955

--- Comment #13 from Dan Gohman  ---
(In reply to Paul Eggert from comment #12)
> (In reply to Rich Felker from comment #10)
> > This assumption only aids
> > optimization in the case where a pointer residing in the obtained memory is
> > used (e.g. dereferenced or compared with another pointer) before anything is
> > stored to it.
> 
> No, it also aids optimization because GCC can infer lack of aliasing
> elsewhere, even if no pointer in the newly allocated memory is
> used-before-set.  Consider the contrived example am.c (which I've added as
> an attachment to this report).  It has two functions f and g that differ
> only in that f calls m which has __attribute__ ((malloc)) whereas g calls n
> which does not.  With the weaker assumption you're suggesting, GCC could not
> optimize away the reload from a->next in f, because of the intervening
> assignment '*p = q'.

Actually, GCC and Clang both eliminate the reload of a->next in f (and not in
g). The weaker assumption is sufficient for that. *p can't alias a or b without
violating the weaker assumption.

What GCC is additionally doing in f is deleting the stores to a->next and
b->next as dead stores. That's really clever. However, the weaker assumption is
actually sufficient for that too: first, forward the stored value b to
eliminate the load of a->next. Then it can be proved that a doesn't escape and
is defined by an attribute malloc function, so the stores through it can't be
observed anywhere else and are therefore dead. The same then holds for b: it
doesn't escape and is also defined by an attribute malloc function, so the
stores through it are dead too.
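
Since am.c is an attachment and not reproduced here, the following is only a
rough sketch of the kind of test case being discussed; the struct layout,
names, and exact shape are guesses based on the description above (g would be
identical except that it calls n, which lacks the attribute):

struct node { struct node *next; };

struct node *m(void) __attribute__((malloc)); /* annotated allocator */
struct node *n(void);                         /* same prototype, no attribute */

int f(struct node **p, struct node *q)
{
  struct node *a = m();
  struct node *b = m();
  a->next = b;         /* under the reasoning above, dead once a is proved not to escape */
  b->next = 0;         /* likewise for b */
  *p = q;              /* intervening store through an unrelated pointer */
  return a->next == b; /* reload of a->next; forwarding the stored b folds this to 1 */
}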

Consequently, the weaker assumption is still fairly strong. Further, the weaker
assumption would be usable by a much broader set of functions, so it may even
provide overall stronger alias information in practice.


[Bug other/56955] documentation for attribute malloc contradicts itself

2014-05-21 Thread sunfish at mozilla dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955

--- Comment #17 from Dan Gohman  ---
(In reply to Richard Biener from comment #16)
> One reason for why realloc is "hard" is that there is no language that says
> it is undefined to access the object via the old pointer, but there is only
> language that says the old and the new pointer values may be equal.

C89 was unclear, but C99 and now C11 7.22.3.5 say that realloc deallocates the
old object, and they make no special mention of the case where the old and new
pointers happen to be equal. The interpretation that the old and new pointers
therefore don't alias, even when they compare equal, has a serious following.
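
A small illustration (not from the bug report; the function name is made up,
and error handling plus the leak on realloc failure are omitted for brevity) of
the code pattern this interpretation affects; under the strict reading, the old
pointer value is indeterminate once realloc succeeds, so a compiler may assume
*p and *q don't alias even when the comparison is true:

#include <stdlib.h>

double use_after_realloc(void)
{
  double *p = malloc(sizeof *p);
  double *q = realloc(p, sizeof *q);
  double result = 0.0;
  if (q && p == q) {
    *q = 1.0;
    /* Strict reading: accessing *p here is undefined even though p compared
       equal to q, so the compiler need not assume this load observes the 1.0. */
    result = *p;
  }
  free(q);
  return result;
}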