https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
One issue with

V load3(const unsigned long* ptr)
{
  V ret = {};
  __builtin_memcpy(&ret, ptr, 3 * sizeof(unsigned long));

is that we cannot load a vector worth of data from ptr because that might
trap, so rewriting that into

  V tem = *ptr;
  ret = VEC_PERM <ret, tem, { 4, 5, 6, 3 }>;

isn't possible.  There is neither a vector<T, 3> type nor an integer type
with 3 * sizeof (T) size, so a lowering to something like

  x = BIT_FIELD_REF <ptr, ...>
  BIT_FIELD_REF <&ret, ...> = x;

isn't possible either.  But we can possibly handle

  __builtin_memcpy (&A, &B, ...);

when we have an underlying decl of appropriate size/type.  This could be
done in update_address_taken, which can be taught to optimistically
analyze this and rewrite the memcpy when A and B can both become registers.
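
For illustration, a minimal sketch of the decl-to-decl shape meant here.
The names V, copy_low3, A and B are made up for this example and are not
taken from the testcase in the PR. Both A and B are local vector
variables whose address is taken only by the memcpy, so the copy is
equivalent to a lane select between two registers:

```c
/* Hypothetical example, not from the PR: a 4-lane vector type using the
   GNU vector extension.  */
typedef unsigned long V __attribute__ ((vector_size (4 * sizeof (unsigned long))));

/* Copy the low three lanes of *src into a zeroed vector.  A and B are
   local decls of full vector size; their addresses are only used by
   the memcpy, so neither really needs to live in memory.  */
void
copy_low3 (V *dst, const V *src)
{
  V B = *src;      /* full-width load is fine, src points to a whole V */
  V A = { 0 };     /* all lanes zero */
  __builtin_memcpy (&A, &B, 3 * sizeof (unsigned long));
  *dst = A;        /* lanes 0..2 come from B, lane 3 stays zero */
}
```

Unlike load3, the source here is a complete V object, so reading all
four lanes cannot trap.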

That would not cover load3 (above problem), or split3_1, concat1_3 or perm
(aggregate and not vector register type).  The latter three would require
adjustments in SRA.  We have for example

  MEM <char[8]> [(struct simd4 *)&ret + 24B] = {};
  __builtin_memcpy (&ret, &x, 24);
  D.3390 = ret;

Here SRA isn't done because ret is addressable due to the memcpy,
so it requires SRA to realize the addresses still don't escape (all
accesses remain direct).  IIRC there's some improvement in SRA in this
respect already?  As said, there's no good general way to lower this
kind of memcpy but SRA can treat a memcpy as a partial aggregate
assignment (when disqualifying partial "field" or padding accesses).
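
As a rough sketch of what "partial aggregate assignment" means at the
source level: struct agg4 below is a hypothetical stand-in, since the
real layout of struct simd4 is not shown in this comment, and both
function names are made up.  The first function uses the memcpy form,
the second is the element-wise form one would hope scalarization could
reach:

```c
/* Hypothetical aggregate, NOT the struct simd4 from the PR.  */
struct agg4 { unsigned long data[4]; };

/* memcpy form: zero the tail, then copy the first three elements.
   Taking &ret and &x here is what makes them addressable.  */
struct agg4
concat_via_memcpy (struct agg4 x)
{
  struct agg4 ret;
  ret.data[3] = 0;
  __builtin_memcpy (&ret, &x, 3 * sizeof (unsigned long));
  return ret;
}

/* Hand-scalarized form: the same copy written as per-element
   assignments, with no address taken.  */
struct agg4
concat_scalarized (struct agg4 x)
{
  struct agg4 ret;
  ret.data[0] = x.data[0];
  ret.data[1] = x.data[1];
  ret.data[2] = x.data[2];
  ret.data[3] = 0;
  return ret;
}
```

The copy covers whole elements only, which matches the restriction above
about disqualifying partial "field" or padding accesses.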

IMO enhancing SRA to better handle memcpy/memmove and memset should be
a priority.  It already sees

Candidate (3316): x
Candidate (3389): ret
Candidate (3390): D.3390
Candidate (3320): ret
Allowed ADDR_EXPR of ret because of __builtin_memcpy (&ret, &x, 24);

Allowed ADDR_EXPR of x because of __builtin_memcpy (&ret, &x, 24);

Will attempt to totally scalarize ret (UID: 3320):
Will attempt to totally scalarize D.3390 (UID: 3390):
! Disqualifying x - No scalar replacements to be created.
Created a replacement for ret offset: 256, size: 64: ret$b$dataD.3392

but I think it simply fails to handle memcpy in the scalarization attempt
and when building accesses.
