https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEW
CC| |jakub at gcc dot gnu.org,
| |rguenth at gcc dot gnu.org,
| |rsandifo at gcc dot gnu.org,
| |uros at gcc dot gnu.org
Assignee|rguenth at gcc dot gnu.org |unassigned at gcc dot gnu.org
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> OK, so the "easier" way to allow aligned sub-vector inserts produces for
>
> typedef unsigned char v16qi __attribute__((vector_size(16)));
> v16qi load (const void *p)
> {
> v16qi r;
> __builtin_memcpy (&r, p, 8);
> return r;
> }
>
> load (const void * p)
> {
> v16qi r;
> long unsigned int _3;
> v16qi _5;
> vector(8) unsigned char _7;
>
> <bb 2> :
> _3 = MEM[(char * {ref-all})p_2(D)];
> _7 = VIEW_CONVERT_EXPR<vector(8) unsigned char>(_3);
> r_9 = BIT_INSERT_EXPR <r_8(D), _7, 0 (64 bits)>;
> _5 = r_9;
> return _5;
>
> and unfortunately (as I feared)
>
> load:
> .LFB0:
> .cfi_startproc
> movq (%rdi), %rax
> pxor %xmm1, %xmm1
> movaps %xmm1, -24(%rsp)
> movq %rax, -24(%rsp)
> movdqa -24(%rsp), %xmm0
> ret
So this is the state we're at now. From here the options are
simplifications or canonicalizations on SSA, or middle-end changes to
BIT_INSERT_EXPR expansion, possibly by extending vec_set in a similar
way to how vec_init was extended. Note vec_set can end up as
(subreg:N
  (vec_select
    (vec_concat:V2I
      (subreg:VI into:N)
      (vec_duplicate:VI (subreg:I to_insert:M))
      (... )))
when a proper (vector) integer mode exists to cover the insertion
and when a proper 2xwide vector mode exists for the concat.
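As a rough source-level illustration of that shape (a sketch using GCC's
generic vector extensions, not the expander code itself): the helper names
insert_low64 and check_insert_low64 and the selector { 2, 1 } are made up for
this example. The 16-byte vector is viewed as two 64-bit lanes, the value to
insert is splatted, and the result is selected from the concatenation of the
two. It assumes a 64-bit unsigned long long.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char v16qi __attribute__((vector_size(16)));
typedef unsigned long long v2di __attribute__((vector_size(16)));

/* Hypothetical helper: replace the low 8 bytes of INTO with the 8 bytes
   at P, written as select (concat (into, splat (x)), { 2, 1 }).  */
v16qi insert_low64 (v16qi into, const void *p)
{
  unsigned long long x;
  memcpy (&x, p, sizeof x);

  v2di wide = (v2di) into;   /* plays the role of (subreg:VI into:N) */
  v2di splat = { x, x };     /* plays the role of (vec_duplicate:VI ...) */
  /* Indices address the concatenation
     { wide[0], wide[1], splat[0], splat[1] }.  */
  v2di sel = { 2, 1 };

  return (v16qi) __builtin_shuffle (wide, splat, sel);
}

/* Hypothetical self-check: the low half comes from SRC, the high half
   is preserved from BASE.  */
void check_insert_low64 (void)
{
  v16qi base = { 100, 101, 102, 103, 104, 105, 106, 107,
                 108, 109, 110, 111, 112, 113, 114, 115 };
  unsigned char src[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  v16qi r = insert_low64 (base, src);

  for (int i = 0; i < 8; i++)
    assert (r[i] == src[i]);
  for (int i = 8; i < 16; i++)
    assert (r[i] == base[i]);
}
```

Whether a given target turns the two-input shuffle into a single insert
instruction is exactly the open question here; the sketch only shows that
the semantics line up.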
You could argue that GIMPLE should also use permutes for inserts (but
then not use a CONSTRUCTOR for the splat). That is, I think both GIMPLE
and RTL could use some streamlining here (for the RTL parts that's
always difficult because you have to adjust many targets). RTL
definitely lacks a vec_perm operation to consolidate vec_select
and vec_merge.
I'm not going to work on that part for the moment.
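To make the vec_select / vec_merge overlap concrete, here is a small
source-level analogy (again a sketch with GCC's generic vector extensions;
the function names insert_by_merge, insert_by_permute and same_bytes are
invented for illustration and do not correspond to anything in GCC). The same
low-half insert is written once as a lane-wise blend, which is the vec_merge
flavour, and once as a two-input permute, which is what a unified vec_perm
would express.

```c
#include <assert.h>

typedef unsigned char v16qi __attribute__((vector_size(16)));

/* Merge flavour: a lane mask picks B for bytes 0..7 and A for 8..15.  */
v16qi insert_by_merge (v16qi a, v16qi b)
{
  v16qi m = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
              0, 0, 0, 0, 0, 0, 0, 0 };
  return (b & m) | (a & ~m);
}

/* Permute flavour: indices 0..15 name bytes of A, 16..31 bytes of B.  */
v16qi insert_by_permute (v16qi a, v16qi b)
{
  v16qi sel = { 16, 17, 18, 19, 20, 21, 22, 23,
                8, 9, 10, 11, 12, 13, 14, 15 };
  return __builtin_shuffle (a, b, sel);
}

/* Returns 1 if X and Y agree in every byte, 0 otherwise.  */
int same_bytes (v16qi x, v16qi y)
{
  for (int i = 0; i < 16; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

For any a and b, same_bytes (insert_by_merge (a, b), insert_by_permute (a, b))
is expected to hold, since both forms take bytes 0..7 from b and bytes 8..15
from a.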