https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438
--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #9)
> I've looked into another case where inability to handle stores with gaps
> generates sub-optimal code.  I'm interested in spending some time on fixing
> this, provided some guidance in the vectorizer.
>
> Is it substantially more difficult to handle stores with gaps compared to
> loads with gaps?
>
> The following is [minimally] reduced from 462.libquantum:quantum_sigma_x(),
> which is #2 function in the 462.libquantum profile.  This cycle accounts for
> about 25% of total 462.libquantum time.
>
> ===
> struct node_struct
> {
>   float _Complex gap;
>   unsigned long long state;
> };
>
> struct reg_struct
> {
>   int size;
>   struct node_struct *node;
> };
>
> void
> func(int target, struct reg_struct *reg)
> {
>   int i;
>
>   for(i=0; i<reg->size; i++)
>     reg->node[i].state ^= ((unsigned long long) 1 << target);
> }
> ===
>
> This loop vectorizes into
>
>   <bb 5>:
>   # vectp.8_39 = PHI <vectp.8_40(6), vectp.9_38(4)>
>   vect_array.10 = LOAD_LANES (MEM[(long long unsigned int *)vectp.8_39]);
>   vect__5.11_41 = vect_array.10[0];
>   vect__5.12_42 = vect_array.10[1];
>   vect__7.13_44 = vect__5.11_41 ^ vect_cst__43;
>   _48 = BIT_FIELD_REF <vect__7.13_44, 64, 0>;
>   MEM[(long long unsigned int *)ivtmp_45] = _48;
>   ivtmp_50 = ivtmp_45 + 16;
>   _51 = BIT_FIELD_REF <vect__7.13_44, 64, 64>;
>   MEM[(long long unsigned int *)ivtmp_50] = _51;
>
> which then becomes for aarch64:
>
> .L4:
>         ld2     {v0.2d - v1.2d}, [x1]
>         add     w2, w2, 1
>         cmp     w2, w7
>         eor     v0.16b, v2.16b, v0.16b
>         umov    x4, v0.d[1]
>         st1     {v0.d}[0], [x1]
>         add     x1, x1, 32
>         str     x4, [x1, -16]
>         bcc     .L4

What I did for thunderx was create a vector cost model which caused this loop
not to be vectorized, to keep the regression from happening.  Note that this
might actually be better code for some microarchitectures.  I need to check
with the new processor we have in house, but that is next week or so.  I don't
know how much I can share next week though.