https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91446
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Index: gcc/config/i386/x86-tune-costs.h
===================================================================
--- gcc/config/i386/x86-tune-costs.h (revision 274422)
+++ gcc/config/i386/x86-tune-costs.h (working copy)
@@ -1442,7 +1442,7 @@ struct processor_costs skylake_cost = {
{4, 4, 4}, /* cost of loading integer registers
in QImode, HImode and SImode.
Relative to reg-reg move (2). */
- {6, 6, 3}, /* cost of storing integer registers */
+ {6, 6, 6}, /* cost of storing integer registers */
2, /* cost of reg,reg fld/fst */
{6, 6, 8}, /* cost of loading fp registers
in SFmode, DFmode and XFmode */
With this patch applied, GCC produces
foo:
.LFB0:
.cfi_startproc
vmovq %rdi, %xmm1
subq $40, %rsp
.cfi_def_cfa_offset 48
vpinsrq $1, %rsi, %xmm1, %xmm0
vmovq %rdx, %xmm2
vmovaps %xmm0, (%rsp)
movq %rsp, %rdi
vpinsrq $1, %rcx, %xmm2, %xmm0
vmovaps %xmm0, 16(%rsp)
call bar
addq $40, %rsp
.cfi_def_cfa_offset 8
ret
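For reference, a testcase of roughly the following shape matches the code above. This is a hypothetical reconstruction, not the original t.i. The field and parameter names and their order come from the vectorizer dump, and the 64-bit field width comes from the vmovq/vpinsrq sequence on %rdi, %rsi, %rdx and %rcx. Which pair of fields is signed is a guess. In the real testcase bar is external. Here a stand-in that records its argument is defined so the sketch links and runs.

```c
#include <assert.h>

/* Hypothetical layout: the names follow the dump, but the signedness split
   (unsigned width/height, signed x/y) is an assumption.  */
typedef struct
{
  unsigned long long width, height;
  long long x, y;
} info;

/* Stand-in for the external bar (): it only records what it was passed.  */
static info last_seen;

void
bar (info *p)
{
  last_seen = *p;
}

void
foo (unsigned long long width, unsigned long long height,
     long long x, long long y)
{
  info t;
  t.width = width;    /* interleaving store group 1: width/height */
  t.height = height;
  t.x = x;            /* interleaving store group 2: x/y */
  t.y = y;
  bar (&t);
}
```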
It may appear odd that we don't use a single 256-bit AVX store; this is because of the following in the vectorizer dump:
t.i:17:3: note: === vect_analyze_data_ref_accesses ===
t.i:17:3: note: Detected interleaving store t.width and t.height
t.i:17:3: note: Detected interleaving store t.x and t.y
t.i:17:3: note: Detected interleaving store of size 2
t.i:17:3: note: t.width = width_2(D);
t.i:17:3: note: t.height = height_4(D);
t.i:17:3: note: Detected interleaving store of size 2
t.i:17:3: note: t.x = x_6(D);
t.i:17:3: note: t.y = y_8(D);
and thus we are "confused" by the different signedness of the fields, which
ultimately leads to different vector types. That would make a difference
if sign-dependent arithmetic were performed, but not in this particular
case, where the values are only stored. On GIMPLE we'd also need
nop-conversions to make the IL checker happy.
On these grounds the bug would be valid, but it is not about costs (you may
want to open a separate bug for this issue).
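The following sketch illustrates why the sign difference is harmless for plain stores. Converting a 64-bit signed value to the unsigned type of the same width is a nop-conversion: the bit pattern is unchanged on two's-complement targets. As a result, all four fields could in principle be written as one 4 x 64-bit (256-bit) vector. The merge is done by hand with GCC's `vector_size` extension. The struct layout is the same hypothetical guess at the testcase, and `fill_fieldwise`/`fill_merged` are illustrative names. They are not GCC internals and do not reflect what the vectorizer actually emits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical testcase layout (signedness split is an assumption).  */
typedef struct
{
  unsigned long long width, height;
  long long x, y;
} info;

/* GCC/Clang vector extension: four unsigned 64-bit lanes, 256 bits total.  */
typedef unsigned long long v4du __attribute__ ((vector_size (32)));

_Static_assert (sizeof (v4du) == sizeof (info), "layout must match");

/* Two store groups of different signedness, as in the dump.  */
void
fill_fieldwise (info *t, unsigned long long width, unsigned long long height,
                long long x, long long y)
{
  t->width = width;
  t->height = height;
  t->x = x;
  t->y = y;
}

/* One group: x and y are nop-converted to unsigned, then all four
   lanes are written with a single 32-byte copy.  */
void
fill_merged (info *t, unsigned long long width, unsigned long long height,
             long long x, long long y)
{
  v4du v = { width, height, (unsigned long long) x, (unsigned long long) y };
  memcpy (t, &v, sizeof v);
}
```

Both functions should leave byte-identical contents in the struct. This is the sense in which the differing vector types only matter once sign-dependent arithmetic is involved.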