https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119879
--- Comment #1 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The problem is in:

      /* VEC_PACK_TRUNC_EXPR: If inner size is greater than outer size we
         will end up doing two conversions and packing them.  */
      if (!scalar_p && inner_size > outer_size)
        {
          int n = inner_size / outer_size;
          stmt_cost = stmt_cost * n
                      + (n - 1) * ix86_vec_cost (mode, ix86_cost->sse_op);
        }

While this is true for code produced by loop vectorizer (which for
double->float produces VEC_PACK_TRUNC_EXPR having two float inputs), it is not
true when SLP vectorizer is trying to cost conversions of 2 floats to 2
doubles.  I will check how to distinguish these.
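
To make the arithmetic of the quoted adjustment concrete, here is a minimal standalone sketch. It is not GCC code: the function name pack_trunc_cost and the pack_cost parameter (standing in for the result of ix86_vec_cost (mode, ix86_cost->sse_op)) are hypothetical, sizes are assumed to be element sizes in bytes, and the cost numbers used with it are made up for illustration.

```c
#include <assert.h>

/* Hypothetical model of the quoted adjustment, not the GCC function.
   stmt_cost:  base cost of one vector conversion (made-up unit).
   inner_size: element size of the source type, in bytes (assumed).
   outer_size: element size of the destination type, in bytes (assumed).
   pack_cost:  stand-in for ix86_vec_cost (mode, ix86_cost->sse_op).  */
int
pack_trunc_cost (int stmt_cost, int inner_size, int outer_size, int pack_cost)
{
  assert (inner_size > 0 && outer_size > 0);

  if (inner_size > outer_size)
    {
      /* n conversions plus (n - 1) pack operations to merge the results.  */
      int n = inner_size / outer_size;
      return stmt_cost * n + (n - 1) * pack_cost;
    }

  /* No narrowing: the cost is left unchanged.  */
  return stmt_cost;
}
```

With the illustrative values stmt_cost = 4 and pack_cost = 2, a double->float narrowing (8 -> 4 bytes) gives n = 2 and a cost of 4 * 2 + 1 * 2 = 10, while a same-size or widening conversion stays at 4. The sketch only shows how the formula scales with n; whether a given statement really has the two-conversions-plus-pack shape is exactly the distinction the comment above says still needs to be made for the SLP case.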