https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
Bill Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wschmidt at gcc dot gnu.org

--- Comment #20 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
We still don't vectorize the original code example on Power. It appears that vectorization is being disabled because of an alignment issue. The data references are being rejected by:

  product.f:9:0: note: can't force alignment of ref: REALPART_EXPR <*a.0_24[_50]>

and similarly for the other three DRs. This happens because of this code in vect_compute_data_ref_alignment:

  if (base_alignment < TYPE_ALIGN (vectype))
    {
      /* Strip an inner MEM_REF to a bare decl if possible.  */
      if (TREE_CODE (base) == MEM_REF
          && integer_zerop (TREE_OPERAND (base, 1))
          && TREE_CODE (TREE_OPERAND (base, 0)) == ADDR_EXPR)
        base = TREE_OPERAND (TREE_OPERAND (base, 0), 0);

      if (!vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype)))
        {
          if (dump_enabled_p ())
            {
              dump_printf_loc (MSG_NOTE, vect_location,
                               "can't force alignment of ref: ");
              dump_generic_expr (MSG_NOTE, TDF_SLIM, ref);
              dump_printf (MSG_NOTE, "\n");
            }
          return true;
        }

Here TYPE_ALIGN (vectype) is 128 bits (Power vectors are normally aligned on a 16-byte boundary), while base_alignment is only 64 bits. a.0 is declared as:

  complex(kind=8) [0:D.1831] * restrict a.0;

In both the ELFv1 and ELFv2 ABIs for Power, a complex type has the same alignment as its underlying type, so "complex double" has 8-byte alignment. On processors before POWER8 this decision is fine, because unaligned accesses are expensive there. With POWER8, though, an unaligned access will (most of the time) perform as well as an aligned access, so ideally we would like to teach the vectorizer to allow vectorization here.

It seems that vect_supportable_dr_alignment ought to be consulted as part of the SLP vectorization decision here, rather than just comparing the base alignment with the vector type's alignment.
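To make the two decision rules concrete, here is a toy model in C of the check as this comment describes it, next to the proposed alternative. It is a sketch, not GCC code: every name in it (current_rule_accepts, proposed_rule_accepts, element_misalignment, target_supports_misaligned) is hypothetical, and it abstracts away what GCC actually records for the data reference after the check returns. The target_supports_misaligned flag merely stands in for the kind of answer the comment suggests vect_supportable_dr_alignment could provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in GCC.
   Alignments are in bits (as TYPE_ALIGN reports them);
   addresses and sizes are in bytes.  */

static bool is_pow2(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

/* Byte misalignment of element INDEX of an array that starts at
   BASE_ADDR, relative to a vector alignment of VEC_ALIGN_BITS.  */
static size_t element_misalignment(uintptr_t base_addr, size_t index,
                                   size_t elem_size, unsigned vec_align_bits)
{
  assert(is_pow2(vec_align_bits) && vec_align_bits >= 8);
  size_t vec_align_bytes = vec_align_bits / 8;
  return (size_t)((base_addr + index * elem_size) % vec_align_bytes);
}

/* Rule as described in the comment: an under-aligned base whose
   alignment cannot be forced upward is not vectorized.  */
static bool current_rule_accepts(unsigned base_align_bits,
                                 unsigned vec_align_bits, bool can_force)
{
  assert(is_pow2(base_align_bits) && is_pow2(vec_align_bits));
  return base_align_bits >= vec_align_bits || can_force;
}

/* Proposed rule (hypothetical): in the same situation, defer to
   whether the target handles misaligned vector accesses well.  */
static bool proposed_rule_accepts(unsigned base_align_bits,
                                  unsigned vec_align_bits, bool can_force,
                                  bool target_supports_misaligned)
{
  if (current_rule_accepts(base_align_bits, vec_align_bits, can_force))
    return true;
  return target_supports_misaligned;
}
```

For the case in this comment the inputs are base_align_bits = 64, vec_align_bits = 128 and can_force = false (the dump says the alignment can't be forced), so only the proposed rule, on a target that reports misaligned accesses as supported, accepts it. Because a complex(kind=8) element is 16 bytes, the same size as the vector alignment, every element of a.0 shares the base pointer's misalignment, which with an 8-byte-aligned base is either 0 or 8 bytes but is not known at compile time.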
Adding a check for vect_supportable_dr_alignment gets things a little further, but we still don't vectorize the block. (I haven't yet looked into why, but I assume more needs to be done downstream to handle this case.) My understanding of the vectorizer is not yet very deep, so before going too far down the wrong path, I'd like your opinion on the best approach to fixing this problem.

Thanks!
Bill