https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119373
            Bug ID: 119373
           Summary: RISC-V: missed unrolling opportunity
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: parras at gcc dot gnu.org
  Target Milestone: ---

Created attachment 60820
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60820&action=edit
Source reduced from 510.parest_r

The analysis of SPEC2017's 510.parest_r shows that the topmost basic block is a
tight loop (see attached reducer). Once vectorised, by unrolling and sharing 4
instructions, AArch64 achieves a 22% reduction in dynamic instruction count
(DIC) within the block. RISC-V also vectorises the loop, but misses the
opportunity to unroll it further.

The vectoriser dump for RISC-V shows that the analysis fails for the natural
mode RVVM1DF (and chooses RVVMF8QI instead) because it requires a "conversion
not supported by target". It turns out this is caused by two missing standard
named patterns: vec_unpacku_hi and vec_unpacku_lo.

Defining those two patterns allows RVVM1DF to be picked by the vectoriser.
However, it then produces worse code because partial vectors cannot be used due
to the presence of both masks and lengths in loop_vinfo (see
tree-vect-loop.cc:3015-3028):

  /* For now, we don't expect to mix both masking and length approaches for one
     loop, disable it if both are recorded.  */
  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
      && !LOOP_VINFO_MASKS (loop_vinfo).is_empty ()
      && !LOOP_VINFO_LENS (loop_vinfo).is_empty ())
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "can't vectorize a loop with partial vectors"
                         " because we don't expect to mix different"
                         " approaches with partial vectors for the"
                         " same loop.\n");
      LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
    }

My feeling is that we should be able to pick one over the other rather than
giving up entirely on partial vectors.
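
For readers without access to the attachment, the sketch below shows the kind of loop that could plausibly trigger the "conversion not supported by target" failure. It is NOT the attached reducer: the function name sparse_row_dot, its parameters, and the assumption that the hot loop resembles a sparse row dot product with 32-bit unsigned column indices are all hypothetical. The point it illustrates is that, with 64-bit double elements as the natural vector mode, the unsigned indices must be zero-extended to 64 bits before they can be used as gather offsets, which is the sort of unsigned widening that vec_unpacku_hi and vec_unpacku_lo describe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the attached reducer (not the attachment itself):
   a sparse row dot product.  The 32-bit unsigned column indices in col[] are
   narrower than the 64-bit double elements, so a vectorised version has to
   zero-extend them before using them to address x[].  */
double
sparse_row_dot (const double *val, const unsigned int *col,
                const double *x, size_t n)
{
  /* Precondition: the arrays must be valid whenever there is work to do.  */
  assert (n == 0 || (val && col && x));

  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += val[i] * x[col[i]];   /* indexed (gather-style) load from x[] */
  return sum;
}
```

Assuming a RISC-V toolchain with vector support, compiling a loop like this with something along the lines of -O3 -march=rv64gcv -fdump-tree-vect-details should show which vector modes the vectoriser tried and why each was rejected; the exact dump text may differ from what is quoted above.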