https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119224
--- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> So this why you weren't seeing it but I'm confused about the rationale...
> I unpack above to following statements
>
> 1. -mno-vector-strict-align allows us to unroll - seems ok.
> 2. Otherwise (-mvector-strict-align ?) leads to unaligned access ???
> 3. Or Is this the vector element vs. whole vector thing

At first I thought I wasn't confused but now I am as well ;)

Unrolling the inner loop is not an issue, but the outer one is because of its
unknown stride. However, we unroll just fine if I set the strides to
something weird like 17 and 23. Our "preferred" vector alignment is element
alignment anyway, so for a char array there is nothing to do.

I think the idea is that with an unknown stride we have an unknown
misalignment of our data refs. Sometimes misalignment can be handled by
peeling off some iterations, but we don't do that for variable strides. But
here every alignment is 1 anyway and it can't get any worse than that.
Therefore we _should_ be able to unroll the loop even without the movmisalign
pattern (=-mno-vector-strict-align).

I believe that indicates a mistake in riscv_support_vector_misalignment and
we could just always return true there for byte vectors. Then we would
unroll here. That might be a rush job, though; I need to think about it some
more.

But always unrolling would only exacerbate the scheduling/spilling problem, so
we should somehow get this under control first. With zvl128b in particular
the situation is just unpleasant right now.
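
For readers without the test case at hand, here is a minimal sketch of the loop shape being discussed. It is not the reproducer from this PR: the function name sad_block, the 16x16 block size, and the absolute-difference body are assumptions chosen only for illustration. What matters is that the inner loop walks a byte array with a known trip count, while the outer loop advances each pointer by a stride that is unknown at compile time.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical kernel (name, block size and body are illustrative
   assumptions, not taken from the PR).  The inner loop has a constant
   trip count over byte elements; the outer loop steps by run-time
   strides, so the row start addresses have no known alignment beyond
   the 1-byte element alignment.  */
int
sad_block (const uint8_t *a, int stride_a, const uint8_t *b, int stride_b)
{
  int sum = 0;

  for (int y = 0; y < 16; y++)
    {
      /* Inner loop: fixed length, byte elements.  */
      for (int x = 0; x < 16; x++)
        sum += abs (a[x] - b[x]);

      /* Outer loop step: variable stride.  */
      a += stride_a;
      b += stride_b;
    }

  return sum;
}
```

Replacing stride_a and stride_b with constants such as 17 and 23 gives the known-stride variant mentioned above. Since the elements are single bytes, either variant can only ever be 1-byte aligned, which is the point of the argument.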