https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801
--- Comment #33 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That is still a hack, but I guess it can be acceptable for 14.2 and short-term trunk if the ARM maintainers approve it.

But for GCC 15+, I think that if an instruction using the predicate constant/register actually performs per-byte predication regardless of the element mode, then the predicate should be represented as V16BImode, not V8BImode or V4BImode. It is fine if instructions which produce the predicate mask, like comparisons, produce V8BImode or V4BImode, but instructions which consume it should use a subreg of that value to V16BImode. That holds at least if the behavior is one of two things. Either the operation is performed on all elements, and then the 16 bits in the predicate choose, byte by byte, between the newly computed result and something else. Or perhaps the operation is performed only on elements where at least one predicate bit for the element is non-zero, and the results are then merged.

I think it would be useful if you pointed at the documentation describing exactly how the instructions work, or tried to explain it here.
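To make the two candidate behaviors concrete, here is a minimal software sketch. It assumes a 128-bit vector and a 16-bit predicate with one bit per byte, as described in the comment. It is not taken from the Arm documentation and is not GCC code. All function names (`lane_mask_to_byte_pred`, `byte_select`, `element_any_select`, `cmp_gt_i32`) are hypothetical. The sketch shows why a per-byte consumer wants a V16BImode view. A comparison on 32-bit lanes naturally yields a 4-bit (V4BImode-like) mask, but once it is widened to 16 bits the consumer can also see masks such as `0x0003` that no 4-lane mask can express.

```python
# Hypothetical model: 128-bit vector as 16 bytes, predicate as a 16-bit int
# with one bit per byte. Behaviors are assumptions, not the Arm spec.

VEC_BYTES = 16


def lane_mask_to_byte_pred(lane_mask, elem_bytes):
    """Widen a per-element mask (V4BI/V8BI-style) to a per-byte 16-bit predicate."""
    pred = 0
    for lane in range(VEC_BYTES // elem_bytes):
        if (lane_mask >> lane) & 1:
            # Every byte of an active element gets its own predicate bit.
            pred |= ((1 << elem_bytes) - 1) << (lane * elem_bytes)
    return pred


def cmp_gt_i32(a, b):
    """Producer: compare four 32-bit lanes, return the widened 16-bit predicate."""
    lane_mask = sum(1 << i for i in range(4) if a[i] > b[i])
    return lane_mask_to_byte_pred(lane_mask, elem_bytes=4)


def byte_select(pred, new, old):
    """First reading: compute all lanes, then merge byte by byte on predicate bits."""
    assert len(new) == len(old) == VEC_BYTES
    return bytes(new[i] if (pred >> i) & 1 else old[i] for i in range(VEC_BYTES))


def element_any_select(pred, new, old, elem_bytes):
    """Second reading (assumed): an element is active if any of its predicate
    bits is set, and active elements are merged whole."""
    out = bytearray(old)
    for start in range(0, VEC_BYTES, elem_bytes):
        if (pred >> start) & ((1 << elem_bytes) - 1):
            out[start:start + elem_bytes] = new[start:start + elem_bytes]
    return bytes(out)


if __name__ == "__main__":
    new = bytes(range(16))
    old = bytes([0xFF]) * 16
    pred = cmp_gt_i32([5, 0, 7, 0], [1, 1, 1, 1])  # lanes 0 and 2 -> 0x0F0F
    print(hex(pred), byte_select(pred, new, old).hex())
    # A partial mask only a per-byte view can express; the two readings differ.
    print(byte_select(0x0003, new, old)[:4].hex())              # 0001ffff
    print(element_any_select(0x0003, new, old, 4)[:4].hex())    # 00010203
```

For whole-lane masks produced by a comparison, both readings agree. They diverge only for predicates whose bits do not cover an element uniformly, which is exactly the case the V16BImode representation would make visible to the compiler.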