On Tue, Jan 9, 2018 at 10:59 PM, Richard Sandiford wrote:
After cunrolling the inner loop, the remaining loop in the testcase
has a single 32-bit access and a group of 64-bit accesses. We first
try to vectorise at 128 bits (VF 4), but decide not to for cost reasons.
We then try with 64 bits (VF 2) instead. This means that the group
of 64-bit accesses us