On 27/11/15 14:13, Richard Biener wrote:
The following fixes the excessive peeling for gaps we do when doing
SLP now that I removed most of the restrictions on having gaps in
the first place.
This should make low-trip vectorized loops more efficient (something
the combine-epilogue-with-vectorized-body-by-masking patches also
claim to do).
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
Richard.
2015-11-27  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/68559
	* tree-vect-data-refs.c (vect_analyze_group_access_1): Move
	peeling for gap checks ...
	* tree-vect-stmts.c (vectorizable_load): ... here and relax
	for SLP.
	* tree-vect-loop.c (vect_analyze_loop_2): Re-set
	LOOP_VINFO_PEELING_FOR_GAPS before re-trying without SLP.

	* gcc.dg/vect/slp-perm-4.c: Adjust again.
	* gcc.dg/vect/pr45752.c: Likewise.
Since this commit, we see the following failures:
FAIL: gcc.dg/vect/pr45752.c -flto -ffat-lto-objects scan-tree-dump-times vect "gaps requires scalar epilogue loop" 0
FAIL: gcc.dg/vect/pr45752.c scan-tree-dump-times vect "gaps requires scalar epilogue loop" 0
on aarch64 platforms (aarch64-none-linux-gnu, aarch64-none-elf,
aarch64_be-none-elf).
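
For context on the message the test scans for, here is a minimal sketch of the kind of loop where peeling for gaps comes into play. It is not taken from pr45752.c; the function name sum_with_gap and the group size of 4 are made up for illustration. Each iteration reads only the first two elements of a group of four, so the group has a gap at its end. If the vectorizer implements this with full-width vector loads, the last vector iteration could read past the final element the scalar loop touches, which is presumably why a scalar epilogue is requested in such cases.

```c
#include <assert.h>

/* Hypothetical kernel, for illustration only: reads elements 0 and 1 of
   every group of 4 ints and skips elements 2 and 3 (the "gap").
   The last element read is in[4 * (n - 1) + 1], so a caller may pass a
   buffer that ends right there, without the trailing gap elements.  */
int
sum_with_gap (const int *in, int n)
{
  int sum = 0;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    sum += in[4 * i] + in[4 * i + 1];
  return sum;
}
```

Calling this with n = 8 on a buffer of exactly 4 * 7 + 2 = 30 ints is valid for the scalar loop, while a vector load covering a whole group of 4 in the last iteration would touch two ints beyond the buffer.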
Thanks, Alan