https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, and if you don't disable inlining then the sizes go down to 148
(SSE and SLP) and to 91 and 75 (SSE and no SLP). So two instances of
vectorization will remain regardless of the parameter value
(for size 75 I don't apply PARAM_MAX_UNROLLED_INSNS because that is not even
one full unroll copy when looking at the scalar body size of 43).
With the default param we inhibit use of SLP in
capacita2.f90:226:0: note: estimated vector body size is 19, scalar body size 2
capacita2.f90:226:0: note: not vectorized: loop grows too much.
capacita2.f90:226:0: note: estimated vector body size is 11, scalar body size 2
capacita2.f90:226:0: note: loop vectorized
capacita2.f90:259:0: note: estimated vector body size is 19, scalar body size 2
capacita2.f90:259:0: note: not vectorized: loop grows too much.
capacita2.f90:259:0: note: estimated vector body size is 11, scalar body size 2
capacita2.f90:259:0: note: loop vectorized
in addition to the critical loop (copies) at 551:
capacita2.f90:551:0: note: estimated vector body size is 298, scalar body size 43
capacita2.f90:551:0: note: not vectorized: loop grows too much.
capacita2.f90:551:0: note: estimated vector body size is 259, scalar body size 43
capacita2.f90:551:0: note: not vectorized: loop grows too much.
capacita2.f90:551:0: note: estimated vector body size is 147, scalar body size 43
capacita2.f90:551:0: note: loop vectorized
capacita2.f90:551:0: note: estimated vector body size is 259, scalar body size 43
capacita2.f90:551:0: note: not vectorized: loop grows too much.
capacita2.f90:551:0: note: estimated vector body size is 147, scalar body size 43
capacita2.f90:551:0: note: loop vectorized
capacita2.f90:551:0: note: estimated vector body size is 258, scalar body size 43
capacita2.f90:551:0: note: not vectorized: loop grows too much.
capacita2.f90:551:0: note: estimated vector body size is 259, scalar body size 43
capacita2.f90:551:0: note: not vectorized: loop grows too much.
capacita2.f90:551:0: note: estimated vector body size is 168, scalar body size 43
capacita2.f90:551:0: note: loop vectorized
I do think that applying this sort of heuristic makes sense, even if it doesn't
help the polyhedron case.
Numbers for different values of the parameter are:
300 (w/o patch)  20.91user 0.05system 0:20.97elapsed
200 (default)    20.98user 0.08system 0:21.07elapsed 99%CPU
147              19.62user 0.06system 0:19.70elapsed 99%CPU
146              17.27user 0.08system 0:17.36elapsed 99%CPU
140              17.19user 0.06system 0:17.26elapsed 99%CPU
91               17.41user 0.05system 0:17.48elapsed 99%CPU
90               16.98user 0.05system 0:17.04elapsed 99%CPU
75               17.01user 0.04system 0:17.06elapsed 99%CPU
74               16.93user 0.06system 0:16.99elapsed 99%CPU
1                17.02user 0.06system 0:17.08elapsed 100%CPU
the sweet spot for this benchmark seems to be 146...
For reference, with -fno-tree-vectorize I get
18.36user 0.09system 0:18.45elapsed 99%CPU
and with --param vect-max-version-for-alias-checks=0
16.92user 0.06system 0:16.99elapsed 99%CPU
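
To make the parameter sweep easier to read, here is a small Python sketch that takes the user times from the table of parameter values and reports each run relative to the default (200), plus the adjacent pair of parameter values with the largest drop. The function names (`change_vs_default`, `largest_drop`) are made up for illustration and the numbers are just copied from the measurements in this comment, nothing was re-measured.

```python
# User times in seconds, copied from the parameter sweep in this comment.
# Keys are the parameter values that were tried.
user_time = {
    300: 20.91,  # without the patch
    200: 20.98,  # default
    147: 19.62,
    146: 17.27,
    140: 17.19,
    91: 17.41,
    90: 16.98,
    75: 17.01,
    74: 16.93,
    1: 17.02,
}

baseline = user_time[200]


def change_vs_default(seconds):
    """Relative change in percent against the default-parameter run."""
    return 100.0 * (seconds - baseline) / baseline


def largest_drop(times):
    """Return the adjacent (higher, lower) parameter pair with the biggest
    decrease in user time when going from the higher to the lower value."""
    params = sorted(times, reverse=True)
    pairs = zip(params, params[1:])
    return max(pairs, key=lambda p: times[p[0]] - times[p[1]])


if __name__ == "__main__":
    for param in sorted(user_time, reverse=True):
        secs = user_time[param]
        print(f"{param:>4}  {secs:6.2f}s  {change_vs_default(secs):+6.1f}%")
    print("largest drop between", largest_drop(user_time))
```

The largest step falls between 147 and 146, which lines up with the 147-insn vector body of the loop at line 551 in the dump notes; below that the differences are small enough that they may well be run-to-run noise.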