On Fri, 3 Feb 2023 at 12:46, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
>
> Hi Richard,
> While digging thru aarch64_expand_vector_init, I noticed it gives
> priority to loading a constant first:
>  /* Initialise a vector which is part-variable.  We want to first try
>     to build those lanes which are constant in the most efficient way we
>     can.  */
>
> which results in suboptimal code-gen for following case:
> int16x8_t f_s16(int16_t x)
> {
>   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> }
>
> code-gen trunk:
> f_s16:
>         movi    v0.8h, 0x1
>         ins     v0.h[0], w0
>         ins     v0.h[1], w0
>         ins     v0.h[2], w0
>         ins     v0.h[3], w0
>         ins     v0.h[4], w0
>         ins     v0.h[5], w0
>         ins     v0.h[6], w0
>         ret
>
> The attached patch tweaks the following condition:
> if (n_var == n_elts && n_elts <= 16)
>   {
>     ...
>   }
>
> to pass if maxv >= 80% of n_elts, with 80% being an
> arbitrary "high enough" threshold. The intent is to dup
> the most repeating variable if its repetition
> is "high enough" and insert constants which should be "better" than
> loading constant first and inserting variables like in the above case.
>
> Alternatively, I suppose we can remove threshold and for constants,
> generate both sequences and check which one is more
> efficient ?
>
> code-gen with patch:
> f_s16:
>         dup     v0.8h, w0
>         movi    v1.4h, 0x1
>         ins     v0.h[7], v1.h[0]
>         ret
>
> The patch is lightly tested to verify that vec[t]-init-*.c tests pass
> with bootstrap+test
> in progress.
> Does this look OK ?

Hi Richard,
ping https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
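
For illustration only, the "maxv >= 80% of n_elts" test from the quoted message could be written in integer arithmetic roughly as below. This is a standalone sketch, not the code in the attached patch: the helper name prefer_dup_of_variable is hypothetical, and how the patch actually combines this test with the existing n_var == n_elts condition is not shown in the message.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC.  MAXV is the number of lanes
   occupied by the most frequently repeated variable element and N_ELTS
   is the total number of lanes in the vector.  */
static bool
prefer_dup_of_variable (unsigned maxv, unsigned n_elts)
{
  /* Integer form of maxv / n_elts >= 0.8, which avoids floating point.
     An empty vector never qualifies.  */
  return n_elts != 0 && maxv * 5 >= n_elts * 4;
}
```

With the f_s16 example, maxv is 7 and n_elts is 8, so 35 >= 32 holds and the dup-first sequence would be chosen; with six copies of x and two constants, 30 >= 32 fails and the existing constant-first path would be kept.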