On Mon, 13 Feb 2023 at 11:58, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
>
> On Fri, 3 Feb 2023 at 12:46, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
> >
> > Hi Richard,
> > While digging thru aarch64_expand_vector_init, I noticed it gives
> > priority to loading a constant first:
> >   /* Initialise a vector which is part-variable.  We want to first try
> >      to build those lanes which are constant in the most efficient way we
> >      can.  */
> >
> > which results in suboptimal code-gen for the following case:
> > int16x8_t f_s16(int16_t x)
> > {
> >   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> > }
> >
> > code-gen trunk:
> > f_s16:
> >         movi    v0.8h, 0x1
> >         ins     v0.h[0], w0
> >         ins     v0.h[1], w0
> >         ins     v0.h[2], w0
> >         ins     v0.h[3], w0
> >         ins     v0.h[4], w0
> >         ins     v0.h[5], w0
> >         ins     v0.h[6], w0
> >         ret
> >
> > The attached patch tweaks the following condition:
> > if (n_var == n_elts && n_elts <= 16)
> >   {
> >     ...
> >   }
> >
> > to pass if maxv >= 80% of n_elts, with 80% being an
> > arbitrary "high enough" threshold. The intent is to dup
> > the most repeated variable if its repetition count
> > is "high enough" and then insert the constants, which should be "better"
> > than loading the constant first and inserting variables as in the above
> > case.
> >
> > Alternatively, I suppose we can remove the threshold and, for constants,
> > generate both sequences and check which one is more
> > efficient ?
> >
> > code-gen with patch:
> > f_s16:
> >         dup     v0.8h, w0
> >         movi    v1.4h, 0x1
> >         ins     v0.h[7], v1.h[0]
> >         ret
> >
> > The patch is lightly tested to verify that vec[t]-init-*.c tests pass,
> > with bootstrap+test in progress.
> > Does this look OK ?
> Hi Richard,
> ping https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html
Hi Richard,
ping * 2: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html
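
For anyone skimming the thread, the threshold test described in the quoted
mail can be sketched roughly as below. This is only an illustration under
stated assumptions, not the attached patch: struct lane, most_repeated_var
and prefer_dup_first are hypothetical names that do not exist in GCC, and
writing the 80% check in integer arithmetic is just one possible form.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one vector lane (not a GCC type): either a
   constant or a variable identified by a small integer id.  */
struct lane
{
  bool is_const;
  int id; /* Variable id, or the constant's value when is_const.  */
};

/* Return how often the most frequent variable occurs (maxv in the mail)
   and store its id in *best_id.  Returns 0 if every lane is constant.  */
static size_t
most_repeated_var (const struct lane *lanes, size_t n_elts, int *best_id)
{
  size_t maxv = 0;
  for (size_t i = 0; i < n_elts; i++)
    {
      if (lanes[i].is_const)
        continue;
      size_t count = 0;
      for (size_t j = 0; j < n_elts; j++)
        if (!lanes[j].is_const && lanes[j].id == lanes[i].id)
          count++;
      if (count > maxv)
        {
          maxv = count;
          *best_id = lanes[i].id;
        }
    }
  return maxv;
}

/* Assumed form of the threshold: true when maxv >= 80% of n_elts,
   computed without floating point (maxv / n_elts >= 4 / 5).  */
static bool
prefer_dup_first (size_t maxv, size_t n_elts)
{
  return maxv * 5 >= n_elts * 4;
}
```

For the f_s16 case, n_elts is 8 and x fills 7 lanes, so the check passes
(35 >= 32) and the dup-then-insert sequence would be chosen; with only 6
repeated lanes it would not (30 < 32).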

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh