On Wed, 24 May 2023 at 15:40, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> > On Mon, 22 May 2023 at 14:18, Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> >> > Hi Richard,
> >> > Thanks for the suggestions.  Does the attached patch look OK?
> >> > Bootstrap+test in progress on aarch64-linux-gnu.
> >>
> >> Like I say, please wait for the tests to complete before sending an RFA.
> >> It saves a review cycle if the tests don't in fact pass.
> > Right, sorry, I will post patches only after testing completes from now on.
> >>
> >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> >> > index 29dbacfa917..e611a7cca25 100644
> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> > @@ -22332,6 +22332,43 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p)
> >> >    return gen_rtx_PARALLEL (new_mode, vec);
> >> >  }
> >> >
> >> > +/* Return true if INSN is a scalar move.  */
> >> > +
> >> > +static bool
> >> > +scalar_move_insn_p (const rtx_insn *insn)
> >> > +{
> >> > +  rtx set = single_set (insn);
> >> > +  if (!set)
> >> > +    return false;
> >> > +  rtx src = SET_SRC (set);
> >> > +  rtx dest = SET_DEST (set);
> >> > +  return is_a<scalar_mode>(GET_MODE (dest))
> >> > +      && aarch64_mov_operand_p (src, GET_MODE (src));
> >>
> >> Formatting:
> >>
> >>   return (is_a<scalar_mode>(GET_MODE (dest))
> >>           && aarch64_mov_operand_p (src, GET_MODE (src)));
> >>
> >> OK with that change if the tests pass, thanks.
> > Unfortunately, the patch regressed vec-init-21.c:
> >
> > int8x16_t f_s8(int8_t x, int8_t y)
> > {
> >   return (int8x16_t) { x, y, 1, 2, 3, 4, 5, 6,
> >                        7, 8, 9, 10, 11, 12, 13, 14 };
> > }
> >
> > -O3 code-gen trunk:
> > f_s8:
> >         adrp    x2, .LC0
> >         ldr     q0, [x2, #:lo12:.LC0]
> >         ins     v0.b[0], w0
> >         ins     v0.b[1], w1
> >         ret
> >
> > -O3 code-gen patch:
> > f_s8:
> >         adrp    x2, .LC0
> >         ldr     d31, [x2, #:lo12:.LC0]
> >         adrp    x2, .LC1
> >         ldr     d0, [x2, #:lo12:.LC1]
> >         ins     v31.b[0], w0
> >         ins     v0.b[0], w1
> >         zip1    v0.16b, v31.16b, v0.16b
> >         ret
> >
> > With trunk, it chooses the fallback sequence because both the fallback
> > and the zip1 sequence had cost = 20; with the patch applied, however,
> > we end up with zip1 sequence cost = 24 and fallback sequence cost = 28.
> >
> > This happens because insn_cost is used instead of set_rtx_cost
> > for the following expression:
> > (set (reg:QI 100)
> >      (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0))
> > set_rtx_cost returns 0 for the above expression, but insn_cost returns 4.
>
> Yeah, was wondering why you'd dropped the set_rtx_cost thing,
> but decided not to question it since using insn_cost seemed
> reasonable if it worked.

[reposting because my reply got blocked for moderator approval]
The attached patch uses set_rtx_cost for single sets, and insn_cost
otherwise for non-debug insns, similar to seq_cost.

> > This expression template appears twice in the fallback sequence, raising
> > its cost from 20 to 28, while it appears once in each half of the zip1
> > sequence, raising that cost from 20 to 24, so the zip1 sequence is now
> > preferred instead.
> >
> > I assumed this expression would be ignored because it looks like a
> > scalar move, but that doesn't seem to be the case?
> > aarch64_classify_symbolic_expression returns SYMBOL_FORCE_TO_MEM
> > for (subreg/s/u:QI (reg/v:SI 94 [ y ]) 0),
> > and thus aarch64_mov_operand_p returns false.
>
> Ah, I guess it should be aarch64_mov_operand instead.  Confusing that
> they're so different...
Thanks, using aarch64_mov_operand worked.

> > Another issue with the zip1 sequence above is that it uses the same
> > register x2 for loading the other half of the constant in:
> > adrp    x2, .LC1
> >
> > I guess this will create an output dependency from adrp x2, .LC0 ->
> > adrp x2, .LC1, and an anti-dependency from ldr d31, [x2, #:lo12:.LC0]
> > -> adrp x2, .LC1, essentially forcing almost the entire sequence
> > (except the ins instructions) to execute sequentially?
>
> I'd expect modern cores to handle that via renaming.
Ah right, thanks for the clarification.

For some reason, git diff does not seem to be formatting the patch
correctly :/ Or perhaps I am doing something wrong.
For example, it shows:
+  return is_a<scalar_mode>(GET_MODE (dest))
+        && aarch64_mov_operand (src, GET_MODE (src));
but after applying the patch, it's formatted correctly, with
"&& aarch64..." right below is_a<scalar_mode>, both on column 10.

Similarly, for the following hunk in seq_cost_ignoring_scalar_moves:
+    if (NONDEBUG_INSN_P (seq)
+       && !scalar_move_insn_p (seq))
after applying the patch, "&&" is below "N", and not below '('; both "N"
and "&&" are on column 9.

And for the following just below:
+      {
+       if (rtx set = single_set (seq))
the diff shows only one space of difference between '{' and the following
"if", but after applying the patch it's formatted correctly, with two
spaces after the curly brace.

This is the entire file aarch64.cc with the patch applied:
https://people.linaro.org/~prathamesh.kulkarni/aarch64.cc
Does the formatting look OK for scalar_move_insn_p and
seq_cost_ignoring_scalar_moves in the above aarch64.cc?

The patch passes bootstrap+test on aarch64-linux-gnu with no regressions.
OK to commit?

Thanks,
Prathamesh
>
> Thanks,
> Richard
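For reference, the generic seq_cost that the new function mirrors looks
roughly like this (a paraphrase of gcc/rtlanal.cc; details may differ
between GCC revisions):

unsigned
seq_cost (const rtx_insn *seq, bool speed)
{
  unsigned cost = 0;
  rtx set;

  for (; seq; seq = NEXT_INSN (seq))
    {
      /* Prefer the cheap set_rtx_cost estimate for single-set insns.  */
      set = single_set (seq);
      if (set)
        cost += set_rtx_cost (set, speed);
      else if (NONDEBUG_INSN_P (seq))
        {
          /* Fall back to insn_cost, counting at least 1 per insn.  */
          int this_cost = insn_cost (CONST_CAST_RTX_INSN (seq), speed);
          if (this_cost > 0)
            cost += this_cost;
          else
            cost++;
        }
    }

  return cost;
}

The version in the patch below follows the same shape, but checks
NONDEBUG_INSN_P up front and skips any insn that scalar_move_insn_p
accepts, so scalar moves contribute nothing to either sequence's cost.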
[aarch64] Ignore cost of scalar moves for seq in vector initialization.

gcc/ChangeLog:
        * config/aarch64/aarch64.cc (scalar_move_insn_p): New function.
        (seq_cost_ignoring_scalar_moves): Likewise.
        (aarch64_expand_vector_init): Call seq_cost_ignoring_scalar_moves.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d6fc94015fa..598f2f86417 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22332,6 +22332,47 @@ aarch64_unzip_vector_init (machine_mode mode, rtx vals, bool even_p)
   return gen_rtx_PARALLEL (new_mode, vec);
 }
 
+/* Return true if INSN is a scalar move.  */
+
+static bool
+scalar_move_insn_p (const rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+  rtx src = SET_SRC (set);
+  rtx dest = SET_DEST (set);
+  return is_a<scalar_mode> (GET_MODE (dest))
+         && aarch64_mov_operand (src, GET_MODE (src));
+}
+
+/* Similar to seq_cost, but ignore cost for scalar moves.  This function
+   is called from aarch64_expand_vector_init.  */
+
+static unsigned
+seq_cost_ignoring_scalar_moves (const rtx_insn *seq, bool speed)
+{
+  unsigned cost = 0;
+
+  for (; seq; seq = NEXT_INSN (seq))
+    if (NONDEBUG_INSN_P (seq)
+        && !scalar_move_insn_p (seq))
+      {
+        if (rtx set = single_set (seq))
+          cost += set_rtx_cost (set, speed);
+        else
+          {
+            int this_cost = insn_cost (CONST_CAST_RTX_INSN (seq), speed);
+            if (this_cost > 0)
+              cost += this_cost;
+            else
+              cost++;
+          }
+      }
+
+  return cost;
+}
+
 /* Expand a vector initialization sequence, such that TARGET is
    initialized to contain VALS.  */
 
@@ -22367,7 +22408,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
       halves[i] = gen_rtx_SUBREG (mode, tmp_reg, 0);
       rtx_insn *rec_seq = get_insns ();
       end_sequence ();
-      costs[i] = seq_cost (rec_seq, !optimize_size);
+      costs[i] = seq_cost_ignoring_scalar_moves (rec_seq, !optimize_size);
       emit_insn (rec_seq);
     }
 
@@ -22384,7 +22425,8 @@ aarch64_expand_vector_init (rtx target, rtx vals)
   start_sequence ();
   aarch64_expand_vector_init_fallback (target, vals);
   rtx_insn *fallback_seq = get_insns ();
-  unsigned fallback_seq_cost = seq_cost (fallback_seq, !optimize_size);
+  unsigned fallback_seq_cost
+    = seq_cost_ignoring_scalar_moves (fallback_seq, !optimize_size);
   end_sequence ();
 
   emit_insn (seq_total_cost < fallback_seq_cost ? seq : fallback_seq);
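As a usage illustration (a hypothetical example, not taken from the patch
or the testsuite; the function name is made up): an initializer whose even
lanes are all x and odd lanes are all y is the kind of input where the
even/odd split should pay off, since each half reduces to a dup of one
scalar, and with this patch the scalar moves feeding those dups no longer
inflate the zip1 sequence's cost relative to the fallback:

#include <arm_neon.h>

/* Hypothetical test: each half of the even/odd split is a dup of a
   single scalar, so the zip1 strategy should look cheap here.  */
int8x16_t
f_interleave (int8_t x, int8_t y)
{
  return (int8x16_t) { x, y, x, y, x, y, x, y,
                       x, y, x, y, x, y, x, y };
}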