On Tue, 28 Nov 2023, Jeff Law wrote:
> FWIW, I was looking at a regression with our internal tests after your
> changes. It was quite nice to see how well twiddling -mbranch-cost
> correlated to how many instructions we would allow in a conditional move
> sequence.
I'm a bit concerned though that our interpretation of `-mbranch-cost=0'
is different from the middle end's, such as in `emit_store_flag':

  /* If we reached here, we can't do this with a scc insn, however there
     are some comparisons that can be done in other ways.  Don't do any
     of these cases if branches are very cheap.  */
  if (BRANCH_COST (optimize_insn_for_speed_p (), false) == 0)
    return 0;

> The downside is it highlighted the gimple vs RTL use issue. I'm confident
> that we would like to see a higher branch cost in the RTL phases for our
> uarch, but I'm much less comfortable with how that's going to change the
> decisions made in trees/gimple. We'll have to investigate that at some depth.
Ack.
> > I've looked at it already and it's the middle end that ends up with the
> > zero-extension, specifically `convert_move' invoked from `emit_cstore'
> > down the call to `noce_try_store_flag_mask', to widen the output from
> > `cstoredi4', so I don't think we can do anything in the backend to prevent
> > it from happening.  Nor do I think we can do anything useful about
> > `cstoredi4' having an SImode output, as it's a pattern matched by name
> > rather than RTX, so we can't provide variants with an SImode and a DImode
> > output at the same time, as that would cause a name clash.
> We're actually tracking some of these extraneous extensions. Do you happen to
> know if the zero-extended object happens to be (subreg:SI (reg:DI)) kind of
> construct? That's the kind of thing we're chasing down right now from various
> points. Vineet has already fixed one class of them. Jivan and I are looking
> at others.
Under GDB it's a plain move from (reg:SI 140) to (reg:DI 139), as in the
FROM and TO arguments to `convert_move' respectively. This makes it call
`convert_mode_scalar', which then chooses between `zext_optab' and
`sext_optab' as appropriate, under:

  /* If the target has a converter from FROM_MODE to TO_MODE, use it.  */

to produce:

(set (reg:DI 139)
    (zero_extend:DI (reg:SI 140)))

ending up with this complete sequence:

(insn 27 0 28 (set (reg:SI 140)
        (eq:SI (reg/v:DI 137 [ c ])
            (const_int 0 [0]))) -1
     (nil))
(insn 28 27 29 (set (reg:DI 139)
        (zero_extend:DI (reg:SI 140))) -1
     (nil))
(insn 29 28 30 (set (reg:DI 141)
        (neg:DI (reg:DI 139))) -1
     (nil))
(insn 30 29 0 (set (reg/v:DI 134 [ <retval> ])
        (and:DI (reg/v:DI 135 [ a ])
            (reg:DI 141))) -1
     (nil))

passed to `targetm.noce_conversion_profitable_p' right away. Maybe you
can teach `emit_cstore' or `convert_move' to use a subreg when it is known
for the particular target that the value produced by the conditional-set
machine instruction emitted by `cstoreMODE4' is valid unchanged in both
modes.
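
For illustration, the four insns in the sequence can be modelled in C
roughly as follows.  This is only a sketch: the function names
`mask_sequence' and `mask_no_extend' are made up for this example and do
not exist in GCC, and dropping the widening step assumes the target
guarantees the conditional-set result is 0 or 1 in either mode.

```c
#include <stdint.h>

/* Hypothetical model of insns 27-30; names are illustrative only.  */
static int64_t mask_sequence(int64_t a, int64_t c)
{
    int32_t flag = (c == 0);                 /* insn 27: eq:SI, yields 0 or 1 */
    int64_t wide = (int64_t)(uint32_t)flag;  /* insn 28: zero_extend:DI */
    int64_t mask = -wide;                    /* insn 29: neg:DI, 0 or all ones */
    return a & mask;                         /* insn 30: and:DI */
}

/* The same computation with the widening step dropped, assuming the
   flag is already valid as a 64-bit value.  */
static int64_t mask_no_extend(int64_t a, int64_t c)
{
    int64_t flag = (c == 0);
    return a & -flag;
}
```

Since the flag is only ever 0 or 1, both functions return `a' when `c' is
zero and 0 otherwise, so the extra extension insn carries no information
in this case.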
You can fiddle with it by trying:

$ gcc -march=rv64gc -mbranch-cost=3 -O2 -S \
    gcc/testsuite/gcc.target/riscv/pr105314.c

Set a breakpoint at `noce_try_store_flag_mask' and then single-step to see
how things proceed.
Maciej