On Sun, 19 Nov 2023, Jeff Law wrote:
> As I suspect you know a big part of the problem here is that BRANCH_COST and
> rtx_cost don't have any common scale and thus trying to compare BRANCH_COST to
> RTX_COST doesn't have well defined meaning.
We do have preexisting places using COSTS_N_INSNS (BRANCH_COST ())
though, as documented in ifcvt.cc:
??? Actually, instead of the branch instruction costs we might want
to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */
so it seems the right direction, and given that we expose this measure to
the user (and at the very least GCC developers implementing new tuning
microarchitectures) I think it's the only sane way to do branch costing:
define the measure in terms of how many ordinary ALU instructions a branch
is statistically equivalent to.
We may have to consider whether we want/need higher resolution here,
effectively branch costs including a fractional part.
> That hasn't kept us from trying to do precisely that and the result has always
> been less than satisfactory. You're introducing more, but I don't think
> there's a reasonable way out of this mess at this point.
Ack.
> > FWIW I don't understand why the test cases absolutely HAD to have such
> > overlong names guaranteed to exceed our 80 column limit in any context.
> > It's such a pain to handle.
> I dislike the long names as well. I nearly changed them myself as part of the
> eswin submission, but that seemed a bit gratituous to me so I left them as-is.
>
> If you wanted to rename them, be my guest, consider it pre-approved ;-)
Ack.
> WRT the extraneous zero-extension. Isn't that arguably a bug in the scc
> expander for risc-v? Fixing that isn't a prerequisite here, but it probably
> worth a bit of someone's time.
I've looked at it already and it's the middle end that ends up with the
zero-extension, specifically `convert_move' invoked from `emit_cstore'
down the call to `noce_try_store_flag_mask', to widen the output from
`cstoredi4', so I don't think we can do anything in the backend to prevent
it from happening. And neither I think we can do anything useful about
`cstoredi4' having a SImode output, as it's a pattern matched by name
rather than RTX, so we can't provide variants having a SImode and a DImode
output each both at a time, as that would cause a name clash.
What we could do is adding new named patterns with the output mode spelt
out explicitly, such as `cstoresidi4', `cstoredidi4', etc., but that would
be quite a big rewrite. Maybe worth doing, maybe not, I don't know. For
the time being the hack I have made does the trick.
Maciej