Thanks for the context.
Robin Dapp <[email protected]> writes:
>> Sorry for the slow review. TBH I was hoping someone else would pick
>> it up, since (a) I'm not very familiar with this code, and (b) I don't
>> really agree with the way that the current code works. I'm not sure the
>> current dependency checking is safe, so I'm nervous about adding even
>> more cases to it. And it feels like the different ifcvt techniques are
>> not now well distinguished, so that they're beginning to overlap and
>> compete with one another. None of that is your fault, of course. :)
>
> I might be to blame, at least partially :) The idea back then was to
> do it here because (1) it can handle cases the other part cannot and
> (2) its costing is based on sequence cost rather than just counting
> instructions.
Ah, OK. (2) seems like a good reason.
>> This condition looks odd. A SET_SRC is never null. But more importantly,
>> continuing means "assume the best", and I don't think we should assume
>> the best for (say) an insn with two parallel sets.
>
> IIRC I didn't consider two parallel sets at all :/
>
>> But I'm not sure which cases this code is trying to catch. Is it trying
>> to catch cases where seq2 "spontaneously" uses registers that happen to
>> overlap with cond? If so, then when does that happen? And if it does
>> happen, wouldn't the sequence also have to set the registers first?
>
> In order for sequence costing to be superior to just counting "conditional"
> instructions we need to make sure that as few redundant instructions as
> possible are present in the costed sequences. (redundant as in "will be
> removed in a subsequent pass").
>
> So, originally, the backend would always be passed a comparison
> (set cc (cmp a b)). On e.g. s390 this would result in at least two
> redundant instructions per conditional move so costing was always wrong.
> When passing the backend the comparison "result" e.g. (cmp cc 0)
> there is no redundant instruction.
>
> Now, passing the comparison is useful as well when we want to turn
> a conditional move into a min/max in the backend. This, however,
> overwrites the initial condition result and any subsequent iterations
> must not pass the result but the comparison itself to the backend.
Yeah, makes sense. Using your example, there seem to be two things
that we're checking:
(1) Does the sequence change cc? This is checked via:
if (cc_cmp)
{
/* Check if SEQ can clobber registers mentioned in
cc_cmp and/or rev_cc_cmp. If yes, we need to use
only seq1 from that point on. */
rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
for (walk = seq; walk; walk = NEXT_INSN (walk))
{
note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
if (cc_cmp_pair[0] == NULL_RTX)
{
cc_cmp = NULL_RTX;
rev_cc_cmp = NULL_RTX;
break;
}
}
}
and is the case that Manolis's patch is dealing with.
(2) Does the sequence use a and b? If so, we need to use temporary
destinations for any earlier writes to a and b.
Is that right?
(1) looks OK, although Manolis's modified_in_p would work too.
(2) is the code I quoted yesterday and is the part I'm not sure
about. First of all:
seq1 = try_emit_cmove_seq (if_info, temp, cond,
new_val, old_val, need_cmov,
&cost1, &temp_dest1);
must have a consistent view of what a and b are. So old_val and new_val
cannot at this point reference "newer" values of a and b (as set by previous
instructions in the sequence). AIUI:
if (int *ii = rewired_src->get (insn))
new_val = simplify_replace_rtx (new_val, (*targets)[*ii],
(*temporaries)[*ii]);
is the code that ensures this. If a previous write to a and b has been
replaced by a temporary, this code substitutes that temporary into new_val.
The same cond, new_val and old_val are used in:
seq2 = try_emit_cmove_seq (if_info, temp, cond,
new_val, old_val, need_cmov,
&cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
So won't any use of a and b in seq2 to be from cond, rather than old_val
and new_val? If so, shouldn't we set read_comparison for any use of a
and b, rather than skipping IF_THEN_ELSE?
> In my first approach I hadn't considered several special cases which,
> of course, complicated things further down the line.
>
> All that said - maybe trying hard to get rid of later-redundant insns
> is not a good approach after all? A lot of complexity would go away
> if we counted emitted conditional instructions just like the other
> ifcvt part. Maybe a middle ground would be to cost the initial
> sequence as well as if + jmp and compare this against insn_cost of
> only the created conditional insns?
Using seq_cost seems like the best way of costing things. And I agree
that it's worth trying to avoid costing (and generating) redundant
instructions.
> In that case we might need to tighten/audit our preconditions in order
> not to accidentally allow cases that result in strictly worse code.
> But maybe that's an easier problem than what we are currently solving?
Not sure :)
Thanks,
Richard