Thanks for the context. Robin Dapp <rdapp....@gmail.com> writes: >> Sorry for the slow review. TBH I was hoping someone else would pick >> it up, since (a) I'm not very familiar with this code, and (b) I don't >> really agree with the way that the current code works. I'm not sure the >> current dependency checking is safe, so I'm nervous about adding even >> more cases to it. And it feels like the different ifcvt techniques are >> not now well distinguished, so that they're beginning to overlap and >> compete with one another. None of that is your fault, of course. :) > > I might be to blame, at least partially :) The idea back then was to > do it here because (1) it can handle cases the other part cannot and > (2) its costing is based on sequence cost rather than just counting > instructions.
Ah, OK. (2) seems like a good reason. >> This condition looks odd. A SET_SRC is never null. But more importantly, >> continuing means "assume the best", and I don't think we should assume >> the best for (say) an insn with two parallel sets. > > IIRC I didn't consider two parallel sets at all :/ > >> But I'm not sure which cases this code is trying to catch. Is it trying >> to catch cases where seq2 "spontaneously" uses registers that happen to >> overlap with cond? If so, then when does that happen? And if it does >> happen, wouldn't the sequence also have to set the registers first? > > In order for sequence costing to be superior to just counting "conditional" > instructions we need to make sure that as few redundant instructions as > possible are present in the costed sequences. (redundant as in "will be > removed in a subsequent pass"). > > So, originally, the backend would always be passed a comparison > (set cc (cmp a b)). On e.g. s390 this would result in at least two > redundant instructions per conditional move so costing was always wrong. > When passing the backend the comparison "result" e.g. (cmp cc 0) > there is no redundant instruction. > > Now, passing the comparison is useful as well when we want to turn > a conditional move into a min/max in the backend. This, however, > overwrites the initial condition result and any subsequent iterations > must not pass the result but the comparison itself to the backend. Yeah, makes sense. Using your example, there seem to be two things that we're checking: (1) Does the sequence change cc? This is checked via: if (cc_cmp) { /* Check if SEQ can clobber registers mentioned in cc_cmp and/or rev_cc_cmp. If yes, we need to use only seq1 from that point on. */ rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp }; for (walk = seq; walk; walk = NEXT_INSN (walk)) { note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair); if (cc_cmp_pair[0] == NULL_RTX) { cc_cmp = NULL_RTX; rev_cc_cmp = NULL_RTX; break; } } } and is the case that Manolis's patch is dealing with. (2) Does the sequence use a and b? If so, we need to use temporary destinations for any earlier writes to a and b. Is that right? (1) looks OK, although Manolis's modified_in_p would work too. (2) is the code I quoted yesterday and is the part I'm not sure about. First of all: seq1 = try_emit_cmove_seq (if_info, temp, cond, new_val, old_val, need_cmov, &cost1, &temp_dest1); must have a consistent view of what a and b are. So old_val and new_val cannot at this point reference "newer" values of a and b (as set by previous instructions in the sequence). AIUI: if (int *ii = rewired_src->get (insn)) new_val = simplify_replace_rtx (new_val, (*targets)[*ii], (*temporaries)[*ii]); is the code that ensures this. If a previous write to a and b has been replaced by a temporary, this code substitutes that temporary into new_val. The same cond, new_val and old_val are used in: seq2 = try_emit_cmove_seq (if_info, temp, cond, new_val, old_val, need_cmov, &cost2, &temp_dest2, cc_cmp, rev_cc_cmp); So won't any use of a and b in seq2 to be from cond, rather than old_val and new_val? If so, shouldn't we set read_comparison for any use of a and b, rather than skipping IF_THEN_ELSE? > In my first approach I hadn't considered several special cases which, > of course, complicated things further down the line. > > All that said - maybe trying hard to get rid of later-redundant insns > is not a good approach after all? A lot of complexity would go away > if we counted emitted conditional instructions just like the other > ifcvt part. Maybe a middle ground would be to cost the initial > sequence as well as if + jmp and compare this against insn_cost of > only the created conditional insns? Using seq_cost seems like the best way of costing things. And I agree that it's worth trying to avoid costing (and generating) redundant instructions. > In that case we might need to tighten/audit our preconditions in order > not to accidentally allow cases that result in strictly worse code. > But maybe that's an easier problem than what we are currently solving? Not sure :) Thanks, Richard