Matthew Fortune <matthew.fort...@imgtec.com> writes: > Richard Sandiford <rdsandif...@googlemail.com> writes: >> Jeff Law <l...@redhat.com> writes: >> > On 01/09/15 04:32, Robert Suchanek wrote: >> >> Hi Steven/Vladimir, >> >> >> >>> It's hard to say what the correct fix should be, but it sounds like >> >>> the address you get after the substitutions should be simplified >> >>> (folded). >> >> >> >> Coming back to the original testcase and re-analyzing the problem, it >> >> appears that there is, indeed, a missing case for simplification of >> LO_SUM/HIGH pair. >> >> The patch attached resolves the issue. >> >> >> >> Although the testcase is not reproducible on the trunk, I think it is >> >> still worth to include it in case the ICE reoccurred. >> >> >> >> The patch has been bootstrapped and regtested on >> >> x86_64-unknown-linux-gnu target and similarly tested against SVN >> >> revision r212763 where it can be reproduced. >> >> >> >> Regards, >> >> Robert >> >> >> >> 2015-01-08 Robert Suchanek <robert.sucha...@imgtec.com> >> >> >> >> gcc/ >> >> * simplify-rtx.c (simplify_replace_fn_rtx): Simplify (lo_sum (high >> x) >> >> (const (plus x offset))) to (const (plus x offset)). >> > You have to be careful here. Whether or not this transformation is >> > valid depends on the size of the offset and whether or not the port >> > has an overlap between its sethi and losum insns and whether or not >> > any rounding occurs when applying the relocations for sethi/losum as >> > well as potentially other factors. >> > >> > We don't currently have a way for ports to indicate what offsets would >> > make this kind of simplification valid. In fact, there's at least >> one >> > port (PA) where the validity of this kind of simplification can't be >> > determined until after LRA/reload when you know the full context of >> > how the result is going to be used. >> >> I agree this is the kind of thing we'd need to consider if we were >> deciding whether it's valid to connect a (lo_sum (high x+N) x+N) to an >> existing (high x). But this code is handling cases where the connection >> has already been made and we're trying to simplify the result. Would it >> be valid RTL to use: >> >> (lo_sum (high x) (const (plus x offset))) >> >> to mean anything other than x+offset? >> >> Hmm, admittedly the documentation doesn't guarantee that... > > I guess so. I took the phrasing below for (high:m exp) to mean that high > only made sense when used with lo_sum. > > "It is used with lo_sum to represent the typical two-instruction sequence > used in RISC machines to reference a global memory location" > > I don't know if that was the intended meaning though. There is nothing > to state that lo_sum has to be used with high though so lo_sum could > be applied to a different base quite legitimately it seems.
Yeah, it's also used for small-data accesses, where the base is the global pointer. You can also add a variable offset in between the high and lo_sum, which is the main reason we can't just take the second operand of the lo_sum as golden. Seeing the high is important. >> If we do go for the change, I think we should generalise it to handle >> (lo_sum (high x+N) x+N') and (lo_sum (high x-N) x) too. Things like >> get_related_value or split_const could help there. > > Seems reasonable to cover these cases but I think the code already does > that (this is directly above the new case Robert added): > > /* (lo_sum (high x) x) -> x */ > if (GET_CODE (op0) == HIGH && rtx_equal_p (XEXP (op0, 0), op1)) > return op1; > > So, (lo_sum (high x) x) is already simplified to x but the special case > where a constant is only added to the lo_sum operand is missing. No, the cases I was describing were where the high is a (const (plus x -N)) and the lo_sum is just x, or where the high is (const (plus x N)) and the lo_sum is (const (plus x N')). I.e. rather than have the two special cases of (lo_sum (high x) x) and (lo_sum (high x) (const (plus x N))), we should handle all cases of (lo_sum (high x) y) in which y and x have the same base. E.g. if we have an int array called "x" that is aligned to 64 bits, we can access x[3] from the same high as x[2]. Neither the existing code nor the patch would be able to simplify that back to x[3]. Thanks, Richard