Alias analysis by DSE based on CSELIB expansion assumes that references to the stack frame from different base registers (ie FP, SP) never alias.
The comment block in cselib explains that cselib does not allow substitution of FP, SP or SFP specifically in order not to break DSE. However, the logic implemented in CSELIB uses two mutually recursive functions to perform cse expansion, cselib_expand_value_rtx_1() and expand_loc(). The first explicitly checks and rejects an attempt to expand FP, SFP, SP, otherwise it calls expand_loc() to choose and expansion for a value from the list of available expansions. expand_loc() does not implement the equivalent logic for SFP, FP and SP, rather it always prefers the first available value expansion, falling back to a simple regno expansion. Therefore given a value that can be expanded to either FP or another value, expand_loc() will always reject FP in favour of the available value expansion. This patch ensures that expand_loc() prefers to substitute for a frame register in order to preserve the behaviour expected by DSE. This issue was found while developing the ARM aarch64 backend wheregcc.c-torture/execute/vector-2.c is mis-compiled with -fno-omit-frame-pointer. The issue has not been observed on another target. The patch has been regressed on x86 (and aarch64).
Patch attached, proposed ChangeLog: 2012-01-04 Marcus Shawcroft <marcus.shawcr...@arm.com> * cselib.c (expand_loc): Prefer a frame register substitution. /Marcus
diff --git a/gcc/cselib.c b/gcc/cselib.c index fc86ef1..7159ca4 100644 --- a/gcc/cselib.c +++ b/gcc/cselib.c @@ -1331,11 +1331,32 @@ cselib_lookup_mem (rtx x, int create) return mem_elt; } -/* Search thru the possible substitutions in P. We prefer a non reg - substitution because this allows us to expand the tree further. If - we find, just a reg, take the lowest regno. There may be several - non-reg results, we just take the first one because they will all - expand to the same place. */ +/* Search through the possible substitutions in P. + + We prefer in order: + 1) STACK_POINTER_REGNUM, FRAME_POINTER or the HARD_FRAME_POINTER. + 2) A non reg substitution because this allows us to expand the tree further. + 3) The lowest regno. + + We are not willing to substitute the STACK_POINTER_REGNUM, + FRAME_POINTER or the HARD_FRAME_POINTER. This is requirement of + DSE. + + These expansions confuse the code that notices that stores + into the frame go dead at the end of the function and that the + frame is not affected by calls to subroutines. + + If STACK_POINTER_REGNUM substitution is allowed, DSE will think + that parameter pushing also goes dead which is wrong. + + If FRAME_POINTER or HARD_FRAME_POINTER substitution is allowed then + you lose the opportunity to make the frame assumptions. + + If other potential uses need to substitute these frame registers we + should add a parameter to control this behavior. + + There may be several non register results, we just take the first + one because they will all expand to the same place. */ static rtx expand_loc (struct elt_loc_list *p, struct expand_value_data *evd, @@ -1345,7 +1366,24 @@ expand_loc (struct elt_loc_list *p, struct expand_value_data *evd, unsigned int regno = UINT_MAX; struct elt_loc_list *p_in = p; - for (; p; p = p -> next) + /* Prefer a frame related register over any other option. */ + for (p = p_in; p; p = p -> next) + { + if ((REG_P (p->loc)) + && (REGNO (p->loc) < regno) + && !bitmap_bit_p (evd->regs_active, REGNO (p->loc))) + { + unsigned int candidate_regno = REGNO (p->loc); + if (candidate_regno == STACK_POINTER_REGNUM + || candidate_regno == FRAME_POINTER_REGNUM + || candidate_regno == HARD_FRAME_POINTER_REGNUM) + { + return p->loc; + } + } + } + + for (p = p_in; p; p = p -> next) { /* Avoid infinite recursion trying to expand a reg into a the same reg. */