ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression
Hi, I think the rtl dump in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916 is not jump2 phase rtl dump. Because jump2 is after ira, the register number should be hardware register number. the jump2 rtl dump should as follow ... 31: NOTE_INSN_BASIC_BLOCK 5 32: [r6:SI]=r4:SI REG_DEAD r6:SI REG_DEAD r4:SI 33: [r5:SI]=r0:SI REG_DEAD r5:SI REG_DEAD r0:SI 7: r0:SI=0 REG_EQUAL 0 85: use r0:SI 86: {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} REG_UNUSED pc:SI REG_UNUSED r3:SI REG_CFA_RESTORE r7:SI REG_CFA_RESTORE r6:SI REG_CFA_RESTORE r5:SI REG_CFA_RESTORE r4:SI REG_CFA_RESTORE r3:SI 77: barrier 46: L46: 45: NOTE_INSN_BASIC_BLOCK 6 8: r0:SI=r4:SI REG_DEAD r4:SI REG_EQUAL 0x 87: use r0:SI 88: {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} REG_UNUSED pc:SI REG_UNUSED r3:SI REG_CFA_RESTORE r7:SI REG_CFA_RESTORE r6:SI REG_CFA_RESTORE r5:SI REG_CFA_RESTORE r4:SI REG_CFA_RESTORE r3:SI 79: barrier 54: L54: 53: NOTE_INSN_BASIC_BLOCK 7 9: r0:SI=0x <== lost REG_EQUAL after patch 34: L34: 35: NOTE_INSN_BASIC_BLOCK 8 41: use r0:SI 90: {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} REG_UNUSED pc:SI REG_UNUSED r3:SI REG_CFA_RESTORE r7:SI REG_CFA_RESTORE r6:SI REG_CFA_RESTORE r5:SI REG_CFA_RESTORE r4:SI REG_CFA_RESTORE r3:SI 89: barrier try_forward_edges(remove basic block 6) fail by try_crossjump_bb didn't occur. try_crossjump_bb (remove insns in basic block 6) fail by comparison between insn 9 and insn 8 not eqaul. The comparison in function can_replace_by use REG_EQUAL to compare register content. The REG_EQAUL lost in insn 9, so the comparison fail. But we may still have chance to remove basic block 6 in this case. Because right hand value of insn 9 is already a constant. We could get comparison equal by comparing between insn 8's register note and insn 9's RHS if RHS is a constant integer. Possible patch for can_replace_by in cfgcleanup.c. - if (!note1 || !note2 || !rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0)) - || !CONST_INT_P (XEXP (note1, 0))) + + if (!note1 || !CONST_INT_P (XEXP (note1, 0))) return dir_none; + if (note2) +{ + if (!rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0))) + return dir_none; +} + else +{ + if (!CONST_INT_P (SET_SRC (s2)) + || !rtx_equal_p (XEXP (note1, 0), SET_SRC (s2))) + return dir_none; +} + I'm not sure the idea is ok or it might crash something. Any suggestion would be very helpful. Thanks in advance. Shiva Chen
Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression
Hi, Jeff Thanks for your advice. can_replace_by.patch is the new patch to handle both cases. pr43920-2.c.244r.jump2.ori is the original jump2 rtl dump pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump after patch can_replace_by.patch Could you help me to review the patch? Thanks again. Shiva 2015-04-18 0:03 GMT+08:00 Jeff Law : > On 04/17/2015 03:57 AM, Shiva Chen wrote: >> >> Hi, >> >> I think the rtl dump in >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916 >> is not jump2 phase rtl dump. >> >> Because jump2 is after ira, the register number should be hardware >> register number. >> >> the jump2 rtl dump should as follow >> >> ... >> 31: NOTE_INSN_BASIC_BLOCK 5 >> 32: [r6:SI]=r4:SI >>REG_DEAD r6:SI >>REG_DEAD r4:SI >> 33: [r5:SI]=r0:SI >>REG_DEAD r5:SI >>REG_DEAD r0:SI >> 7: r0:SI=0 >>REG_EQUAL 0 >> 85: use r0:SI >> 86: >> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} >>REG_UNUSED pc:SI >>REG_UNUSED r3:SI >>REG_CFA_RESTORE r7:SI >>REG_CFA_RESTORE r6:SI >>REG_CFA_RESTORE r5:SI >>REG_CFA_RESTORE r4:SI >>REG_CFA_RESTORE r3:SI >> 77: barrier >> 46: L46: >> 45: NOTE_INSN_BASIC_BLOCK 6 >> 8: r0:SI=r4:SI >>REG_DEAD r4:SI >>REG_EQUAL 0x >> 87: use r0:SI >> 88: >> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} >>REG_UNUSED pc:SI >>REG_UNUSED r3:SI >>REG_CFA_RESTORE r7:SI >>REG_CFA_RESTORE r6:SI >>REG_CFA_RESTORE r5:SI >>REG_CFA_RESTORE r4:SI >>REG_CFA_RESTORE r3:SI >> 79: barrier >> 54: L54: >> 53: NOTE_INSN_BASIC_BLOCK 7 >> 9: r0:SI=0x <== lost REG_EQUAL after patch >> 34: L34: >> 35: NOTE_INSN_BASIC_BLOCK 8 >> 41: use r0:SI >> 90: >> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];} >>REG_UNUSED pc:SI >>REG_UNUSED r3:SI >>REG_CFA_RESTORE r7:SI >>REG_CFA_RESTORE r6:SI >>REG_CFA_RESTORE r5:SI >>REG_CFA_RESTORE r4:SI >>REG_CFA_RESTORE r3:SI >> 89: barrier > > Intead of the slim dump, can you please include the full RTL dump. I find > those much easier to read. > > > >> >> Possible patch for can_replace_by in cfgcleanup.c. >> >> - if (!note1 || !note2 || !rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0)) >> - || !CONST_INT_P (XEXP (note1, 0))) >> + >> + if (!note1 || !CONST_INT_P (XEXP (note1, 0))) >> return dir_none; >> >> + if (note2) >> +{ >> + if (!rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0))) >> + return dir_none; >> +} >> + else >> +{ >> + if (!CONST_INT_P (SET_SRC (s2)) >> + || !rtx_equal_p (XEXP (note1, 0), SET_SRC (s2))) >> + return dir_none; >> +} >> + >> >> I'm not sure the idea is ok or it might crash something. >> Any suggestion would be very helpful. > > Seems like you're on a reasonable path to me. I suggest you stick with it. > > Basically what it appears you're trying to do is unify insns from different > blocks where one looks like > > (set x y) with an attached REG_EQUAL note > > And the other looks like > > (set x const_int) > > Where the REG_EQUAL note has the same value as the const_int in the second > set. > > I think you'd want to handle both cases i1 has the note i2, no note and i1 > has no note and i2 has a note. > > Jeff > > jeff can_replace_by.patch Description: Binary data pr43920-2.c.244r.jump2.ori Description: Binary data pr43920-2.c.244r.jump2.patch_can_replace_by Description: Binary data Changelog.can_replace_by Description: Binary data
LRA assign same hard register with live range overlapped pseduos
HI, I'm trying to port a new 32bit target to GCC 4.8.0 with LRA enabled There is an error case which generates following RTL (insn 536 267 643 3 (set (reg/f:SI 0 $r0 [477]) <== r477 assign to r0 (plus:SI (reg/f:SI 31 $sp) (const_int 112 [0x70]))) test2.c:95 64 {*addsi3} (nil)) (insn 643 536 537 3 (set (reg/f:SI 0 $r0 [565]) <== r565 assign to r0, and corrupt the usage of r477 (reg/f:SI 31 $sp)) test2.c:95 44 {*movsi} (nil)) (insn 537 643 538 3 (set (reg/v:SI 13 $r13 [orig:61 i14 ] [61]) (mem/c:SI (plus:SI (reg/f:SI 0 $r0 [565]) <== use r565 (const_int 136 [0x88])) [5 %sfp+24 S4 A32])) test2.c:95 39 {*load_si} (expr_list:REG_DEAD (reg/f:SI 0 $r0 [565]) (nil))) ... (insn 539 540 270 3 (set (reg:SI 0 $r0 [479]) (plus:SI (reg/f:SI 0 $r0 [477]) (reg:SI 5 $r5 [480]))) test2.c:95 62 {*add_16bit} (expr_list:REG_DEAD (reg:SI 5 $r5 [480]) (expr_list:REG_DEAD (reg/f:SI 0 $r0 [477]) <== use r477 which should be $sp +112 Note that the live ranges of r477 and r565 are overlapped but assigned same register $r0. (r31 is stack pointer) By tracing LRA process, I noticed that when r477 is created, the lra_reg_info[r477].val = lra_reg_info[r31] due to (set r477 r31). But after lra_eliminate(), the stack offset changes and r477 is equal to r31+112 instead. In next lra-iteration round, r565 is created, and r565 = r31. In that case, register content of r477 should treat as not equal to r565 due to eliminate offset have been changed. Otherwise, r565 and r477 may assign to same hard register. To recognize that, I record the eliminate offset when the pseudo register have been created. Register content are the same only when lra_reg_info[].val and lra_reg_info[].offset are equal. gcc/lra-assigns.c |6 -- gcc/lra-int.h |2 ++ gcc/lra.c | 12 +++- 3 files changed, 17 insertions(+), 3 deletions(-) diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c index b204513..daf0aa9 100644 --- a/gcc/lra-assigns.c +++ b/gcc/lra-assigns.c @@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno) int hr, conflict_hr, nregs; enum machine_mode biggest_mode; unsigned int k, conflict_regno; - int val, biggest_nregs, nregs_diff; + int offset, val, biggest_nregs, nregs_diff; enum reg_class rclass; bitmap_iterator bi; bool *rclass_intersect_p; @@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno) #endif sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno); val = lra_reg_info[regno].val; + offset = lra_reg_info[regno].offset; CLEAR_HARD_REG_SET (impossible_start_hard_regs); EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) -if (val == lra_reg_info[conflict_regno].val) +if ((val == lra_reg_info[conflict_regno].val) +&& (offset == lra_reg_info[conflict_regno].offset)) { conflict_hr = live_pseudos_reg_renumber[conflict_regno]; nregs = (hard_regno_nregs[conflict_hr] diff --git a/gcc/lra-int.h b/gcc/lra-int.h index 98f2ff7..8ae4eb0 100644 --- a/gcc/lra-int.h +++ b/gcc/lra-int.h @@ -116,6 +116,8 @@ struct lra_reg /* Value holding by register. If the pseudos have the same value they do not conflict. */ int val; + /* Eliminate offset of the pseduo have been created. */ + int offset; /* These members are set up in lra-lives.c and updated in lra-coalesce.c. */ /* The biggest size mode in which each pseudo reg is referred in diff --git a/gcc/lra.c b/gcc/lra.c index 9df24b5..69962be 100644 --- a/gcc/lra.c +++ b/gcc/lra.c @@ -194,7 +194,17 @@ lra_create_new_reg (enum machine_mode md_mode, rtx original, new_reg = lra_create_new_reg_with_unique_value (md_mode, original, rclass, title); if (original != NULL_RTX && REG_P (original)) -lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val; +{ + lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val; + + rtx x = lra_eliminate_regs (original, VOIDmode, NULL_RTX); + + if (GET_CODE (x) == PLUS + && GET_CODE (XEXP (x, 1)) == CONST_INT) + lra_reg_info[REGNO (new_reg)].offset = INTVAL (XEXP (x, 1)); + else + lra_reg_info[REGNO (new_reg)].offset = 0; +} return new_reg; } -- 1.7.9.5 Comments? Thanks Best regards, Shiva
Re: LRA assign same hard register with live range overlapped pseduos
Full test2.c.209r.reload is about 296kb and i can't send successfully. Is there another way to send the dump file? Shiva 2013/4/18 Shiva Chen : > Hi, Vladimir > > attachment is the ira dump of the case > > Shiva > > 2013/4/17 Vladimir Makarov : >> On 13-04-15 1:20 AM, shiva Chen wrote: >>> >>> HI, >>> >>> I'm trying to port a new 32bit target to GCC 4.8.0 with LRA enabled >>> >>> There is an error case which generates following RTL >>> >>> >>> (insn 536 267 643 3 (set (reg/f:SI 0 $r0 [477]) <== r477 assign to r0 >>> (plus:SI (reg/f:SI 31 $sp) >>> (const_int 112 [0x70]))) test2.c:95 64 {*addsi3} >>>(nil)) >>> (insn 643 536 537 3 (set (reg/f:SI 0 $r0 [565]) <== r565 assign to >>> r0, and corrupt the usage of r477 >>> (reg/f:SI 31 $sp)) test2.c:95 44 {*movsi} >>>(nil)) >>> (insn 537 643 538 3 (set (reg/v:SI 13 $r13 [orig:61 i14 ] [61]) >>> (mem/c:SI (plus:SI (reg/f:SI 0 $r0 [565]) <== use r565 >>> (const_int 136 [0x88])) [5 %sfp+24 S4 A32])) test2.c:95 >>> 39 >>> {*load_si} >>>(expr_list:REG_DEAD (reg/f:SI 0 $r0 [565]) >>> (nil))) >>> ... >>> (insn 539 540 270 3 (set (reg:SI 0 $r0 [479]) >>> (plus:SI (reg/f:SI 0 $r0 [477]) >>> (reg:SI 5 $r5 [480]))) test2.c:95 62 {*add_16bit} >>>(expr_list:REG_DEAD (reg:SI 5 $r5 [480]) >>> (expr_list:REG_DEAD (reg/f:SI 0 $r0 [477]) <== use r477 which >>> should be $sp +112 >>> >>> Note that the live ranges of r477 and r565 are overlapped but assigned >>> same register $r0. (r31 is stack pointer) >>> >>> By tracing LRA process, I noticed that when r477 is created, >>> the lra_reg_info[r477].val = lra_reg_info[r31] due to (set r477 r31). >>> But after lra_eliminate(), the stack offset changes and >>> r477 is equal to r31+112 instead. >>> >>> In next lra-iteration round, r565 is created, and r565 = r31. >>> >>> In that case, register content of r477 should treat as not equal to >>> r565 due to eliminate offset have been changed. >>> >>> Otherwise, r565 and r477 may assign to same hard register. >>> >>> >>> To recognize that, I record the eliminate offset when the pseudo >>> register have been created. >>> >>> Register content are the same only when lra_reg_info[].val and >>> lra_reg_info[].offset are equal. >>> >>> >>> gcc/lra-assigns.c |6 -- >>> gcc/lra-int.h |2 ++ >>> gcc/lra.c | 12 +++- >>> 3 files changed, 17 insertions(+), 3 deletions(-) >>> >>> diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c >>> index b204513..daf0aa9 100644 >>> --- a/gcc/lra-assigns.c >>> +++ b/gcc/lra-assigns.c >>> @@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int >>> try_only_hard_regno) >>> int hr, conflict_hr, nregs; >>> enum machine_mode biggest_mode; >>> unsigned int k, conflict_regno; >>> - int val, biggest_nregs, nregs_diff; >>> + int offset, val, biggest_nregs, nregs_diff; >>> enum reg_class rclass; >>> bitmap_iterator bi; >>> bool *rclass_intersect_p; >>> @@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int >>> try_only_hard_regno) >>> #endif >>> sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno); >>> val = lra_reg_info[regno].val; >>> + offset = lra_reg_info[regno].offset; >>> CLEAR_HARD_REG_SET (impossible_start_hard_regs); >>> EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, >>> conflict_regno) >>> -if (val == lra_reg_info[conflict_regno].val) >>> +if ((val == lra_reg_info[conflict_regno].val) >>> +&& (offset == lra_reg_info[conflict_regno].offset)) >>> { >>> conflict_hr = live_pseudos_reg_renumber[conflict_regno]; >>> nregs = (hard_regno_nregs[conflict_hr] >>> diff --git a/gcc/lra-int.h b/gcc/lra-int.h >>> index 98f2ff7..8ae4eb0 100644 >>> --- a/gcc/lra-int.h >>> +++ b/gcc/lra-int.h >>> @@ -116,6 +116,8 @@ struct lra_reg >>> /* Value holding by register. If the pseudos have the same >>> value >>>they do not conflict. */ &g
Re: LRA assign same hard register with live range overlapped pseduos
Hi, Vladimir Previous patch probably not completed. The new patch will record lra_reg_info[i].offset as the offset from eliminate register to the pseudo i and keep updating when the stack has been changed. Therefore, lra-assign could get the latest offset to identify the pseudo content is equal or not. gcc/lra-assigns.c |6 -- gcc/lra-eliminations.c | 12 ++-- gcc/lra-int.h |2 ++ gcc/lra.c |5 - 4 files changed, 20 insertions(+), 5 deletions(-) diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c index b204513..daf0aa9 100644 --- a/gcc/lra-assigns.c +++ b/gcc/lra-assigns.c @@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno) int hr, conflict_hr, nregs; enum machine_mode biggest_mode; unsigned int k, conflict_regno; - int val, biggest_nregs, nregs_diff; + int offset, val, biggest_nregs, nregs_diff; enum reg_class rclass; bitmap_iterator bi; bool *rclass_intersect_p; @@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int try_only_hard_regno) #endif sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno); val = lra_reg_info[regno].val; + offset = lra_reg_info[regno].offset; CLEAR_HARD_REG_SET (impossible_start_hard_regs); EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) -if (val == lra_reg_info[conflict_regno].val) +if ((val == lra_reg_info[conflict_regno].val) +&& (offset == lra_reg_info[conflict_regno].offset)) { conflict_hr = live_pseudos_reg_renumber[conflict_regno]; nregs = (hard_regno_nregs[conflict_hr] diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c index 9df0bae..2d34b51 100644 --- a/gcc/lra-eliminations.c +++ b/gcc/lra-eliminations.c @@ -1046,6 +1046,7 @@ spill_pseudos (HARD_REG_SET set) static void update_reg_eliminate (bitmap insns_with_changed_offsets) { + int i; bool prev; struct elim_table *ep, *ep1; HARD_REG_SET temp_hard_reg_set; @@ -1124,8 +1125,15 @@ update_reg_eliminate (bitmap insns_with_changed_offsets) setup_elimination_map (); for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) if (elimination_map[ep->from] == ep && ep->previous_offset != ep->offset) - bitmap_ior_into (insns_with_changed_offsets, - &lra_reg_info[ep->from].insn_bitmap); + { +bitmap_ior_into (insns_with_changed_offsets, +&lra_reg_info[ep->from].insn_bitmap); + + /* Update offset when the eliminate offset have been changed. */ +for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++) + if (lra_reg_info[i].val - 1 == ep->from) + lra_reg_info[i].offset += (ep->offset - ep->previous_offset); + } } /* Initialize the table of hard registers to eliminate. diff --git a/gcc/lra-int.h b/gcc/lra-int.h index 98f2ff7..944cad1 100644 --- a/gcc/lra-int.h +++ b/gcc/lra-int.h @@ -116,6 +116,8 @@ struct lra_reg /* Value holding by register. If the pseudos have the same value they do not conflict. */ int val; + /* Offset from relative eliminate register to pesudo reg. */ + int offset; /* These members are set up in lra-lives.c and updated in lra-coalesce.c. */ /* The biggest size mode in which each pseudo reg is referred in diff --git a/gcc/lra.c b/gcc/lra.c index 9df24b5..7a60281 100644 --- a/gcc/lra.c +++ b/gcc/lra.c @@ -194,7 +194,10 @@ lra_create_new_reg (enum machine_mode md_mode, rtx original, new_reg = lra_create_new_reg_with_unique_value (md_mode, original, rclass, title); if (original != NULL_RTX && REG_P (original)) -lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val; +{ + lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val; + lra_reg_info[REGNO (new_reg)].offset = 0; +} return new_reg; } Thanks for the comment :) Shiva 2013/4/18 Shiva Chen : > Full test2.c.209r.reload is about 296kb and i can't send successfully. > Is there another way to send the dump file? > > Shiva > > 2013/4/18 Shiva Chen : >> Hi, Vladimir >> >> attachment is the ira dump of the case >> >> Shiva >> >> 2013/4/17 Vladimir Makarov : >>> On 13-04-15 1:20 AM, shiva Chen wrote: >>>> >>>> HI, >>>> >>>> I'm trying to port a new 32bit target to GCC 4.8.0 with LRA enabled >>>> >>>> There is an error case which generates following RTL >>>> >>>> >>>> (insn 536 267 643 3 (set (reg/f:SI 0 $r0 [477]) <== r477 assign to r0 >>>> (plus:SI (reg/f:SI 31 $sp) >>>> (const_int 112 [0x70]))) test2.c:95 64 {*addsi3} >>>>(nil)) >>>> (insn 643 536 537 3 (set (
Re: LRA assign same hard register with live range overlapped pseduos
t a/gcc/lra-int.h b/gcc/lra-int.h index 98f2ff7..69f8f8a 100644 --- a/gcc/lra-int.h +++ b/gcc/lra-int.h @@ -116,6 +116,8 @@ struct lra_reg /* Value holding by register. If the pseudos have the same value they do not conflict. */ int val; + /* Offset from relative eliminate register to pesudo reg. */ + int offset; /* These members are set up in lra-lives.c and updated in lra-coalesce.c. */ /* The biggest size mode in which each pseudo reg is referred in @@ -439,6 +441,37 @@ lra_get_insn_recog_data (rtx insn) return lra_set_insn_recog_data (insn); } +/* Update offset from eliminate register to pseduo i. */ +static inline void +lra_set_up_reg_val (int val, int offset) +{ + int i; + + for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++) +{ + if (lra_reg_info[i].val == val) +lra_reg_info[i].offset += offset; +} +} + +/* Return true if register content are equal. */ +static inline bool +lra_reg_val_equal_p (int regno, int val, int offset) +{ + if (lra_reg_info[regno].val == val + && lra_reg_info[regno].offset == offset) +return true; + + return false; +} + +/* Assign register content record. */ +static inline void +lra_assign_reg_val (int from, int to) +{ + lra_reg_info[to].val = lra_reg_info[from].val; + lra_reg_info[to].offset = lra_reg_info[from].offset; +} struct target_lra_int diff --git a/gcc/lra.c b/gcc/lra.c index 9df24b5..4c06a0c 100644 --- a/gcc/lra.c +++ b/gcc/lra.c @@ -1392,6 +1392,7 @@ initialize_lra_reg_info_element (int i) lra_reg_info[i].last_reload = 0; lra_reg_info[i].restore_regno = -1; lra_reg_info[i].val = get_new_reg_value (); + lra_reg_info[i].offset = 0; lra_reg_info[i].copies = NULL; } 2013/4/20 Vladimir Makarov : > On 13-04-17 11:11 PM, Shiva Chen wrote: >> >> Hi, Vladimir >> >> Overlapped live range RTL is from line 7577 to 7597 in test2.c.209r.reload >> >> Previous patch probably not completed. >> The new patch will record lra_reg_info[i].offset as the offset from >> eliminate register to the pseudo i >> and keep updating when the stack has been changed. >> Therefore, lra-assign could get the latest offset to identify the >> pseudo content is equal or not. >> >> gcc/lra-assigns.c |6 -- >> gcc/lra-eliminations.c | 12 ++-- >> gcc/lra-int.h |2 ++ >> gcc/lra.c |5 - >> 4 files changed, 20 insertions(+), 5 deletions(-) >> >> diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c >> index b204513..daf0aa9 100644 >> --- a/gcc/lra-assigns.c >> +++ b/gcc/lra-assigns.c >> @@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int >> try_only_hard_regno) >> int hr, conflict_hr, nregs; >> enum machine_mode biggest_mode; >> unsigned int k, conflict_regno; >> - int val, biggest_nregs, nregs_diff; >> + int offset, val, biggest_nregs, nregs_diff; >> enum reg_class rclass; >> bitmap_iterator bi; >> bool *rclass_intersect_p; >> @@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int >> try_only_hard_regno) >> #endif >> sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno); >> val = lra_reg_info[regno].val; >> + offset = lra_reg_info[regno].offset; >> CLEAR_HARD_REG_SET (impossible_start_hard_regs); >> EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, >> conflict_regno) >> -if (val == lra_reg_info[conflict_regno].val) >> +if ((val == lra_reg_info[conflict_regno].val) >> +&& (offset == lra_reg_info[conflict_regno].offset)) >> { >> conflict_hr = live_pseudos_reg_renumber[conflict_regno]; >> nregs = (hard_regno_nregs[conflict_hr] >> diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c >> index 9df0bae..2d34b51 100644 >> --- a/gcc/lra-eliminations.c >> +++ b/gcc/lra-eliminations.c >> @@ -1046,6 +1046,7 @@ spill_pseudos (HARD_REG_SET set) >> static void >> update_reg_eliminate (bitmap insns_with_changed_offsets) >> { >> + int i; >> bool prev; >> struct elim_table *ep, *ep1; >> HARD_REG_SET temp_hard_reg_set; >> @@ -1124,8 +1125,15 @@ update_reg_eliminate (bitmap >> insns_with_changed_offsets) >> setup_elimination_map (); >> for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; >> ep++) >> if (elimination_map[ep->from] == ep && ep->previous_offset != >> ep->offset) >> - bitmap_ior_into (insns_with_changed_offsets, >> - &lra_reg_info[ep->from].insn_bitmap); >> + { >> +bitm
Re: LRA assign same hard register with live range overlapped pseduos
Hi, Vladimir I add the code and the patch has passed bootstrap/regression test on i686-pc-linux-gnu with current main trunk. However, I don't have svn write access yet. Could you please help to commit this patch? Thanks for your comment and your help. I really appreciate it. A plaintext gcc/ChangeLog is as below: 2013-04-23 Shiva Chen * lra-assigns.c (find_hard_regno_for): Use lra_reg_val_equal_p to check the register content is equal or not. * lra-constraints.c (match_reload): Use lra_assign_reg_val to assign register content record. * lra-eliminations.c (update_reg_eliminate): Use lra_set_up_reg_val to update register content offset. * lra-int.h (struct lra_reg): Add offset member. (lra_reg_val_equal_p): New static inline function. (lra_set_up_reg_val): New static inline function. (lra_assign_reg_val): New static inline function. * lra.c (lra_create_new_reg): Use lra_assign_reg_val to assign register content record. (initialize_lra_reg_info_element): Initial offset to zero 2013/4/23 Vladimir Makarov : > On 13-04-22 2:26 AM, Shiva Chen wrote: >> >> Hi, Vladimir >> >> I write the new patch as your suggestion. >> Could you help me to check is there something missing ? > > I think there is one more place to use lra_assign_reg_val: > > lra.c::lra_create_new_reg > > Please add the code and right changelog entry for the patch and you can > commit the patch into trunk. > > Thanks, Shiva. > >> Thanks, Shiva >> >> gcc/lra-assigns.c | 12 +++- >> gcc/lra-constraints.c |5 ++--- >> gcc/lra-eliminations.c | 10 -- >> gcc/lra-int.h | 33 + >> gcc/lra.c |1 + >> 5 files changed, 51 insertions(+), 10 deletions(-) >> >> diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c >> index b204513..3f8a899 100644 >> --- a/gcc/lra-assigns.c >> +++ b/gcc/lra-assigns.c >> @@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int >> try_only_hard_regno) >> int hr, conflict_hr, nregs; >> enum machine_mode biggest_mode; >> unsigned int k, conflict_regno; >> - int val, biggest_nregs, nregs_diff; >> + int offset, val, biggest_nregs, nregs_diff; >> enum reg_class rclass; >> bitmap_iterator bi; >> bool *rclass_intersect_p; >> @@ -508,9 +508,10 @@ find_hard_regno_for (int regno, int *cost, int >> try_only_hard_regno) >> #endif >> sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno); >> val = lra_reg_info[regno].val; >> + offset = lra_reg_info[regno].offset; >> CLEAR_HARD_REG_SET (impossible_start_hard_regs); >> EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, >> conflict_regno) >> -if (val == lra_reg_info[conflict_regno].val) >> +if (lra_reg_val_equal_p (conflict_regno, val, offset)) >> { >> conflict_hr = live_pseudos_reg_renumber[conflict_regno]; >> nregs = (hard_regno_nregs[conflict_hr] >> @@ -538,7 +539,7 @@ find_hard_regno_for (int regno, int *cost, int >> try_only_hard_regno) >> } >> EXECUTE_IF_SET_IN_SPARSESET (conflict_reload_and_inheritance_pseudos, >> conflict_regno) >> -if (val != lra_reg_info[conflict_regno].val) >> +if (!lra_reg_val_equal_p (conflict_regno, val, offset)) >> { >> lra_assert (live_pseudos_reg_renumber[conflict_regno] < 0); >> if ((hard_regno >> @@ -1007,7 +1008,7 @@ >> setup_live_pseudos_and_spill_after_risky_transforms (bitmap >> { >> int p, i, j, n, regno, hard_regno; >> unsigned int k, conflict_regno; >> - int val; >> + int val, offset; >> HARD_REG_SET conflict_set; >> enum machine_mode mode; >> lra_live_range_t r; >> @@ -1050,8 +1051,9 @@ >> setup_live_pseudos_and_spill_after_risky_transforms (bitmap >> COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs); >> IOR_HARD_REG_SET (conflict_set, >> lra_reg_info[regno].conflict_hard_regs); >> val = lra_reg_info[regno].val; >> + offset = lra_reg_info[regno].offset; >> EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, >> conflict_regno) >> - if (val != lra_reg_info[conflict_regno].val >> + if (!lra_reg_val_equal_p (conflict_regno, val, offset) >> /* If it is multi-register pseudos they should start on >> the same hard register. */ >> || hard_regno != reg_renumber[
Generate abs target assembly code with saturation behavior
Hi, I have a case which will generate abs instructions. int main(int argc) { if (argc < 0) argc = -(unsigned int)argc; return argc; } To my understanding, given that argc=0x8000 in 32bit int plaform, the result of (unsigned int)argc is well defined and should be 0x8000u. (C99 6.3.1.3 point 2) And then the result of -0x8000u should be 0x8000 because unsigned operation can never overflow and the value can be represented by signed integer. (C99 6.2.5 point 9) when the case is compiled above -O1, it would generate abs instruction directly if the target provides abssi2 naming pattern. However, if the target's abs have saturation behavior (i.e abs (0x8000) = 0x7fff) then the result will wrong. My question is Does the abssi2 semantically assume target abs shouldn't do saturation ? If the target abs have saturation behavior How should the target generate abs instruction to avoid the wrong result ? I noticed that there is ss_abs rtx code. But it seems there is no ss_abssi2 naming pattern. Any suggestion? Thanks in advance. Shiva
Re: Generate abs target assembly code with saturation behavior
Accordoing to GCC implementation defined assigned 0x8000u to signed int should be 0x8000. If GCC generate abs, abs will saturation or not depend on target ISA. Then the result of the case won't follow the GCC implementation defined. Then the result of the marco in libgcc/soft-fp/op-common.h 1126 #define _FP_FROM_INT(fs, wc, X, r, rsize, rtype) \ 1127 do { \ 1128 if (r) \ 1129 { \ 1130 rtype ur_; \ 1131 \ 1132 if ((X##_s = (r < 0))) \ 1133 r = -(rtype)r; The value of r will be target dependent if the input of r = 0x8000 I'm not sure which way would better. We should avoid undefined C99 behavior or We should avoid generate abs then the implementation defined could be guarantee. 2013/6/28 Marc Glisse : > On Fri, 28 Jun 2013, Andrew Haley wrote: > >> On 06/28/2013 08:53 AM, Shiva Chen wrote: >>> >>> I have a case which will generate abs instructions. >>> >>> int main(int argc) >>> { >>> if (argc < 0) >>>argc = -(unsigned int)argc; >>> return argc; >>> } >>> >>> To my understanding, given that argc=0x8000 in 32bit int plaform, >>> the result of (unsigned int)argc is well defined and should be >>> 0x8000u. >>> (C99 6.3.1.3 point 2) >>> >>> And then the result of -0x8000u should be 0x8000 because >>> unsigned operation can never overflow and the value can be >>> represented by signed integer. >>> (C99 6.2.5 point 9) >> >> >> Yes, but you can't then assign that to an int, because it will overflow. >> 0x8000 will not fit in an int: it's undefined behaviour. > > > Implementation defined, and ok with gcc: > http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html > > -- > Marc Glisse
Re: Generate abs target assembly code with saturation behavior
Therefore, the macro in libgcc should be fine for GCC even if there may occur signed overflow. Because we have GCC implementation defined to follow. GCC implementation defined could be guarantee even if generate abs. Because abssi2 shouldn't do saturation. But how the target generate abs instruction if the target abs have saturation behavior ? We couldn't use abssi2 naming pattern and ss_abssi2 naming pattern doesn't exist. Or we should suggest CPU provider define another abs instruction without saturation ? 2013/6/28 Joseph S. Myers : > On Fri, 28 Jun 2013, Shiva Chen wrote: > >> Does the abssi2 semantically assume target abs shouldn't do saturation ? > > Yes. The semantics of RTL abs are modulo, whereas ss_abs does signed > saturation. Likewise for addition, subtraction and multiplication. > (Some RTL codes have more complicated target-specific semantics, e.g. > shifts, and signed INT_MIN / -1 and INT_MIN % -1 should currently be > considered undefined in RTL rather than defined with modulo semantics.) > > -- > Joseph S. Myers > jos...@codesourcery.com
Re: Generate abs target assembly code with saturation behavior
Thank you for all your kindly help make the issue more clear. Currently, we would disable abssi2 pattern and the soft-fp could work correctly. Thanks all the explanation and assistance. I really appreciate it. 2013/6/30 Joseph S. Myers : > On Sun, 30 Jun 2013, Shiva Chen wrote: > >> But how the target generate abs instruction if the target abs have >> saturation behavior ? >> We couldn't use abssi2 naming pattern and ss_abssi2 naming pattern >> doesn't exist. > > If you want to use a saturating abs instruction for C code that exhibits > undefined behavior in cases where the instruction saturates, you need to > add support for ss_abssi2 patterns and teach the GIMPLE-to-RTL conversion > to use such patterns as appropriate (in the absence of -fwrapv/-ftrapv, of > course). > > -- > Joseph S. Myers > jos...@codesourcery.com
Re: Target options
2013/7/16 Hendrik Greving : > Along the same lines, what's the difference of target_flags (I know > from old compilers) and target_flags_explicit (I do not know)? > > Thanks, > Regards, > Hendrik Greving > > On Mon, Jul 15, 2013 at 10:30 AM, Hendrik Greving > wrote: >> Hi, >> >> when defining target options with Mask() and "Target" going to >> target_flags. Can I use Init(1) to define the default, or is "Init" >> only used to initialize "Var(name)" kind of options? If so, what's the >> proper way to define defaults, it wasn't clear to me when checking >> mips/i386 definitions for instance. >> >> Thanks, >> Hendrik Greving Hi, Greving To my understanding, when the option use MASK(F) we shouldn't use Init (), one approach is to initialize it in option_override target hook Ex: target_flags |= MASK_F target_flags_explicit determine whether user have given the flag value (disable/enable)or not. Ex: if the flag initial value depends on cpu type when the cpu type is A, flag F should enable However, user may disable the flag explicitly we wish user semantic could take effect. Therefore, the condition would be if (cpu_type == A && !(target_flags_explicit & MASK_F)) target_flags |= MASK_F; Cheers, Shiva
Aarch64 implementation for dwarf exception handling
Hi, I have a question about the implementation of aarch64_final_eh_return_addr which is used to point out the return address of the frame According the source code If FP is not needed return gen_frame_mem (DImode, plus_constant (Pmode, stack_pointer_rtx, fp_offset + cfun->machine->frame.saved_regs_size - 2 * UNITS_PER_WORD)); According the frame layout +---+ <-- arg_pointer_rtx | | callee-allocated save area | for register varargs | +---+ | | local variables | +---+ <-- frame_pointer_rtx | | callee-saved registers | +---+ | LR' +---+ | FP' P+---+ <-- hard_frame_pointer_rtx | dynamic allocation +---+ | | outgoing stack arguments | +---+ <-- stack_pointer_rtx Shouldn't the return value be return gen_frame_mem (DImode, plus_constant (Pmode, stack_pointer_rtx, fp_offset + 2* UNITS_PER_WORD)); Or I just mis-understanding something ? Hope someone could give me a tip. It would be very helpful. Thanks Shiva Chen
Re: Aarch64 implementation for dwarf exception handling
Hi, Yufeng Sorry, I don't have any testcase I just mis-understanding the implementation. Hi, Renlin Thanks to point out my mis-understanding. I didn't aware that LP would in different position between FP needed (bottom of callee) and FP not needed(top of callee). I have check the aarch64_layout_frame() and find out the FP/LP will push as last register by aarch64_save_or_restore_callee_save_registers () if FP is not needed. Thanks for your kindly help, I really appreciate it. Shiva 2014-02-13 22:32 GMT+08:00 Renlin Li : > On 13/02/14 02:14, Shiva Chen wrote: >> >> Hi, >> >> I have a question about the implementation of >> >> aarch64_final_eh_return_addr >> >> which is used to point out the return address of the frame >> >> According the source code >> >> If FP is not needed >> >>return gen_frame_mem (DImode, >> plus_constant (Pmode, >> stack_pointer_rtx, >> fp_offset >> + >> cfun->machine->frame.saved_regs_size >> - 2 * UNITS_PER_WORD)); >> >> >> According the frame layout >> >> +---+ <-- arg_pointer_rtx >> | >> | callee-allocated save area >> | for register varargs >> | >> +---+ >> | >> | local variables >> | >> +---+ <-- frame_pointer_rtx >> | >> | callee-saved registers >> | >> +---+ >> | LR' >> +---+ >> | FP' >> P+---+ <-- hard_frame_pointer_rtx >> | dynamic allocation >> +---+ >> | >> | outgoing stack arguments >> | >> +---+ <-- stack_pointer_rtx >> >> Shouldn't the return value be >> >>return gen_frame_mem (DImode, >> plus_constant (Pmode, >> stack_pointer_rtx, >> fp_offset >> + 2* UNITS_PER_WORD)); >> >> Or I just mis-understanding something ? >> >> >> Hope someone could give me a tip. >> >> It would be very helpful. >> >> Thanks >> >> Shiva Chen >> > Hi, > > If frame pointer is not needed. The prologue routine will store the callee > saved registers to stack according to ascending order, which means X0 will > be saved first if needed, and X30(LR) will be the last if it's pushed into > stack. > > Please check the source code, aarch64_layout_frame(). > > As the comment above the code also indicates, LR would be at the top of the > saved registers block(). > > By the way, there is one additional stack slot might be needed to keep stack > pointer 16-byte aligned, so - 2 * UNITS_PER_WORD is needed to adjust the > load address. > > +---+ <-- arg_pointer_rtx > | > +---+ <-- frame_pointer_rtx > | dummy > | LR > | bla...bla... > | x3 > | x2 > | x1 > | x0 > P +---+ <-- hard_frame_pointer_rtx > | > +---+ <-- stack_pointer_rtx > > > Kind regards, > Renlin >