[Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.
cc1 backtrace: arraysum.c: In function 'test_entry': arraysum.c:14:1: internal compiler error: in cfg_layout_initialize, at cfgrtl.c:4233 } ^ 0x6c096d cfg_layout_initialize(unsigned int) ../../trunk/gcc/cfgrtl.c:4233 0xeab763 reorder_loops ../../trunk/gcc/hw-doloop.c:500 0xeacd04 reorg_loops(bool, hw_doloop_hooks*) ../../trunk/gcc/hw-doloop.c:641 0xdb8d2e c6x_hwloops ../../trunk/gcc/config/c6x/c6x.c:5898 0xdbdb10 c6x_reorg ../../trunk/gcc/config/c6x/c6x.c:5939 0xa61e30 rest_of_handle_machine_reorg ../../trunk/gcc/reorg.c:3921 0xa61e5c execute ../../trunk/gcc/reorg.c:3951 Attached please find the patch for this ICE. Since c6x backend choose doloops whose tail and head are the same block, code generation will not be affected. But it may bring some negetive effect on target code efficiency for bfin backend as less hw-doloops may be generated. In my point of view, what we need here is a better block reordering that cares about hw-doloops. If OK for trunk, can someone please apply it? Thanks. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 206273) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,12 @@ +2014-01-01 Felix Yang + +* hw-doloop.c (set_bb_indices): Remove. +(reorder_loops): Ditto. +(reorg_loops): Remove do_reorder parameter. +* hw-doloop.h (reorg_loops): Likewise. +* config/c6x/c6x.c (c6x_hwloops): Do not rely on reorder_loops. +* config/bfin/bfin.c (bfin_reorg_loops): Ditto. + 2014-01-01 Jakub Jelinek * config/i386/sse.md (*mov_internal): Guard Index: gcc/hw-doloop.c === --- gcc/hw-doloop.c(revision 206273) +++ gcc/hw-doloop.c(working copy) @@ -470,85 +470,6 @@ free_loops (hwloop_info loops) } } -#define BB_AUX_INDEX(BB) ((intptr_t) (BB)->aux) - -/* Initialize the aux fields to give ascending indices to basic blocks. */ -static void -set_bb_indices (void) -{ - basic_block bb; - intptr_t index; - - index = 0; - FOR_EACH_BB_FN (bb, cfun) -bb->aux = (void *) index++; -} - -/* The taken-branch edge from the loop end can actually go forward. - If the target's hardware loop support requires that the loop end be - after the loop start, try to reorder a loop's basic blocks when we - find such a case. - This is not very aggressive; it only moves at most one block. It - does not introduce new branches into loops; it may remove them, or - it may switch fallthru/jump edges. */ -static void -reorder_loops (hwloop_info loops) -{ - basic_block bb; - hwloop_info loop; - - cfg_layout_initialize (0); - - set_bb_indices (); - - for (loop = loops; loop; loop = loop->next) -{ - edge e; - edge_iterator ei; - - if (loop->bad) -continue; - - if (BB_AUX_INDEX (loop->head) <= BB_AUX_INDEX (loop->tail)) -continue; - - FOR_EACH_EDGE (e, ei, loop->head->succs) -{ - if (bitmap_bit_p (loop->block_bitmap, e->dest->index) - && BB_AUX_INDEX (e->dest) < BB_AUX_INDEX (loop->tail)) -{ - basic_block start_bb = e->dest; - basic_block start_prev_bb = start_bb->prev_bb; - - if (dump_file) -fprintf (dump_file, ";; Moving block %d before block %d\n", - loop->head->index, start_bb->index); - loop->head->prev_bb->next_bb = loop->head->next_bb; - loop->head->next_bb->prev_bb = loop->head->prev_bb; - - loop->head->prev_bb = start_prev_bb; - loop->head->next_bb = start_bb; - start_prev_bb->next_bb = start_bb->prev_bb = loop->head; - - set_bb_indices (); - break; -} -} - loops = loops->next; -} - - FOR_EACH_BB_FN (bb, cfun) -{ - if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun)) -bb->aux = bb->next_bb; - else -bb->aux = NULL; -} - cfg_layout_finalize (); - clear_aux_for_blocks (); - df_analyze (); -} - /* Call the OPT function for LOOP and all of its sub-loops. This is done in a depth-first search; innermost loops are visited first. OPTIMIZE and FAIL are the functions passed to reorg_loops by the @@ -611,15 +532,10 @@ optimize_loop (hwloop_info loop, struct hw_doloop_ instructions before the loop, to define loop bounds or set up a special loop counter register. - DO_REORDER should be set to true if we should try to use the - reorder_loops function to ensure the loop end occurs after the loop - start. This is for use by targets where the loop hardware requires - this condition. - HOOKS is used to pass in target specific hooks; see hw-doloop.h. */ void -reorg_loops (bool do_reorder, struct hw_doloop_hooks *hooks) +reorg_loops (struct hw_doloop_hooks *hooks) { hwloop_info loops = NULL;
Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.
Ping x1 Any suggestions on this? The background is that we cannot call cfg_layout_initialize after the bb-reorder pass as indicated by the following code snippet: /* Once bb reordering is complete, cfg layout mode should not be re-entered. Entering cfg layout mode will perform optimizations on the cfg that could affect the bb layout negatively or even require fixups. An example of the latter is if edge forwarding performed when optimizing the cfg layout required moving a block from the hot to the cold section under -freorder-blocks-and-partition. This would create an illegal partitioning unless some manual fixup was performed. */ gcc_assert (!crtl->bb_reorder_complete); I just removed the reorder_loops functionality and will consider implementing a similiar feature at the end of the bb-reorder pass as a next step. Cheers, Felix On Wed, Jan 1, 2014 at 9:01 PM, Felix Yang wrote: > cc1 backtrace: > > arraysum.c: In function 'test_entry': > arraysum.c:14:1: internal compiler error: in cfg_layout_initialize, at > cfgrtl.c:4233 > } > ^ > 0x6c096d cfg_layout_initialize(unsigned int) > ../../trunk/gcc/cfgrtl.c:4233 > 0xeab763 reorder_loops > ../../trunk/gcc/hw-doloop.c:500 > 0xeacd04 reorg_loops(bool, hw_doloop_hooks*) > ../../trunk/gcc/hw-doloop.c:641 > 0xdb8d2e c6x_hwloops > ../../trunk/gcc/config/c6x/c6x.c:5898 > 0xdbdb10 c6x_reorg > ../../trunk/gcc/config/c6x/c6x.c:5939 > 0xa61e30 rest_of_handle_machine_reorg > ../../trunk/gcc/reorg.c:3921 > 0xa61e5c execute > ../../trunk/gcc/reorg.c:3951 > > Attached please find the patch for this ICE. > Since c6x backend choose doloops whose tail and head are the same > block, code generation will not be affected. > But it may bring some negetive effect on target code efficiency for > bfin backend as less hw-doloops may be generated. > In my point of view, what we need here is a better block reordering > that cares about hw-doloops. > If OK for trunk, can someone please apply it? Thanks. > > > Index: gcc/ChangeLog > === > --- gcc/ChangeLog(revision 206273) > +++ gcc/ChangeLog(working copy) > @@ -1,3 +1,12 @@ > +2014-01-01 Felix Yang > + > +* hw-doloop.c (set_bb_indices): Remove. > +(reorder_loops): Ditto. > +(reorg_loops): Remove do_reorder parameter. > +* hw-doloop.h (reorg_loops): Likewise. > +* config/c6x/c6x.c (c6x_hwloops): Do not rely on reorder_loops. > +* config/bfin/bfin.c (bfin_reorg_loops): Ditto. > + > 2014-01-01 Jakub Jelinek > > * config/i386/sse.md (*mov_internal): Guard > Index: gcc/hw-doloop.c > === > --- gcc/hw-doloop.c(revision 206273) > +++ gcc/hw-doloop.c(working copy) > @@ -470,85 +470,6 @@ free_loops (hwloop_info loops) > } > } > > -#define BB_AUX_INDEX(BB) ((intptr_t) (BB)->aux) > - > -/* Initialize the aux fields to give ascending indices to basic blocks. */ > -static void > -set_bb_indices (void) > -{ > - basic_block bb; > - intptr_t index; > - > - index = 0; > - FOR_EACH_BB_FN (bb, cfun) > -bb->aux = (void *) index++; > -} > - > -/* The taken-branch edge from the loop end can actually go forward. > - If the target's hardware loop support requires that the loop end be > - after the loop start, try to reorder a loop's basic blocks when we > - find such a case. > - This is not very aggressive; it only moves at most one block. It > - does not introduce new branches into loops; it may remove them, or > - it may switch fallthru/jump edges. */ > -static void > -reorder_loops (hwloop_info loops) > -{ > - basic_block bb; > - hwloop_info loop; > - > - cfg_layout_initialize (0); > - > - set_bb_indices (); > - > - for (loop = loops; loop; loop = loop->next) > -{ > - edge e; > - edge_iterator ei; > - > - if (loop->bad) > -continue; > - > - if (BB_AUX_INDEX (loop->head) <= BB_AUX_INDEX (loop->tail)) > -continue; > - > - FOR_EACH_EDGE (e, ei, loop->head->succs) > -{ > - if (bitmap_bit_p (loop->block_bitmap, e->dest->index) > - && BB_AUX_INDEX (e->dest) < BB_AUX_INDEX (loop->tail)) > -{ > - basic_block start_bb = e->dest; > - basic_block start_prev_bb = start_bb->prev_bb; > - > - if (dump_file) > -fprintf (dump_file, ";; Moving block %d before block %d\n", > - loop->head->index, start_bb->index); > - loop->head->prev_bb->next_bb
[Patch, SMS] Fix a potential bug of schedule_reg_moves of SMS
Hello Vladimir, This patch fixes a trivial bug of SMS: distance1_uses needs to be initialized if newly allocated. Since the original author of SMS is not active for a very long time, it would be nice if you can apply it for me : ) Thanks. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 206279) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,8 @@ +2013-01-02 Felix Yang + +* modulo-sched.c (schedule_reg_moves): Clear distance1_uses if it +is newly allocated. + 2013-01-01 Jan-Benedict Glaw * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. Index: gcc/modulo-sched.c === --- gcc/modulo-sched.c(revision 206279) +++ gcc/modulo-sched.c(working copy) @@ -766,6 +766,9 @@ schedule_reg_moves (partial_schedule_ptr ps) distance1_uses = distances[1] ? sbitmap_alloc (g->num_nodes) : NULL; + if (distance1_uses) +bitmap_clear (distance1_uses); + /* Every use of the register defined by node may require a different copy of this register, depending on the time the use is scheduled. Record which uses require which move results. */ Cheers, Felix
Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.
Hi Bernd, It's easy to reproduce this error: step1: Build gcc for c6x from gcc code of latest trunk, say c6x-uclinux-gcc for example step2: Using the newly built c6x gcc compiler to comiple this small case: $c6x-unlinux-gcc -S -O2 arraysum.c arraysum.c: int array[1024]; int test_entry () { unsigned int i; int sum = 0; for (i = 0; i < 1024; i++) { sum = sum + array[i]; } return sum; } Note: -freorder-blocks is enable under -O2. I think it would be better to skip reorder_loops if bb reordering is enabled, since function without hw-doloops can still benefit from block reordering optimization. Attached please find the updated patch. Please apply it if OK for trunk. Thanks Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 206280) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,9 @@ +2014-01-02 Felix Yang + +* config/c6x/c6x.c (c6x_hwloops): Do not reorder loop's basic blocks +if bb reordering is enabled. +* config/bfin/bfin.c (bfin_reorg_loops): Ditto. + 2014-01-01 Jan-Benedict Glaw * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c(revision 206280) +++ gcc/config/c6x/c6x.c(working copy) @@ -5895,7 +5895,15 @@ static void c6x_hwloops (void) { if (optimize) -reorg_loops (true, &c6x_doloop_hooks); +{ + /* Do not reorder loop's basic blocks if bb reordering is enabled. The reason + is that cfg layout mode should not be re-entered once bb reordering is + complete. */ + if (flag_reorder_blocks || flag_reorder_blocks_and_partition) +reorg_loops (false, &c6x_doloop_hooks); + else +reorg_loops (true, &c6x_doloop_hooks); +} } /* Implement the TARGET_MACHINE_DEPENDENT_REORG pass. We split call insns here Index: gcc/config/bfin/bfin.c === --- gcc/config/bfin/bfin.c(revision 206280) +++ gcc/config/bfin/bfin.c(working copy) @@ -3885,7 +3885,13 @@ static struct hw_doloop_hooks bfin_doloop_hooks = static void bfin_reorg_loops (void) { - reorg_loops (true, &bfin_doloop_hooks); + /* Do not reorder loop's basic blocks if bb reordering is enabled. The reason + is that cfg layout mode should not be re-entered once bb reordering is + complete. */ + if (flag_reorder_blocks || flag_reorder_blocks_and_partition) +reorg_loops (false, &bfin_doloop_hooks); + else +reorg_loops (true, &bfin_doloop_hooks); } /* Possibly generate a SEQUENCE out of three insns found in SLOT. Cheers, Felix On Thu, Jan 2, 2014 at 7:04 PM, Bernd Schmidt wrote: > On 01/01/2014 02:01 PM, Felix Yang wrote: >> >> cc1 backtrace: >> >> arraysum.c: In function 'test_entry': >> arraysum.c:14:1: internal compiler error: in cfg_layout_initialize, at >> cfgrtl.c:4233 > > > Please include steps to reproduce bugs. > > >> Attached please find the patch for this ICE. >> Since c6x backend choose doloops whose tail and head are the same >> block, code generation will not be affected. >> But it may bring some negetive effect on target code efficiency for >> bfin backend as less hw-doloops may be generated. >> In my point of view, what we need here is a better block reordering >> that cares about hw-doloops. >> If OK for trunk, can someone please apply it? Thanks. > > > Is this with -freorder-blocks? Rather than delete the code, just skip it if > that flag is enabled. Or forcibly turn it off for these ports in > option_override. > > > Bernd > Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 206280) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,9 @@ +2014-01-02 Felix Yang + + * config/c6x/c6x.c (c6x_hwloops): Do not reorder loop's basic blocks + if bb reordering is enabled. + * config/bfin/bfin.c (bfin_reorg_loops): Ditto. + 2014-01-01 Jan-Benedict Glaw * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c(revision 206280) +++ gcc/config/c6x/c6x.c(working copy) @@ -5895,7 +5895,15 @@ static void c6x_hwloops (void) { if (optimize) -reorg_loops (true, &c6x_doloop_hooks); +{ + /* Do not reorder loop's basic blocks if bb reordering is enabled. The reason + is that cfg layout mode should not be re-entered once bb reordering is + complete. */ + if (flag_reorder_blocks || flag_reorder_blocks_and_partition) +reorg_loops
Re: [Patch, SMS] Fix a potential bug of schedule_reg_moves of SMS
Sorry, the year of the date in ChangeLog is wrong, fix it. Index: gcc/ChangeLog == = --- gcc/ChangeLog(revision 206279) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,8 @@ +2014-01-02 Felix Yang + +* modulo-sched.c (schedule_reg_moves): Clear distance1_uses if it +is newly allocated. + 2013-01-01 Jan-Benedict Glaw * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. Index: gcc/modulo-sched.c === --- gcc/modulo-sched.c(revision 206279) +++ gcc/modulo-sched.c(working copy) @@ -766,6 +766,9 @@ schedule_reg_moves (partial_schedule_ptr ps) distance1_uses = distances[1] ? sbitmap_alloc (g->num_nodes) : NULL; + if (distance1_uses) +bitmap_clear (distance1_uses); + /* Every use of the register defined by node may require a different copy of this register, depending on the time the use is scheduled. Record which uses require which move results. */ Cheers, Felix On Thu, Jan 2, 2014 at 9:07 PM, Felix Yang wrote: > Hello Vladimir, > This patch fixes a trivial bug of SMS: distance1_uses needs to be > initialized if newly allocated. > Since the original author of SMS is not active for a very long time, > it would be nice if you can apply it for me : ) > Thanks. > > Index: gcc/ChangeLog > === > --- gcc/ChangeLog(revision 206279) > +++ gcc/ChangeLog(working copy) > @@ -1,3 +1,8 @@ > +2013-01-02 Felix Yang > + > +* modulo-sched.c (schedule_reg_moves): Clear distance1_uses if it > +is newly allocated. > + > 2013-01-01 Jan-Benedict Glaw > > * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. > Index: gcc/modulo-sched.c > === > --- gcc/modulo-sched.c(revision 206279) > +++ gcc/modulo-sched.c(working copy) > @@ -766,6 +766,9 @@ schedule_reg_moves (partial_schedule_ptr ps) > >distance1_uses = distances[1] ? sbitmap_alloc (g->num_nodes) : NULL; > > + if (distance1_uses) > +bitmap_clear (distance1_uses); > + >/* Every use of the register defined by node may require a different > copy of this register, depending on the time the use is scheduled. > Record which uses require which move results. */ > > > Cheers, > Felix
Re: [Patch, bfin/c6x] Fix ICE for backends that rely on reorder_loops.
Ping? Cheers, Felix On Thu, Jan 2, 2014 at 10:13 PM, Felix Yang wrote: > Hi Bernd, > > It's easy to reproduce this error: > step1: Build gcc for c6x from gcc code of latest trunk, say > c6x-uclinux-gcc for example > step2: Using the newly built c6x gcc compiler to comiple this small case: > $c6x-unlinux-gcc -S -O2 arraysum.c > > arraysum.c: > > int array[1024]; > > int test_entry () > { > unsigned int i; > int sum = 0; > > for (i = 0; i < 1024; i++) { > sum = sum + array[i]; > } > > return sum; > } > > Note: -freorder-blocks is enable under -O2. > > I think it would be better to skip reorder_loops if bb reordering is > enabled, since function without hw-doloops > can still benefit from block reordering optimization. > Attached please find the updated patch. Please apply it if OK for trunk. > Thanks > > Index: gcc/ChangeLog > === > --- gcc/ChangeLog(revision 206280) > +++ gcc/ChangeLog(working copy) > @@ -1,3 +1,9 @@ > +2014-01-02 Felix Yang > + > +* config/c6x/c6x.c (c6x_hwloops): Do not reorder loop's basic blocks > +if bb reordering is enabled. > +* config/bfin/bfin.c (bfin_reorg_loops): Ditto. > + > 2014-01-01 Jan-Benedict Glaw > > * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. > Index: gcc/config/c6x/c6x.c > === > --- gcc/config/c6x/c6x.c(revision 206280) > +++ gcc/config/c6x/c6x.c(working copy) > @@ -5895,7 +5895,15 @@ static void > c6x_hwloops (void) > { >if (optimize) > -reorg_loops (true, &c6x_doloop_hooks); > +{ > + /* Do not reorder loop's basic blocks if bb reordering is > enabled. The reason > + is that cfg layout mode should not be re-entered once bb reordering > is > + complete. */ > + if (flag_reorder_blocks || flag_reorder_blocks_and_partition) > +reorg_loops (false, &c6x_doloop_hooks); > + else > +reorg_loops (true, &c6x_doloop_hooks); > +} > } > > /* Implement the TARGET_MACHINE_DEPENDENT_REORG pass. We split call insns > here > Index: gcc/config/bfin/bfin.c > === > --- gcc/config/bfin/bfin.c(revision 206280) > +++ gcc/config/bfin/bfin.c(working copy) > @@ -3885,7 +3885,13 @@ static struct hw_doloop_hooks bfin_doloop_hooks = > static void > bfin_reorg_loops (void) > { > - reorg_loops (true, &bfin_doloop_hooks); > + /* Do not reorder loop's basic blocks if bb reordering is enabled. The > reason > + is that cfg layout mode should not be re-entered once bb reordering is > + complete. */ > + if (flag_reorder_blocks || flag_reorder_blocks_and_partition) > +reorg_loops (false, &bfin_doloop_hooks); > + else > +reorg_loops (true, &bfin_doloop_hooks); > } > > /* Possibly generate a SEQUENCE out of three insns found in SLOT. > > Cheers, > Felix > > > On Thu, Jan 2, 2014 at 7:04 PM, Bernd Schmidt wrote: >> On 01/01/2014 02:01 PM, Felix Yang wrote: >>> >>> cc1 backtrace: >>> >>> arraysum.c: In function 'test_entry': >>> arraysum.c:14:1: internal compiler error: in cfg_layout_initialize, at >>> cfgrtl.c:4233 >> >> >> Please include steps to reproduce bugs. >> >> >>> Attached please find the patch for this ICE. >>> Since c6x backend choose doloops whose tail and head are the same >>> block, code generation will not be affected. >>> But it may bring some negetive effect on target code efficiency for >>> bfin backend as less hw-doloops may be generated. >>> In my point of view, what we need here is a better block reordering >>> that cares about hw-doloops. >>> If OK for trunk, can someone please apply it? Thanks. >> >> >> Is this with -freorder-blocks? Rather than delete the code, just skip it if >> that flag is enabled. Or forcibly turn it off for these ports in >> option_override. >> >> >> Bernd >>
Re: [Patch, SMS] Fix a potential bug of schedule_reg_moves of SMS
Ping ? Cheers, Felix On Thu, Jan 2, 2014 at 9:07 PM, Felix Yang wrote: > Hello Vladimir, > This patch fixes a trivial bug of SMS: distance1_uses needs to be > initialized if newly allocated. > Since the original author of SMS is not active for a very long time, > it would be nice if you can apply it for me : ) > Thanks. > > Index: gcc/ChangeLog > === > --- gcc/ChangeLog(revision 206279) > +++ gcc/ChangeLog(working copy) > @@ -1,3 +1,8 @@ > +2013-01-02 Felix Yang > + > +* modulo-sched.c (schedule_reg_moves): Clear distance1_uses if it > +is newly allocated. > + > 2013-01-01 Jan-Benedict Glaw > > * config/nios2/nios2.h (BITS_PER_UNIT): Don't define it. > Index: gcc/modulo-sched.c > === > --- gcc/modulo-sched.c(revision 206279) > +++ gcc/modulo-sched.c(working copy) > @@ -766,6 +766,9 @@ schedule_reg_moves (partial_schedule_ptr ps) > >distance1_uses = distances[1] ? sbitmap_alloc (g->num_nodes) : NULL; > > + if (distance1_uses) > +bitmap_clear (distance1_uses); > + >/* Every use of the register defined by node may require a different > copy of this register, depending on the time the use is scheduled. > Record which uses require which move results. */ > > > Cheers, > Felix
[PATCH] Add zero-overhead looping for xtensa backend
Hi Sterling, This patch implements zero-overhead looping for xtensa backend using hw-doloop facility. If OK for trunk, please apply it for me. Thanks. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 206431) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,18 @@ +2014-01-08 Felix Yang + +* config/xtensa/xtensa.c (xtensa_reorg): New. +(xtensa_reorg_loops): New. +(xtensa_can_use_doloop_p): New. +(xtensa_invalid_within_doloop): New. +(hwloop_optimize): New. +(hwloop_fail): New. +(hwloop_pattern_reg): New. +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label. +(xtensa_doloop_hooks): Define. +* config/xtensa/xtensa.md (doloop_end): New. +(zero_cost_loop_start): Rewritten. +(zero_cost_loop_end): Rewritten. + 2014-01-08 Marek Polacek PR middle-end/59669 Index: gcc/config/xtensa/xtensa.md === --- gcc/config/xtensa/xtensa.md(revision 206431) +++ gcc/config/xtensa/xtensa.md(working copy) @@ -35,6 +35,8 @@ (UNSPEC_TLS_CALL9) (UNSPEC_TP10) (UNSPEC_MEMW11) + (UNSPEC_LSETUP_START 12) + (UNSPEC_LSETUP_END13) (UNSPECV_SET_FP1) (UNSPECV_ENTRY2) @@ -1289,6 +1291,8 @@ (set_attr "length""3")]) +;; Hardware loop support. + ;; Define the loop insns used by bct optimization to represent the ;; start and end of a zero-overhead loop (in loop.c). This start ;; template generates the loop insn; the end template doesn't generate @@ -1296,34 +1300,58 @@ (define_insn "zero_cost_loop_start" [(set (pc) -(if_then_else (eq (match_operand:SI 0 "register_operand" "a") - (const_int 0)) - (label_ref (match_operand 1 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (match_dup 0) (const_int -1)))] +(if_then_else (ne (match_operand:SI 2 "nonimmediate_operand" "0") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 0 "nonimmediate_operand" "=a") +(plus (match_dup 2) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_START)] "" - "loopnez\t%0, %l1" + "loop\t%0, %l1_LEND" [(set_attr "type""jump") (set_attr "mode""none") (set_attr "length""3")]) (define_insn "zero_cost_loop_end" [(set (pc) -(if_then_else (ne (reg:SI 19) (const_int 0)) - (label_ref (match_operand 0 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (reg:SI 19) (const_int -1)))] +(if_then_else (ne (match_operand:SI 2 "nonimmediate_operand" "0") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 0 "nonimmediate_operand" "=a") +(plus (match_dup 2) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)] "" { -xtensa_emit_loop_end (insn, operands); -return ""; + xtensa_emit_loop_end (insn, operands); + return ""; } [(set_attr "type""jump") (set_attr "mode""none") (set_attr "length""0")]) +; operand 0 is the loop count pseudo register +; operand 1 is the label to jump to at the top of the loop +(define_expand "doloop_end" + [(parallel [(set (pc) (if_then_else + (ne (match_operand:SI 0 "" "") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_dup 0) + (plus:SI (match_dup 0) +(const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)])] + "" +{ + /* The loop optimizer doesn't check the predicates... */ + if (GET_MODE (operands[0]) != SImode) +FAIL; +}) + ;; Setting a register from a comparison. Index: gcc/config/xtensa/xtensa.c === --- gcc/config/xtensa/xtensa.c(revision 206431) +++ gcc/config/xtensa/xtensa.c(working copy) @@ -1,6 +1,7 @@ /* Subroutines for insn-output.c for Tensilica's Xtensa architecture. Copyright (C) 2001-2014 Free Software Foundation, Inc. Contributed by Bob Wilson (bwil...@tensilica.com) at Tensilica. + Zero-overhead looping support by Felix Yang (felix.yang0...@gmail.com).
Re: [PATCH] Add zero-overhead looping for xtensa backend
Hi Sterling, Attached please find version 2 of the patch. I applied this updated patch (with small adaptations) to gcc-4.8.2 and carried out some tests. I can execute the testcases in a simulator, which support zero-overhead looping instructions. First of all, I can successfully build libgcc, libstdc++ and newlibc for xtensa with this patch. The newly built xtensa gcc also passed testsuite which comes with newlibc. I also tested the cases under gcc/testsuite/gcc.c-torture/execute/ directory. There are about 800+ cases tested. Test result shows no new failed case with this patch, compared with the original gcc version. Is that OK? I also double checked the loop relaxation issue with binutils-2.24 (the latest version). The result show that the assember can do loop relaxation when the loop target is too far ( > 256 Byte). And this is the reason why I don't check the size of the loop. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 206463) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,18 @@ +2014-01-09 Felix Yang + +* config/xtensa/xtensa.c (xtensa_reorg): New. +(xtensa_reorg_loops): New. +(xtensa_can_use_doloop_p): New. +(xtensa_invalid_within_doloop): New. +(hwloop_optimize): New. +(hwloop_fail): New. +(hwloop_pattern_reg): New. +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label. +(xtensa_doloop_hooks): Define. +* config/xtensa/xtensa.md (doloop_end): New. +(zero_cost_loop_start): Rewritten. +(zero_cost_loop_end): Rewritten. + 2014-01-09 Richard Biener PR tree-optimization/59715 Index: gcc/config/xtensa/xtensa.md === --- gcc/config/xtensa/xtensa.md(revision 206463) +++ gcc/config/xtensa/xtensa.md(working copy) @@ -1,6 +1,7 @@ ;; GCC machine description for Tensilica's Xtensa architecture. ;; Copyright (C) 2001-2014 Free Software Foundation, Inc. ;; Contributed by Bob Wilson (bwil...@tensilica.com) at Tensilica. +;; Zero-overhead looping support by Felix Yang (fei.yang0...@gmail.com). ;; This file is part of GCC. @@ -35,6 +36,8 @@ (UNSPEC_TLS_CALL9) (UNSPEC_TP10) (UNSPEC_MEMW11) + (UNSPEC_LSETUP_START 12) + (UNSPEC_LSETUP_END13) (UNSPECV_SET_FP1) (UNSPECV_ENTRY2) @@ -1289,41 +1292,67 @@ (set_attr "length""3")]) +;; Hardware loop support. + ;; Define the loop insns used by bct optimization to represent the -;; start and end of a zero-overhead loop (in loop.c). This start -;; template generates the loop insn; the end template doesn't generate -;; any instructions since loop end is handled in hardware. +;; start and end of a zero-overhead loop. This start template generates +;; the loop insn; the end template doesn't generate any instructions since +;; loop end is handled in hardware. (define_insn "zero_cost_loop_start" [(set (pc) -(if_then_else (eq (match_operand:SI 0 "register_operand" "a") - (const_int 0)) - (label_ref (match_operand 1 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (match_dup 0) (const_int -1)))] +(if_then_else (ne (match_operand:SI 0 "register_operand" "a") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "+a0") +(plus (match_dup 2) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_START)] "" - "loopnez\t%0, %l1" + "loop\t%0, %l1_LEND" [(set_attr "type""jump") (set_attr "mode""none") (set_attr "length""3")]) (define_insn "zero_cost_loop_end" [(set (pc) -(if_then_else (ne (reg:SI 19) (const_int 0)) - (label_ref (match_operand 0 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (reg:SI 19) (const_int -1)))] +(if_then_else (ne (match_operand:SI 0 "register_operand" "a") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "+a0") +(plus (match_dup 2) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)] "" { -xtensa_emit_loop_end (insn, operands); -return ""; + xtensa_emit_loop_end (insn, operands); + return ""; } [(set_attr "type""jump") (set_attr "mode""none")
Re: [PATCH] Add zero-overhead looping for xtensa backend
Hi Sterling, Please note that version 2 of the patch is for gcc trunk, not for gcc-4.8 branch. Since the doloop_end pattern format has changed, this patch need small adaptation in order for it to work on gcc-4.8. Although I test it on gcc-4.8, I think the testing result still holds for trunk. Cheers, Felix On Thu, Jan 9, 2014 at 11:08 PM, Felix Yang wrote: > Hi Sterling, > > Attached please find version 2 of the patch. > > I applied this updated patch (with small adaptations) to gcc-4.8.2 > and carried out some tests. > I can execute the testcases in a simulator, which support > zero-overhead looping instructions. > > First of all, I can successfully build libgcc, libstdc++ and > newlibc for xtensa with this patch. > The newly built xtensa gcc also passed testsuite which comes with newlibc. > I also tested the cases under gcc/testsuite/gcc.c-torture/execute/ > directory. There are about 800+ cases tested. > Test result shows no new failed case with this patch, compared > with the original gcc version. > Is that OK? > > I also double checked the loop relaxation issue with binutils-2.24 > (the latest version). > The result show that the assember can do loop relaxation when the > loop target is too far ( > 256 Byte). > And this is the reason why I don't check the size of the loop. > > > Index: gcc/ChangeLog > === > --- gcc/ChangeLog (revision 206463) > +++ gcc/ChangeLog(working copy) > @@ -1,3 +1,18 @@ > +2014-01-09 Felix Yang > + > +* config/xtensa/xtensa.c (xtensa_reorg): New. > +(xtensa_reorg_loops): New. > +(xtensa_can_use_doloop_p): New. > +(xtensa_invalid_within_doloop): New. > +(hwloop_optimize): New. > +(hwloop_fail): New. > +(hwloop_pattern_reg): New. > +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end > label. > +(xtensa_doloop_hooks): Define. > +* config/xtensa/xtensa.md (doloop_end): New. > +(zero_cost_loop_start): Rewritten. > +(zero_cost_loop_end): Rewritten. > + > 2014-01-09 Richard Biener > > PR tree-optimization/59715 > Index: gcc/config/xtensa/xtensa.md > === > --- gcc/config/xtensa/xtensa.md(revision 206463) > +++ gcc/config/xtensa/xtensa.md(working copy) > @@ -1,6 +1,7 @@ > ;; GCC machine description for Tensilica's Xtensa architecture. > ;; Copyright (C) 2001-2014 Free Software Foundation, Inc. > ;; Contributed by Bob Wilson (bwil...@tensilica.com) at Tensilica. > +;; Zero-overhead looping support by Felix Yang (fei.yang0...@gmail.com). > > ;; This file is part of GCC. > > @@ -35,6 +36,8 @@ >(UNSPEC_TLS_CALL9) >(UNSPEC_TP10) >(UNSPEC_MEMW11) > + (UNSPEC_LSETUP_START 12) > + (UNSPEC_LSETUP_END13) > >(UNSPECV_SET_FP1) >(UNSPECV_ENTRY2) > @@ -1289,41 +1292,67 @@ > (set_attr "length""3")]) > > > +;; Hardware loop support. > + > ;; Define the loop insns used by bct optimization to represent the > -;; start and end of a zero-overhead loop (in loop.c). This start > -;; template generates the loop insn; the end template doesn't generate > -;; any instructions since loop end is handled in hardware. > +;; start and end of a zero-overhead loop. This start template generates > +;; the loop insn; the end template doesn't generate any instructions since > +;; loop end is handled in hardware. > > (define_insn "zero_cost_loop_start" >[(set (pc) > -(if_then_else (eq (match_operand:SI 0 "register_operand" "a") > - (const_int 0)) > - (label_ref (match_operand 1 "" "")) > - (pc))) > - (set (reg:SI 19) > -(plus:SI (match_dup 0) (const_int -1)))] > +(if_then_else (ne (match_operand:SI 0 "register_operand" "a") > + (const_int 1)) > + (label_ref (match_operand 1 "" "")) > + (pc))) > + (set (match_operand:SI 2 "register_operand" "+a0") > +(plus (match_dup 2) > + (const_int -1))) > + (unspec [(const_int 0)] UNSPEC_LSETUP_START)] >"" > - "loopnez\t%0, %l1" > + "loop\t%0, %l1_LEND" >[(set_attr "type""jump") > (set_attr "mode""none") > (set_attr "length""3")]) > > (define_insn "zero_cost_loop_end" >[(set (p
[Patch, xtensa] Add LOCAL_REGNO to the xtensa backend.
Hi Sterling, The xtensa backend uses register windows, and we need to define LOCAL_REGNO for it. The dataflow may not be accurate with this macro. This patch passed the cases in testsuite/gcc.c-torture/execute dir. Please apply it if OK for trunk. Thanks. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 206599) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,7 @@ +2014-01-14 Felix Yang + +* config/xtensa/xtensa.h (LOCAL_REGNO): New. + 2014-01-14 Richard Biener PR tree-optimization/58921 Index: gcc/config/xtensa/xtensa.h === --- gcc/config/xtensa/xtensa.h(revision 206599) +++ gcc/config/xtensa/xtensa.h(working copy) @@ -369,7 +369,14 @@ extern char xtensa_hard_regno_mode_ok[][FIRST_PSEU ((unsigned) ((IN) - GP_REG_FIRST) < WINDOW_SIZE)) ?\ (IN) + WINDOW_SIZE : (IN)) +/* Define this macro if the target machine has register windows. This + C expression returns true if the register is call-saved but is in the + register window. */ +#define LOCAL_REGNO(REGNO)\ + (GP_REG_P (REGNO) && ((unsigned) (REGNO - GP_REG_FIRST) < WINDOW_SIZE)) + + /* Define the classes of registers for register constraints in the machine description. */ enum reg_class Cheers, Felix
[Patch, xtensa] Add section anchor support for the xtensa backend.
Hi Sterling, I found that we can avoid emitting excessive literal loading instructions with with section anchors. This patch also passed the cases in testsuite/gcc.c-torture/execute/ dir. Please apply it if OK for trunk. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 206599) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,11 @@ +2014-01-14 Felix Yang + +* common/config/xtensa/xtensa-common.c +(xtensa_option_optimization_table): Enable -fsection-anchors under -O1 +or plus, and disable -fcommon by default. +* config/xtensa/xtensa.c (TARGET_MAX_ANCHOR_OFFSET): New. +(TARGET_MIN_ANCHOR_OFFSET): Ditto. + 2014-01-14 Richard Biener PR tree-optimization/58921 Index: gcc/common/config/xtensa/xtensa-common.c === --- gcc/common/config/xtensa/xtensa-common.c(revision 206599) +++ gcc/common/config/xtensa/xtensa-common.c(working copy) @@ -35,6 +35,13 @@ static const struct default_options xtensa_option_ assembler, so GCC cannot do a good job of reordering blocks. Do not enable reordering unless it is explicitly requested. */ { OPT_LEVELS_ALL, OPT_freorder_blocks, NULL, 0 }, +/* Enable section anchors under -O1 or plus. This can avoid generating + excessive literal loading instructions to load addresses of globals. */ +{ OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 2 }, +/* Allocate uninitialized global variables in the data section of object + file, rather than generating them as common blocks. This is required + for section anchors to work on uninitialized globals. */ +{ OPT_LEVELS_ALL, OPT_fcommon, NULL, 0 }, { OPT_LEVELS_NONE, 0, NULL, 0 } }; Index: gcc/config/xtensa/xtensa.c === --- gcc/config/xtensa/xtensa.c(revision 206599) +++ gcc/config/xtensa/xtensa.c(working copy) @@ -290,6 +290,12 @@ static const int reg_nonleaf_alloc_order[FIRST_PSE #undef TARGET_CANNOT_FORCE_CONST_MEM #define TARGET_CANNOT_FORCE_CONST_MEM xtensa_cannot_force_const_mem +#undef TARGET_MAX_ANCHOR_OFFSET +#define TARGET_MAX_ANCHOR_OFFSET 255 + +#undef TARGET_MIN_ANCHOR_OFFSET +#define TARGET_MIN_ANCHOR_OFFSET 0 + #undef TARGET_LEGITIMATE_ADDRESS_P #define TARGET_LEGITIMATE_ADDRESS_Pxtensa_legitimate_address_p Cheers, Felix
[PATCH] Fix for PR62037
Attached please find the patch and testcase for PR62037. DEF1 can be a GIMPLE_NOP and gimple_bb will be NULL then. The patch checks for that. Bootstrapped on x86_64-suse-linux. OK for trunk? Please commit this patch if it's OK. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 213772) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,9 @@ +2014-08-09 Felix Yang + +PR tree-optimization/62073 +* tree-vect-loop.c (vect_is_simple_reduction_1): Check that DEF1 has +a basic block. + 2014-08-08 Guozhi Wei * config/rs6000/rs6000.md (*movdi_internal64): Add a new constraint. Index: gcc/testsuite/gcc.dg/vect/pr62073.c === --- gcc/testsuite/gcc.dg/vect/pr62073.c(revision 0) +++ gcc/testsuite/gcc.dg/vect/pr62073.c(revision 0) @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -ftree-vectorize" } */ + +struct S0 +{ +int f7; +}; +struct S0 g_50; +int g_70; +int g_76; + + +int foo (long long p_56, int * p_57) +{ +int *l_77; +int l_101; + +for (; g_70;) +{ +int **l_78 = &l_77; +if (g_50.f7) +continue; +*l_78 = 0; +} +for (g_76 = 1; g_76 >= 0; g_76--) +{ +int *l_90; +for (l_101 = 4; l_101 >= 0; l_101--) +if (l_101) +*l_90 = 0; +else +{ +int **l_113 = &l_77; +*l_113 = p_57; +} +} + +return *l_77; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog(revision 213772) +++ gcc/testsuite/ChangeLog(working copy) @@ -1,3 +1,8 @@ +2014-08-09 Felix Yang + +PR tree-optimization/62073 +* gcc.dg/vect/pr62073.c: New test. + 2014-08-08 Richard Biener * gcc.dg/strlenopt-8.c: Remove XFAIL. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 213772) +++ gcc/tree-vect-loop.c(working copy) @@ -2321,7 +2321,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_inf } def1 = SSA_NAME_DEF_STMT (op1); - if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)) + if (gimple_bb (def1) + && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)) && loop->inner && flow_bb_inside_loop_p (loop->inner, gimple_bb (def1)) && is_gimple_assign (def1)) Cheers, Felix Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 213772) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,9 @@ +2014-08-09 Felix Yang + + PR tree-optimization/62073 + * tree-vect-loop.c (vect_is_simple_reduction_1): Check that DEF1 has + a basic block. + 2014-08-08 Guozhi Wei * config/rs6000/rs6000.md (*movdi_internal64): Add a new constraint. Index: gcc/testsuite/gcc.dg/vect/pr62073.c === --- gcc/testsuite/gcc.dg/vect/pr62073.c (revision 0) +++ gcc/testsuite/gcc.dg/vect/pr62073.c (revision 0) @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -ftree-vectorize" } */ + +struct S0 +{ +int f7; +}; +struct S0 g_50; +int g_70; +int g_76; + + +int foo (long long p_56, int * p_57) +{ +int *l_77; +int l_101; + +for (; g_70;) +{ +int **l_78 = &l_77; +if (g_50.f7) +continue; +*l_78 = 0; +} +for (g_76 = 1; g_76 >= 0; g_76--) +{ +int *l_90; +for (l_101 = 4; l_101 >= 0; l_101--) +if (l_101) +*l_90 = 0; +else +{ +int **l_113 = &l_77; +*l_113 = p_57; +} +} + +return *l_77; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 213772) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-08-09 Felix Yang + + PR tree-optimization/62073 + * gcc.dg/vect/pr62073.c: New test. + 2014-08-08 Richard Biener * gcc.dg/strlenopt-8.c: Remove XFAIL. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 213772) +++ gcc/tree-vect-loop.c(working copy) @@ -2321,7 +2321,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_inf } def1 = SSA_NAME_DEF_STMT (op1); - if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)) + if (gimple_bb (def1) + && flow_bb_inside_loop_p (
Re: [PATCH] Fix for PR62037
Hi Richard, I find that you missed the testcase with when committing the patch. GCC 4.8 & 4.9 branch also has the same issue, may be we need to fix for them too? Cheers, Felix On Mon, Aug 11, 2014 at 7:24 PM, Richard Biener wrote: > On Sat, Aug 9, 2014 at 6:28 AM, Felix Yang wrote: >> Attached please find the patch and testcase for PR62037. >> >> DEF1 can be a GIMPLE_NOP and gimple_bb will be NULL then. The patch >> checks for that. >> Bootstrapped on x86_64-suse-linux. OK for trunk? Please commit this >> patch if it's OK. >> > > Thanks - applied. > > Richard. > >> Index: gcc/ChangeLog >> === >> --- gcc/ChangeLog(revision 213772) >> +++ gcc/ChangeLog(working copy) >> @@ -1,3 +1,9 @@ >> +2014-08-09 Felix Yang >> + >> +PR tree-optimization/62073 >> +* tree-vect-loop.c (vect_is_simple_reduction_1): Check that DEF1 has >> +a basic block. >> + >> 2014-08-08 Guozhi Wei >> >> * config/rs6000/rs6000.md (*movdi_internal64): Add a new constraint. >> Index: gcc/testsuite/gcc.dg/vect/pr62073.c >> === >> --- gcc/testsuite/gcc.dg/vect/pr62073.c(revision 0) >> +++ gcc/testsuite/gcc.dg/vect/pr62073.c(revision 0) >> @@ -0,0 +1,41 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O1 -ftree-vectorize" } */ >> + >> +struct S0 >> +{ >> +int f7; >> +}; >> +struct S0 g_50; >> +int g_70; >> +int g_76; >> + >> + >> +int foo (long long p_56, int * p_57) >> +{ >> +int *l_77; >> +int l_101; >> + >> +for (; g_70;) >> +{ >> +int **l_78 = &l_77; >> +if (g_50.f7) >> +continue; >> +*l_78 = 0; >> +} >> +for (g_76 = 1; g_76 >= 0; g_76--) >> +{ >> +int *l_90; >> +for (l_101 = 4; l_101 >= 0; l_101--) >> +if (l_101) >> +*l_90 = 0; >> +else >> +{ >> +int **l_113 = &l_77; >> +*l_113 = p_57; >> +} >> +} >> + >> +return *l_77; >> +} >> + >> +/* { dg-final { cleanup-tree-dump "vect" } } */ >> Index: gcc/testsuite/ChangeLog >> === >> --- gcc/testsuite/ChangeLog(revision 213772) >> +++ gcc/testsuite/ChangeLog(working copy) >> @@ -1,3 +1,8 @@ >> +2014-08-09 Felix Yang >> + >> +PR tree-optimization/62073 >> +* gcc.dg/vect/pr62073.c: New test. >> + >> 2014-08-08 Richard Biener >> >> * gcc.dg/strlenopt-8.c: Remove XFAIL. >> Index: gcc/tree-vect-loop.c >> === >> --- gcc/tree-vect-loop.c(revision 213772) >> +++ gcc/tree-vect-loop.c(working copy) >> @@ -2321,7 +2321,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_inf >> } >> >>def1 = SSA_NAME_DEF_STMT (op1); >> - if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)) >> + if (gimple_bb (def1) >> + && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)) >>&& loop->inner >>&& flow_bb_inside_loop_p (loop->inner, gimple_bb (def1)) >>&& is_gimple_assign (def1)) >> >> >> Cheers, >> Felix
[PATCH] Add Berkeley qsort to libiberty to make GCC host-independent
Hi all, The qsort library function may have different behavior on different hosts (say Linux vs MinGW). We may have different sorting results with qsort when there are elements with the same key value. GCC uses qsort a lot. And the output of certain optimizations, such as IRA, relies on the sorting result of this library function. The problem is that we may have different assembly code of GCC on different hosts even with the same source file. Normally this is not what a GCC user expect to see. In order to fix this issue, I am adding Berkeley qsort to libiberty in order to override the one from the library. Bootstrapped on x86_64-suse-linux, OK for trunk? Please help commit this patch if it's OK. Cheers, Felix Index: libiberty/ChangeLog === --- libiberty/ChangeLog (revision 213873) +++ libiberty/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2014-08-12 Felix Yang + + * qsort.c: New file. + * Makefile.in (CFILES): Add qsort.c + (REQUIRED_OFILES): Add qsort.o. + (qsort.o): New target. + 2014-06-11 Andrew Burgess * cplus-dem.c (do_type): Call string_delete even if the call to Index: libiberty/qsort.c === --- libiberty/qsort.c (revision 0) +++ libiberty/qsort.c (revision 0) @@ -0,0 +1,178 @@ +/* + * Copyright (c) 1992, 1993 + * The Regents of the University of California. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + *must display the following acknowledgement: + * This product includes software developed by the University of + * California, Berkeley and its contributors. + * 4. Neither the name of the University nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + + +#include + +#define min(a, b) (a) < (b) ? a : b + +/* + * Qsort routine from Bentley & McIlroy's "Engineering a Sort Function". + */ +#define swapcode(TYPE, parmi, parmj, n)\ +{ \ + long i = (n) / sizeof (TYPE);\ + TYPE *pi = (TYPE *) (parmi); \ + TYPE *pj = (TYPE *) (parmj); \ + do { \ +TYPE t = *pi; \ +*pi++ = *pj; \ +*pj++ = t; \ + } while (--i > 0); \ +} + +#define SWAPINIT(a, es) swaptype = ((char *)a - (char *)0) % sizeof(long) || \ + es % sizeof(long) ? 2 : es == sizeof(long)? 0 : 1; + +static inline void +swapfunc (char *a, char *b, int n, int swaptype) +{ + if (swaptype <= 1) +swapcode (long, a, b, n) + else +swapcode (char, a, b, n) +} + +#define swap(a, b) \ + if (swaptype == 0) {\ + long t = *(long *)(a); \ + *(long *)(a) = *(long *)(b);\ + *(long *)(b) = t; \ + } else \ + swapfunc(a, b, es, swaptype) + +#define vecswap(a, b, n) if ((n) > 0) swapfunc (a, b, n, swaptype) + +static inline char * +med3 (char *a, char *b, char *c, int (*cmp)()) +{ + return cmp (a, b) < 0 ? + (cmp (b, c) < 0 ? b : (cmp (a, c) < 0 ? c : a)) + : (cmp (b, c) > 0 ? b : (cmp (a, c) <
Re: [PATCH] Fix for PR62037
Hi Richard, Yeah, the testcase is there. Sorry for my mistake. Thank you for back-porting this patch to 4.8 & 4.9 branch. Cheers, Felix On Wed, Aug 13, 2014 at 5:30 PM, Richard Biener wrote: > On Tue, Aug 12, 2014 at 6:40 PM, Felix Yang wrote: >> Hi Richard, >> >> I find that you missed the testcase with when committing the patch. > > I don't think so. > >> GCC 4.8 & 4.9 branch also has the same issue, may be we need to >> fix for them too? > > Yeah, I'll backport it. > > Thanks, > Richard. > >> Cheers, >> Felix >> >> >> On Mon, Aug 11, 2014 at 7:24 PM, Richard Biener >> wrote: >>> On Sat, Aug 9, 2014 at 6:28 AM, Felix Yang wrote: >>>> Attached please find the patch and testcase for PR62037. >>>> >>>> DEF1 can be a GIMPLE_NOP and gimple_bb will be NULL then. The patch >>>> checks for that. >>>> Bootstrapped on x86_64-suse-linux. OK for trunk? Please commit this >>>> patch if it's OK. >>>> >>> >>> Thanks - applied. >>> >>> Richard. >>> >>>> Index: gcc/ChangeLog >>>> === >>>> --- gcc/ChangeLog(revision 213772) >>>> +++ gcc/ChangeLog(working copy) >>>> @@ -1,3 +1,9 @@ >>>> +2014-08-09 Felix Yang >>>> + >>>> +PR tree-optimization/62073 >>>> +* tree-vect-loop.c (vect_is_simple_reduction_1): Check that DEF1 has >>>> +a basic block. >>>> + >>>> 2014-08-08 Guozhi Wei >>>> >>>> * config/rs6000/rs6000.md (*movdi_internal64): Add a new constraint. >>>> Index: gcc/testsuite/gcc.dg/vect/pr62073.c >>>> === >>>> --- gcc/testsuite/gcc.dg/vect/pr62073.c(revision 0) >>>> +++ gcc/testsuite/gcc.dg/vect/pr62073.c(revision 0) >>>> @@ -0,0 +1,41 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O1 -ftree-vectorize" } */ >>>> + >>>> +struct S0 >>>> +{ >>>> +int f7; >>>> +}; >>>> +struct S0 g_50; >>>> +int g_70; >>>> +int g_76; >>>> + >>>> + >>>> +int foo (long long p_56, int * p_57) >>>> +{ >>>> +int *l_77; >>>> +int l_101; >>>> + >>>> +for (; g_70;) >>>> +{ >>>> +int **l_78 = &l_77; >>>> +if (g_50.f7) >>>> + continue; >>>> +*l_78 = 0; >>>> +} >>>> +for (g_76 = 1; g_76 >= 0; g_76--) >>>> +{ >>>> +int *l_90; >>>> +for (l_101 = 4; l_101 >= 0; l_101--) >>>> +if (l_101) >>>> +*l_90 = 0; >>>> +else >>>> +{ >>>> +int **l_113 = &l_77; >>>> +*l_113 = p_57; >>>> +} >>>> +} >>>> + >>>> +return *l_77; >>>> +} >>>> + >>>> +/* { dg-final { cleanup-tree-dump "vect" } } */ >>>> Index: gcc/testsuite/ChangeLog >>>> === >>>> --- gcc/testsuite/ChangeLog(revision 213772) >>>> +++ gcc/testsuite/ChangeLog(working copy) >>>> @@ -1,3 +1,8 @@ >>>> +2014-08-09 Felix Yang >>>> + >>>> +PR tree-optimization/62073 >>>> +* gcc.dg/vect/pr62073.c: New test. >>>> + >>>> 2014-08-08 Richard Biener >>>> >>>> * gcc.dg/strlenopt-8.c: Remove XFAIL. >>>> Index: gcc/tree-vect-loop.c >>>> === >>>> --- gcc/tree-vect-loop.c(revision 213772) >>>> +++ gcc/tree-vect-loop.c(working copy) >>>> @@ -2321,7 +2321,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_inf >>>> } >>>> >>>>def1 = SSA_NAME_DEF_STMT (op1); >>>> - if (flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)) >>>> + if (gimple_bb (def1) >>>> + && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)) >>>>&& loop->inner >>>>&& flow_bb_inside_loop_p (loop->inner, gimple_bb (def1)) >>>>&& is_gimple_assign (def1)) >>>> >>>> >>>> Cheers, >>>> Felix
Re: [PATCH] Add Berkeley qsort to libiberty to make GCC host-independent
Hi Jakub, I did tried to use the glibc qsort. I compared the source of the two versions. I found that the BSD one is more self-contained and easy for integration with libiberty. That's why the BSD one is preferred. And we already have several BSD functions there in libiberty (libiberty/random.c, an example). Cheers, Felix On Wed, Aug 13, 2014 at 8:16 PM, Jakub Jelinek wrote: > On Wed, Aug 13, 2014 at 08:12:03PM +0800, Felix Yang wrote: >> The qsort library function may have different behavior on >> different hosts (say Linux vs MinGW). >> We may have different sorting results with qsort when there are >> elements with the same key value. >> GCC uses qsort a lot. And the output of certain optimizations, >> such as IRA, relies on the sorting result of this library function. >> The problem is that we may have different assembly code of GCC on >> different hosts even with the same source file. >> Normally this is not what a GCC user expect to see. >> In order to fix this issue, I am adding Berkeley qsort to >> libiberty in order to override the one from the library. > > Why have you picked a BSD one rather than the GNU one? > glibc qsort is highly optimized and has some other nice properties qsort > from many other sources does not have. > > Jakub
Re: [PATCH] Add Berkeley qsort to libiberty to make GCC host-independent
Hi Jakub, I got it. I will update the patch after I finished integrating the GLIBC qsort with libiberty. Cheers, Felix On Wed, Aug 13, 2014 at 8:55 PM, Jakub Jelinek wrote: > On Wed, Aug 13, 2014 at 08:48:52PM +0800, Felix Yang wrote: >> I did tried to use the glibc qsort. I compared the source of the >> two versions. >> I found that the BSD one is more self-contained and easy for >> integration with libiberty. > > Being more self-contained is not what we care about, the most significant > thing about qsort from GCC POV is performance, people complain about compile > times and slowing compilation down by a few % just so that libiberty can > contain a single file rather than two files doesn't look to be a good idea. > > Jakub
Re: [PATCH] Add Berkeley qsort to libiberty to make GCC host-independent
Is this possible a Canadian build. I am afread that the qsort then is from the MinGW/Win host? Cheers, Felix On Wed, Aug 13, 2014 at 9:31 PM, Jeff Law wrote: > On 08/13/14 06:12, Felix Yang wrote: >> >> Hi all, >> >> The qsort library function may have different behavior on >> different hosts (say Linux vs MinGW). >> We may have different sorting results with qsort when there are >> elements with the same key value. >> GCC uses qsort a lot. And the output of certain optimizations, >> such as IRA, relies on the sorting result of this library function. >> The problem is that we may have different assembly code of GCC on >> different hosts even with the same source file. >> Normally this is not what a GCC user expect to see. >> In order to fix this issue, I am adding Berkeley qsort to >> libiberty in order to override the one from the library. >> >> Bootstrapped on x86_64-suse-linux, OK for trunk? Please help >> commit this patch if it's OK. > > Or we find the cases where the sort comparison routine can be tweaked to > make it stable which makes this class of problems go away. > > jeff
[committed] MAINTAINERS (Write After Approval): Add myself.
Index: MAINTAINERS === --- MAINTAINERS(revision 215985) +++ MAINTAINERS(working copy) @@ -583,6 +583,7 @@ Chung-Ju Wu Le-Chun Wu Mingjie Xing Canqun Yang +Fei Yang Jeffrey Yasskin Joey Ye Greta Yorsh Cheers, Felix
[4.8 & 4.9] Backport of r211885
The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 from trunk. The only change is to use: for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) other than the new FOR_EACH_INSN_INFO_DEF interface. Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? Index: gcc/loop-invariant.c === --- gcc/loop-invariant.c(revision 216001) +++ gcc/loop-invariant.c(working copy) @@ -839,6 +839,41 @@ check_dependencies (rtx insn, bitmap depends_on) return true; } +/* Pre-check candidate DEST to skip the one which can not make a valid insn + during move_invariant_reg. SIMPLE is to skip HARD_REGISTER. */ + +static bool +pre_check_invariant_p (bool simple, rtx dest) +{ + if (simple && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1) +{ + df_ref use; + rtx ref; + unsigned int i = REGNO (dest); + struct df_insn_info *insn_info; + df_ref *def_rec; + + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) +{ + ref = DF_REF_INSN (use); + insn_info = DF_INSN_INFO_GET (ref); + + for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) +if (DF_REF_REGNO (*def_rec) == i) + { +/* Multi definitions at this stage, most likely are due to + instruction constraints, which requires both read and write + on the same register. Since move_invariant_reg is not + powerful enough to handle such cases, just ignore the INV + and leave the chance to others. */ +return false; + } +} +} + + return true; +} + /* Finds invariant in INSN. ALWAYS_REACHED is true if the insn is always executed. ALWAYS_EXECUTED is true if the insn is always executed, unless the program ends due to a function call. */ @@ -869,6 +904,7 @@ find_invariant_insn (rtx insn, bool always_reached simple = false; if (!may_assign_reg_p (SET_DEST (set)) + || !pre_check_invariant_p (simple, dest) || !check_maybe_invariant (SET_SRC (set))) return; Cheers, Felix
Re: [PATCH] Add zero-overhead looping for xtensa backend
Hello Sterling, My paper work with the FSF has finished and we can now move forward with this patch :-) I rebased the patch on the latest trunk. Attached please find version 3 of the patch. And the enclosed patch also includes the two points pointed by you, do you like it? Make check regression tested with xtensa-elf-gcc built from trunk with this patch. OK to apply? Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 216036) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,19 @@ +2014-10-09 Felix Yang + +* config/xtensa/xtensa.h (TARGET_LOOPS): New Macro. +* config/xtensa/xtensa.c (xtensa_reorg): New. +(xtensa_reorg_loops): New. +(xtensa_can_use_doloop_p): New. +(xtensa_invalid_within_doloop): New. +(hwloop_optimize): New. +(hwloop_fail): New. +(hwloop_pattern_reg): New. +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label. +(xtensa_doloop_hooks): Define. +* config/xtensa/xtensa.md (doloop_end): New. +(zero_cost_loop_start): Rewritten. +(zero_cost_loop_end): Rewritten. + 2014-10-09 Joern Rennecke * config/avr/avr.opt (mmcu=): Change to have a string value. Index: gcc/config/xtensa/xtensa.md === --- gcc/config/xtensa/xtensa.md(revision 216036) +++ gcc/config/xtensa/xtensa.md(working copy) @@ -35,6 +35,8 @@ (UNSPEC_TLS_CALL9) (UNSPEC_TP10) (UNSPEC_MEMW11) + (UNSPEC_LSETUP_START 12) + (UNSPEC_LSETUP_END13) (UNSPECV_SET_FP1) (UNSPECV_ENTRY2) @@ -1289,41 +1291,67 @@ (set_attr "length""3")]) +;; Zero-overhead looping support. + ;; Define the loop insns used by bct optimization to represent the -;; start and end of a zero-overhead loop (in loop.c). This start -;; template generates the loop insn; the end template doesn't generate -;; any instructions since loop end is handled in hardware. +;; start and end of a zero-overhead loop. This start template generates +;; the loop insn; the end template doesn't generate any instructions since +;; loop end is handled in hardware. (define_insn "zero_cost_loop_start" [(set (pc) -(if_then_else (eq (match_operand:SI 0 "register_operand" "a") - (const_int 0)) - (label_ref (match_operand 1 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (match_dup 0) (const_int -1)))] +(if_then_else (ne (match_operand:SI 0 "register_operand" "a") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "+a0") +(plus (match_dup 2) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_START)] "" - "loopnez\t%0, %l1" + "loop\t%0, %l1_LEND" [(set_attr "type""jump") (set_attr "mode""none") (set_attr "length""3")]) (define_insn "zero_cost_loop_end" [(set (pc) -(if_then_else (ne (reg:SI 19) (const_int 0)) - (label_ref (match_operand 0 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (reg:SI 19) (const_int -1)))] +(if_then_else (ne (match_operand:SI 0 "register_operand" "a") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "+a0") +(plus (match_dup 2) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)] "" { -xtensa_emit_loop_end (insn, operands); -return ""; + xtensa_emit_loop_end (insn, operands); + return ""; } [(set_attr "type""jump") (set_attr "mode""none") (set_attr "length""0")]) +; operand 0 is the loop count pseudo register +; operand 1 is the label to jump to at the top of the loop +(define_expand "doloop_end" + [(parallel [(set (pc) (if_then_else + (ne (match_operand:SI 0 "" "") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_dup 0) + (plus:SI (match_dup 0) +(const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END)])] + "" +{ + /* The loop optimizer doesn't check the predicates... */ + if (GET_MODE (operands[0]) != SImode)
Re: 答复: [4.8 & 4.9] Backport of r211885
The issue is that I cannot reproduce it on the official released branches. It happens on my local GCC branch (with a new port). Let's see if the original author of the patch has an testcase. Zhenqiang, do you have one that can reproduce this bug with the official 4.8/4.9 branches? Thanks. Cheers, Felix On Thu, Oct 9, 2014 at 6:41 PM, Richard Biener wrote: > On Thu, Oct 9, 2014 at 11:35 AM, Yangfei (Felix) > wrote: >> >>> > > > On Thu, Oct 09, 2014 at 09:04:49AM +, Yangfei (Felix) wrote: >>> > > > > On Wed, Oct 08, 2014 at 11:00:24PM +0800, Felix Yang wrote: >>> > > > > The enclosed patch for 4.8 & 4.9 branch is a backport of r211885 >>> > > > > from >>> > trunk. >>> > > > > >>> > > > > The only change is to use: >>> > > > > >>> > > > > for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; >>> > > > > def_rec++) >>> > > > > >>> > > > > other than the new FOR_EACH_INSN_INFO_DEF interface. >>> > > > > >>> > > > > Bootstrapped on x86_64-SUSE-Linux for both branches. OK to apply? >>> > > > >>> > > > ChangeLog entry is missing, plus description why do you want to >>> > > > backport >>> > it. >>> > > > If it fixes a bug on the branches, it would be better to have a >>> > > > bugzilla PR for that, and definitely a testcase. >>> > > > >>> > > >>> > > Yeah, I will add a ChangeLog entry for this patch when it is committed. >>> > > I encountered the same issue when working on my local customized >>> > > 4.8/4.9 >>> > branches. Not reproduceable with the official 4.8/4.9 branches. >>> > > I thinks it's just an enhancement for the loop invariant pass to >>> > > make it more >>> > versatile. It's better that 4.8/4.9 branches also inlcude this >>> > enhancement. >>> > > OK? >>> > >>> > If it is just an enhancement, then those generally are not backported >>> > to release branches (exceptions possible of course, but there needs to be >>> > a >>> strong reason). >>> > Each pass has some risk of breaking something, exposing previously >>> > only latent bugs in later passes etc. >>> > >>> > Jakub >>> >>> We can treat it as bugfix, as we got incorrect code when it triggers. >>> It just happens so rarely. Does it worth backporting? >> >> And the patch fix this bug by making the loop invariant pass more >> conservative. >> I didn't find a PR or testcase on trunk for this patch either. > > We at least want a testcase for the "we got incorrect code when it triggers". > > Richard. > >>
Re: [PATCH] Add zero-overhead looping for xtensa backend
Hi Sterling, I made some improvement to the patch. Two changes: 1. TARGET_LOOPS is now used as a condition of the doloop related patterns, which is more elegant. 2. As the trip count register of the zero-cost loop maybe potentially spilled, we need to change the patterns in order to handle this issue. The solution is similar to that adapted by c6x backend. Just turn the zero-cost loop into a regular loop when that happens when reload is completed. Attached please find version 4 of the patch. Make check regression tested with xtensa-elf-gcc/simulator. OK for trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 216079) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,20 @@ +2014-10-10 Felix Yang + +* config/xtensa/xtensa.h (TARGET_LOOPS): New Macro. +* config/xtensa/xtensa.c (xtensa_reorg): New. +(xtensa_reorg_loops): New. +(xtensa_can_use_doloop_p): New. +(xtensa_invalid_within_doloop): New. +(hwloop_optimize): New. +(hwloop_fail): New. +(hwloop_pattern_reg): New. +(xtensa_emit_loop_end): Modified to emit the zero-overhead loop end label. +(xtensa_doloop_hooks): Define. +* config/xtensa/xtensa.md (doloop_end): New. +(loop_end): New +(zero_cost_loop_start): Rewritten. +(zero_cost_loop_end): Rewritten. + 2014-10-10 Kyrylo Tkachov * configure.ac: Add --enable-fix-cortex-a53-835769 option. Index: gcc/config/xtensa/xtensa.md === --- gcc/config/xtensa/xtensa.md(revision 216079) +++ gcc/config/xtensa/xtensa.md(working copy) @@ -35,6 +35,8 @@ (UNSPEC_TLS_CALL9) (UNSPEC_TP10) (UNSPEC_MEMW11) + (UNSPEC_LSETUP_START 12) + (UNSPEC_LSETUP_END13) (UNSPECV_SET_FP1) (UNSPECV_ENTRY2) @@ -1289,41 +1291,120 @@ (set_attr "length""3")]) +;; Zero-overhead looping support. + ;; Define the loop insns used by bct optimization to represent the -;; start and end of a zero-overhead loop (in loop.c). This start -;; template generates the loop insn; the end template doesn't generate -;; any instructions since loop end is handled in hardware. +;; start and end of a zero-overhead loop. This start template generates +;; the loop insn; the end template doesn't generate any instructions since +;; loop end is handled in hardware. (define_insn "zero_cost_loop_start" [(set (pc) -(if_then_else (eq (match_operand:SI 0 "register_operand" "a") - (const_int 0)) - (label_ref (match_operand 1 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (match_dup 0) (const_int -1)))] - "" - "loopnez\t%0, %l1" +(if_then_else (ne (match_operand:SI 0 "register_operand" "2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "=a") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_START)] + "TARGET_LOOPS && optimize" + "loop\t%0, %l1_LEND" [(set_attr "type""jump") (set_attr "mode""none") (set_attr "length""3")]) (define_insn "zero_cost_loop_end" [(set (pc) -(if_then_else (ne (reg:SI 19) (const_int 0)) - (label_ref (match_operand 0 "" "")) - (pc))) - (set (reg:SI 19) -(plus:SI (reg:SI 19) (const_int -1)))] - "" +(if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "2,2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "nonimmediate_operand" "=a,m") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETUP_END) + (clobber (match_scratch:SI 3 "=X,&r"))] + "TARGET_LOOPS && optimize" + "#" + [(set_attr "type""jump") + (set_attr "mode""none") + (set_attr "length""0")]) + +(define_insn "loop_end" + [(set (pc) +(if_then_else (ne (match_operand:SI 0 "register_operand" "2") + (const_int 1)) + (label_ref (match_operand 1 "" "")) + (pc))) + (set (match_operand:SI 2 "register_operand" "=a") +(plus (match_dup 0) + (const_int -1))) + (unspec [(const_int 0)] UNSPEC_LSETU
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hello Jeff, I see that you have improved the RTL typesafety issue for ira.c, so I rebased this patch on the latest trunk and change to use the new list walking interface. Bootstrapped on x86_64-SUSE-Linux and make check regression tested. OK for trunk? Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 216116) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,14 @@ +2014-10-11 Felix Yang +Jeff Law + +* ira.c (struct equivalence): Change member "is_arg_equivalence" and "replace" +into boolean bitfields; turn member "loop_depth" into a short integer; add new +member "no_equiv" and "reserved". +(no_equiv): Set no_equiv of struct equivalence if register is marked +as having no known equivalence. +(update_equiv_regs): Check all definitions for a multiple-set +register to make sure that the RHS have the same value. + 2014-10-11 Martin Liska PR/63376 Index: gcc/ira.c === --- gcc/ira.c(revision 216116) +++ gcc/ira.c(working copy) @@ -2902,12 +2902,14 @@ struct equivalence /* Loop depth is used to recognize equivalences which appear to be present within the same loop (or in an inner loop). */ - int loop_depth; + short loop_depth; /* Nonzero if this had a preexisting REG_EQUIV note. */ - int is_arg_equivalence; + unsigned char is_arg_equivalence : 1; /* Set when an attempt should be made to replace a register with the associated src_p entry. */ - char replace; + unsigned char replace : 1; + /* Set if this register has no known equivalence. */ + unsigned char no_equiv : 1; }; /* reg_equiv[N] (where N is a pseudo reg number) is the equivalence @@ -3255,6 +3257,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE if (!REG_P (reg)) return; regno = REGNO (reg); + reg_equiv[regno].no_equiv = 1; list = reg_equiv[regno].init_insns; if (list && list->insn () == NULL) return; @@ -3381,7 +3384,7 @@ update_equiv_regs (void) /* If this insn contains more (or less) than a single SET, only mark all destinations as having no known equivalence. */ - if (set == 0) + if (set == NULL_RTX) { note_stores (PATTERN (insn), no_equiv, NULL); continue; @@ -3476,16 +3479,49 @@ update_equiv_regs (void) if (note && GET_CODE (XEXP (note, 0)) == EXPR_LIST) note = NULL_RTX; - if (DF_REG_DEF_COUNT (regno) != 1 - && (! note + if (DF_REG_DEF_COUNT (regno) != 1) +{ + bool equal_p = true; + rtx_insn_list *list; + + /* If we have already processed this pseudo and determined it + can not have an equivalence, then honor that decision. */ + if (reg_equiv[regno].no_equiv) +continue; + + if (! note || rtx_varies_p (XEXP (note, 0), 0) || (reg_equiv[regno].replacement && ! rtx_equal_p (XEXP (note, 0), -reg_equiv[regno].replacement -{ - no_equiv (dest, set, NULL); - continue; +reg_equiv[regno].replacement))) +{ + no_equiv (dest, set, NULL); + continue; +} + + list = reg_equiv[regno].init_insns; + for (; list; list = list->next ()) +{ + rtx note_tmp; + rtx_insn *insn_tmp; + + insn_tmp = list->insn (); + note_tmp = find_reg_note (insn_tmp, REG_EQUAL, NULL_RTX); + gcc_assert (note_tmp); + if (! rtx_equal_p (XEXP (note, 0), XEXP (note_tmp, 0))) +{ + equal_p = false; + break; +} +} + + if (! equal_p) +{ + no_equiv (dest, set, NULL); + continue; +} } + /* Record this insn as initializing this register. */ reg_equiv[regno].init_insns = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns); @@ -3514,10 +3550,9 @@ update_equiv_regs (void) a register used only in one basic block from a MEM. If so, and the MEM remains unchanged for the life of the register, add a REG_EQUIV note. */ - note = find_reg_note (insn, REG_EQUIV, NULL_RTX); - if (note == 0 && REG_BASIC_BLOCK (regno) >= NUM_FIXED_BLOCKS + if (note == NULL_RTX && REG_BASIC_BLOCK (regno) >= NUM_FIXED_BLOCKS && MEM_P (SET_SRC (set)) && validate_equiv_mem (insn, dest, SET_SRC (set))) note = set_unique_reg_note (insn, REG_EQUIV, copy_rtx (SET_SRC (set))); @@ -3547,7 +3582,7 @@ update_equiv_regs (void) reg_equiv[regno].replacement = x; reg_equiv[regno].src_p = &SET_SRC (set); - reg_equiv[re
Re: [PATCH] Add zero-overhead looping for xtensa backend
Thanks for the comments. The patch checked the usage of teh trip count register, making sure that it is not used in the loop body other than the doloop_end or lives past the doloop_end instruction, as the following code snippet shows: + /* Scan all the blocks to make sure they don't use iter_reg. */ + if (loop->iter_reg_used || loop->iter_reg_used_outside) +{ + if (dump_file) +fprintf (dump_file, ";; loop %d uses iterator\n", + loop->loop_no); + return false; +} For the spill issue, I think we need to handle it. The reason is that currently we are not telling GCC about the existence of the LCOUNT register. Instead, we keep the trip count in a general register and it's possible that this register can be spilled when register pressure is high. It's a good idea to post another patch to describe the LCOUNT register in GCC in order to free this general register. But I want this patch applied as a first step, OK? Cheers, Felix On Tue, Oct 14, 2014 at 12:09 AM, augustine.sterl...@gmail.com wrote: > On Fri, Oct 10, 2014 at 6:59 AM, Felix Yang wrote: >> Hi Sterling, >> >> I made some improvement to the patch. Two changes: >> 1. TARGET_LOOPS is now used as a condition of the doloop related >> patterns, which is more elegant. > > Fine. > >> 2. As the trip count register of the zero-cost loop maybe >> potentially spilled, we need to change the patterns in order to handle >> this issue. > > Actually, for xtensa you don't. The trip count is copied into LCOUNT > at the execution of the loop instruction, and therefore a spill or > whatever doesn't matter--it won't affect the result. So as long as you > have the trip count at the start of the loop, you are fine. > > This does bring up an issue of whether or not the trip count can be > modified during the loop. (note that this is different than early > exit.) If it can, you can't use a zero-overhead loop. Does your patch > address this case. > > The solution is similar to that adapted by c6x backend. >> Just turn the zero-cost loop into a regular loop when that happens >> when reload is completed. >> Attached please find version 4 of the patch. Make check regression >> tested with xtensa-elf-gcc/simulator. >> OK for trunk?
Re: [PATCH] Add zero-overhead looping for xtensa backend
PING? Cheers, Felix On Tue, Oct 14, 2014 at 12:30 AM, Felix Yang wrote: > Thanks for the comments. > > The patch checked the usage of teh trip count register, making sure > that it is not used in the loop body other than the doloop_end or > lives past the doloop_end instruction, as the following code snippet > shows: > > + /* Scan all the blocks to make sure they don't use iter_reg. */ > + if (loop->iter_reg_used || loop->iter_reg_used_outside) > +{ > + if (dump_file) > +fprintf (dump_file, ";; loop %d uses iterator\n", > + loop->loop_no); > + return false; > +} > > For the spill issue, I think we need to handle it. The reason is > that currently we are not telling GCC about the existence of the > LCOUNT register. Instead, we keep the trip count in a general register > and it's possible that this register can be spilled when register > pressure is high. > It's a good idea to post another patch to describe the LCOUNT > register in GCC in order to free this general register. But I want > this patch applied as a first step, OK? > > Cheers, > Felix > > > On Tue, Oct 14, 2014 at 12:09 AM, augustine.sterl...@gmail.com > wrote: >> On Fri, Oct 10, 2014 at 6:59 AM, Felix Yang wrote: >>> Hi Sterling, >>> >>> I made some improvement to the patch. Two changes: >>> 1. TARGET_LOOPS is now used as a condition of the doloop related >>> patterns, which is more elegant. >> >> Fine. >> >>> 2. As the trip count register of the zero-cost loop maybe >>> potentially spilled, we need to change the patterns in order to handle >>> this issue. >> >> Actually, for xtensa you don't. The trip count is copied into LCOUNT >> at the execution of the loop instruction, and therefore a spill or >> whatever doesn't matter--it won't affect the result. So as long as you >> have the trip count at the start of the loop, you are fine. >> >> This does bring up an issue of whether or not the trip count can be >> modified during the loop. (note that this is different than early >> exit.) If it can, you can't use a zero-overhead loop. Does your patch >> address this case. >> >> The solution is similar to that adapted by c6x backend. >>> Just turn the zero-cost loop into a regular loop when that happens >>> when reload is completed. >>> Attached please find version 4 of the patch. Make check regression >>> tested with xtensa-elf-gcc/simulator. >>> OK for trunk?
[PATCH, trivial] Clean up code: update comments and fix spacing to conform to coding style.
This fix typos and spacing in comments for several source files. Please commit this patch if OK for trunk. Index: gcc/ira-conflicts.c === --- gcc/ira-conflicts.c(revision 215391) +++ gcc/ira-conflicts.c(working copy) @@ -60,7 +60,7 @@ static IRA_INT_TYPE **conflicts; /* Record a conflict between objects OBJ1 and OBJ2. If necessary, canonicalize the conflict by recording it for lower-order subobjects - of the corresponding allocnos. */ + of the corresponding allocnos. */ static void record_object_conflict (ira_object_t obj1, ira_object_t obj2) { Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 215391) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,8 @@ +2014-09-19 Felix Yang + +* cfgrtl.c ira.c ira-color.c ira-conflicts ira-lives.c: +Clean up code: update comments and fix spacing to conform to coding style. + 2014-09-19 James Greenhalgh * doc/md.texi (Modifiers): Consistently use "read/write" Index: gcc/ira-color.c === --- gcc/ira-color.c(revision 215391) +++ gcc/ira-color.c(working copy) @@ -135,7 +135,7 @@ struct allocno_color_data and all its subnodes in the tree (forest) of allocno hard register nodes (see comments above). */ int hard_regs_subnodes_start; - /* The length of the previous array. */ + /* The length of the previous array. */ int hard_regs_subnodes_num; /* Records about updating allocno hard reg costs from copies. If the allocno did not get expected hard register, these records are @@ -1864,7 +1864,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) if (best_hard_regno >= 0) update_costs_from_copies (a, true, ! retry_p); ira_assert (ALLOCNO_CLASS (a) == aclass); - /* We don't need updated costs anymore: */ + /* We don't need updated costs anymore. */ ira_free_allocno_updated_costs (a); return best_hard_regno >= 0; } @@ -3069,7 +3069,7 @@ color_allocnos (void) -/* Output information about the loop given by its LOOP_TREE_NODE. */ +/* Output information about the loop given by its LOOP_TREE_NODE. */ static void print_loop_title (ira_loop_tree_node_t loop_tree_node) { @@ -3205,7 +3205,7 @@ color_pass (ira_loop_tree_node_t loop_tree_node) ALLOCNO_ASSIGNED_P (subloop_allocno) = true; if (hard_regno >= 0) update_costs_from_copies (subloop_allocno, true, true); -/* We don't need updated costs anymore: */ +/* We don't need updated costs anymore. */ ira_free_allocno_updated_costs (subloop_allocno); } } @@ -3249,7 +3249,7 @@ color_pass (ira_loop_tree_node_t loop_tree_node) ALLOCNO_ASSIGNED_P (subloop_allocno) = true; if (hard_regno >= 0) update_costs_from_copies (subloop_allocno, true, true); - /* We don't need updated costs anymore: */ + /* We don't need updated costs anymore. */ ira_free_allocno_updated_costs (subloop_allocno); } continue; @@ -3265,7 +3265,7 @@ color_pass (ira_loop_tree_node_t loop_tree_node) ALLOCNO_ASSIGNED_P (subloop_allocno) = true; if (hard_regno >= 0) update_costs_from_copies (subloop_allocno, true, true); - /* We don't need updated costs anymore: */ + /* We don't need updated costs anymore. */ ira_free_allocno_updated_costs (subloop_allocno); } } Index: gcc/ira-lives.c === --- gcc/ira-lives.c(revision 215391) +++ gcc/ira-lives.c(working copy) @@ -1323,7 +1323,7 @@ process_bb_node_lives (ira_loop_tree_node_t loop_t curr_point++; } - /* Propagate register pressure to upper loop tree nodes: */ + /* Propagate register pressure to upper loop tree nodes. */ if (loop_tree_node != ira_loop_tree_root) for (i = 0; i < ira_pressure_classes_num; i++) { Index: gcc/ira.c === --- gcc/ira.c(revision 215391) +++ gcc/ira.c(working copy) @@ -1078,7 +1078,7 @@ setup_allocno_and_important_classes (void) containing a given class. If allocatable hard register set of a given class is not a subset of any corresponding set of a class from CLASSES, we use the cheapest (with load/store point of view) - class from CLASSES whose set intersects with given class set */ + class from CLASSES whose set intersects with given class set. */ static void setup_class_translate_array (enum reg_class *class_translate, int classes_num, enum reg_class *classes) @@ -1774,7 +1774,7 @@ void ira_setup_alts (rtx_insn *insn, HARD_REG_SET &alts) { /* MAP nalt * nop -> start of constraints for
[PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hi, I find that update_equiv_regs in ira.c sets the wrong EQUIV reg-note for pseudo with more than one definiton in certain situation. Here is a simplified RTL snippet after this function finishs handling: (insn 77 37 78 2 (set (reg:SI 171) (const_int 0 [0])) ticket151.c:33 52 {movsi_internal_dsp} (expr_list:REG_EQUAL (const_int 0 [0]) (nil))) .. (insn 79 50 53 2 (set (mem/c:SI (reg/f:SI 136) [2 g_728+0 S4 A64]) (reg:SI 171)) ticket151.c:33 52 {movsi_internal_dsp} (expr_list:REG_DEAD (reg:SI 171) (nil))) (insn 53 79 54 2 (set (mem/c:SI (reg/f:SI 162) [4 g_163+0 S4 A32]) (reg:SI 163)) 52 {movsi_internal_dsp} (expr_list:REG_DEAD (reg:SI 163) (expr_list:REG_DEAD (reg/f:SI 162) (nil (insn 54 53 14 2 (set (reg:SI 171) (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") [flags 0x2]) [4 S4 A32])) ticket151.c:49 52 {movsi_internal_dsp} (expr_list:REG_EQUIV (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") [flags 0x2]) [4 S4 A32]) (expr_list:REG_EQUAL (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") [flags 0x2]) [4 S4 A32]) (nil The REG_EQUIV of insn 54 is not correct as pseudo 171 is defined in insn 77 with a differerent value. This may causes reload replacing pseudo 171 with mem/u/c:SI (symbol_ref/u:SI ("*.LC8"), which is wrong. A proposed patch for this issue, please comment: Index: gcc/ira.c === --- gcc/ira.c(revision 215460) +++ gcc/ira.c(working copy) @@ -3477,18 +3477,26 @@ update_equiv_regs (void) no_equiv (dest, set, NULL); continue; } + /* Record this insn as initializing this register. */ reg_equiv[regno].init_insns = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns); /* If this register is known to be equal to a constant, record that it is always equivalent to the constant. */ - if (DF_REG_DEF_COUNT (regno) == 1 - && note && ! rtx_varies_p (XEXP (note, 0), 0)) + if (note && ! rtx_varies_p (XEXP (note, 0), 0)) { - rtx note_value = XEXP (note, 0); - remove_note (insn, note); - set_unique_reg_note (insn, REG_EQUIV, note_value); + if (DF_REG_DEF_COUNT (regno) == 1) +{ + rtx note_value = XEXP (note, 0); + remove_note (insn, note); + set_unique_reg_note (insn, REG_EQUIV, note_value); +} + else +{ + no_equiv (dest, set, NULL); + continue; +} } /* If this insn introduces a "constant" register, decrease the priority Cheers, Felix
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hi, Attached please fined the updated patch. Cheers, Felix On Tue, Sep 23, 2014 at 10:48 AM, Yangfei (Felix) wrote: >> On 09/22/14 08:40, Felix Yang wrote: >> > Hi, >> > >> > I find that update_equiv_regs in ira.c sets the wrong EQUIV >> > reg-note for pseudo with more than one definiton in certain situation. >> > Here is a simplified RTL snippet after this function finishs handling: >> > >> > (insn 77 37 78 2 (set (reg:SI 171) >> > (const_int 0 [0])) ticket151.c:33 52 {movsi_internal_dsp} >> >(expr_list:REG_EQUAL (const_int 0 [0]) >> > (nil))) >> > >> > .. >> > >> > (insn 79 50 53 2 (set (mem/c:SI (reg/f:SI 136) [2 g_728+0 S4 A64]) >> > (reg:SI 171)) ticket151.c:33 52 {movsi_internal_dsp} >> >(expr_list:REG_DEAD (reg:SI 171) >> > (nil))) >> > (insn 53 79 54 2 (set (mem/c:SI (reg/f:SI 162) [4 g_163+0 S4 A32]) >> > (reg:SI 163)) 52 {movsi_internal_dsp} >> >(expr_list:REG_DEAD (reg:SI 163) >> > (expr_list:REG_DEAD (reg/f:SI 162) >> > (nil >> > (insn 54 53 14 2 (set (reg:SI 171) >> > (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") [flags 0x2]) [4 S4 >> > A32])) ticket151.c:49 52 {movsi_internal_dsp} >> >(expr_list:REG_EQUIV (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") >> > [flags 0x2]) [4 S4 A32]) >> > (expr_list:REG_EQUAL (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") >> > [flags 0x2]) [4 S4 A32]) >> > (nil >> > >> > >> > The REG_EQUIV of insn 54 is not correct as pseudo 171 is defined >> > in insn 77 with a differerent value. >> > This may causes reload replacing pseudo 171 with mem/u/c:SI >> > (symbol_ref/u:SI ("*.LC8"), which is wrong. >> > A proposed patch for this issue, please comment: >> Is it possible it's the conditional just above this one that is incorrect? >> >> if (DF_REG_DEF_COUNT (regno) != 1 >>&& (! note >>|| rtx_varies_p (XEXP (note, 0), 0) >>|| (reg_equiv[regno].replacement >>&& ! rtx_equal_p (XEXP (note, 0), >> >> reg_equiv[regno].replacement >> { >>no_equiv (dest, set, NULL); >>continue; >> } >> >> >> ISTM the only time a multiply-set register can have a valid REG_EQUIV note >> is if >> each and every assignment to that pseudo has the same RHS. >> >> Jeff > > > Thanks Jeff. Yes, I agree that this is a better place to consider. I am > thinking of a better way to fix this. > > Vladimir, can you shed light on this and give me some suggestions? Thank you. >
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hi, Ignore the previous message. Attached please find the updated patch. Bootstrapped on x86_64-suse-linux. Please apply this patch if OK for trunk. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 215500) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,8 @@ +2014-09-23 Felix Yang + +* ira.c (update_equiv_regs): Check all definitions for a multiple-set +register to make sure that the RHS have the same value. + 2014-09-23 Ilya Enkovich * cfgcleanup.c (try_optimize_cfg): Do not remove label Index: gcc/ira.c === --- gcc/ira.c(revision 215500) +++ gcc/ira.c(working copy) @@ -3467,16 +3467,43 @@ update_equiv_regs (void) if (note && GET_CODE (XEXP (note, 0)) == EXPR_LIST) note = NULL_RTX; - if (DF_REG_DEF_COUNT (regno) != 1 - && (! note + if (DF_REG_DEF_COUNT (regno) != 1) +{ + rtx list; + bool equal_p = true; + + if (! note || rtx_varies_p (XEXP (note, 0), 0) || (reg_equiv[regno].replacement && ! rtx_equal_p (XEXP (note, 0), -reg_equiv[regno].replacement -{ - no_equiv (dest, set, NULL); - continue; +reg_equiv[regno].replacement))) +{ + no_equiv (dest, set, NULL); + continue; +} + + list = reg_equiv[regno].init_insns; + for (; list; list = XEXP (list, 1)) +{ + rtx note_tmp, insn_tmp; + insn_tmp = XEXP (list, 0); + note_tmp = find_reg_note (insn_tmp, REG_EQUAL, NULL_RTX); + + if (note_tmp == 0 + || ! rtx_equal_p (XEXP (note, 0), XEXP (note_tmp, 0))) +{ + equal_p = false; + break; +} +} + + if (! equal_p) +{ + no_equiv (dest, set, NULL); + continue; +} } + /* Record this insn as initializing this register. */ reg_equiv[regno].init_insns = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns); Cheers, Felix On Tue, Sep 23, 2014 at 6:45 PM, Felix Yang wrote: > Hi, > Attached please fined the updated patch. > > Cheers, > Felix > > > On Tue, Sep 23, 2014 at 10:48 AM, Yangfei (Felix) > wrote: >>> On 09/22/14 08:40, Felix Yang wrote: >>> > Hi, >>> > >>> > I find that update_equiv_regs in ira.c sets the wrong EQUIV >>> > reg-note for pseudo with more than one definiton in certain situation. >>> > Here is a simplified RTL snippet after this function finishs >>> > handling: >>> > >>> > (insn 77 37 78 2 (set (reg:SI 171) >>> > (const_int 0 [0])) ticket151.c:33 52 {movsi_internal_dsp} >>> >(expr_list:REG_EQUAL (const_int 0 [0]) >>> > (nil))) >>> > >>> > .. >>> > >>> > (insn 79 50 53 2 (set (mem/c:SI (reg/f:SI 136) [2 g_728+0 S4 A64]) >>> > (reg:SI 171)) ticket151.c:33 52 {movsi_internal_dsp} >>> >(expr_list:REG_DEAD (reg:SI 171) >>> > (nil))) >>> > (insn 53 79 54 2 (set (mem/c:SI (reg/f:SI 162) [4 g_163+0 S4 A32]) >>> > (reg:SI 163)) 52 {movsi_internal_dsp} >>> >(expr_list:REG_DEAD (reg:SI 163) >>> > (expr_list:REG_DEAD (reg/f:SI 162) >>> > (nil >>> > (insn 54 53 14 2 (set (reg:SI 171) >>> > (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") [flags 0x2]) [4 S4 >>> > A32])) ticket151.c:49 52 {movsi_internal_dsp} >>> >(expr_list:REG_EQUIV (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") >>> > [flags 0x2]) [4 S4 A32]) >>> > (expr_list:REG_EQUAL (mem/u/c:SI (symbol_ref/u:SI ("*.LC8") >>> > [flags 0x2]) [4 S4 A32]) >>> > (nil >>> > >>> > >>> > The REG_EQUIV of insn 54 is not correct as pseudo 171 is defined >>> > in insn 77 with a differerent value. >>> > This may causes reload replacing pseudo 171 with mem/u/c:SI >>> > (symbol_ref/u:SI ("*.LC8"), which is wrong. >>> > A proposed patch for this issue, please comment: >>> Is it possible it's the conditional just above this one that is incorrect? >>> >>> if (DF_REG_DEF_COUNT (regno) != 1 >>>&& (! note >>>|| rtx_varies_p (XEXP (note, 0), 0) >>>
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hi Jeff, Thanks for the comments. I updated the patch adding some enhancements. Bootstrapped on x86_64-suse-linux. Please apply this patch if OK for trunk. Three points: 1. For multiple-set register, it is not qualified to have a equiv note once it is marked by no_equiv. The patch is updated with this consideration. 2. For the rtx_insn_list new interface, I noticed that the old style XEXP accessor macros is still used in function no_equiv. And I choose to the old style macros with this patch and should come up with another patch to fix this issue, OK? 3. For the conditions that an insn on the init_insns list which did not have a note, I reconsider this and find that this can never happens. So I replaced the check with a gcc assertion. Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 215550) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,11 @@ +2014-09-24 Felix Yang + +* ira.c (struct equivalence): Add no_equiv member. +(no_equiv): Set no_equiv of struct equivalence if register is marked +as having no known equivalence. +(update_equiv_regs): Check all definitions for a multiple-set +register to make sure that the RHS have the same value. + 2014-09-24 Jakub Jelinek PR sanitizer/63316 Index: gcc/ira.c === --- gcc/ira.c(revision 215550) +++ gcc/ira.c(working copy) @@ -2900,6 +2900,8 @@ struct equivalence /* Set when an attempt should be made to replace a register with the associated src_p entry. */ char replace; + /* Set if this register has no known equivalence. */ + char no_equiv; }; /* reg_equiv[N] (where N is a pseudo reg number) is the equivalence @@ -3247,6 +3249,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE if (!REG_P (reg)) return; regno = REGNO (reg); + reg_equiv[regno].no_equiv = 1; list = reg_equiv[regno].init_insns; if (list == const0_rtx) return; @@ -3258,7 +3261,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE return; ira_reg_equiv[regno].defined_p = false; ira_reg_equiv[regno].init_insns = NULL; - for (; list; list = XEXP (list, 1)) + for (; list; list = XEXP (list, 1)) { rtx insn = XEXP (list, 0); remove_note (insn, find_reg_note (insn, REG_EQUIV, NULL_RTX)); @@ -3373,7 +3376,7 @@ update_equiv_regs (void) /* If this insn contains more (or less) than a single SET, only mark all destinations as having no known equivalence. */ - if (set == 0) + if (set == NULL_RTX) { note_stores (PATTERN (insn), no_equiv, NULL); continue; @@ -3467,16 +3470,48 @@ update_equiv_regs (void) if (note && GET_CODE (XEXP (note, 0)) == EXPR_LIST) note = NULL_RTX; - if (DF_REG_DEF_COUNT (regno) != 1 - && (! note + if (DF_REG_DEF_COUNT (regno) != 1) +{ + rtx list; + bool equal_p = true; + + /* Check if it is possible that this multiple-set register has + a known equivalence. */ + if (reg_equiv[regno].no_equiv) +continue; + + if (! note || rtx_varies_p (XEXP (note, 0), 0) || (reg_equiv[regno].replacement && ! rtx_equal_p (XEXP (note, 0), -reg_equiv[regno].replacement -{ - no_equiv (dest, set, NULL); - continue; +reg_equiv[regno].replacement))) +{ + no_equiv (dest, set, NULL); + continue; +} + + list = reg_equiv[regno].init_insns; + for (; list; list = XEXP (list, 1)) +{ + rtx note_tmp, insn_tmp; + + insn_tmp = XEXP (list, 0); + note_tmp = find_reg_note (insn_tmp, REG_EQUAL, NULL_RTX); + gcc_assert (note_tmp); + if (! rtx_equal_p (XEXP (note, 0), XEXP (note_tmp, 0))) +{ + equal_p = false; + break; +} +} + + if (! equal_p) +{ + no_equiv (dest, set, NULL); + continue; +} } + /* Record this insn as initializing this register. */ reg_equiv[regno].init_insns = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns); @@ -3505,10 +3540,9 @@ update_equiv_regs (void) a register used only in one basic block from a MEM. If so, and the MEM remains unchanged for the life of the register, add a REG_EQUIV note. */ - note = find_reg_note (insn, REG_EQUIV, NULL_RTX); - if (note == 0 && REG_BASIC_BLOCK (regno) >= NUM_FIXED_BLOCKS + if (note == NULL_RTX && REG_BASIC_BLOCK (regno) >= NUM_FIXED_BLOCKS && MEM_P (SET_SRC (set)) && validate_equiv_mem (insn, dest, SET_SRC (set))) note
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
PING ? Cheers, Felix On Wed, Sep 24, 2014 at 8:07 PM, Felix Yang wrote: > Hi Jeff, > > Thanks for the comments. I updated the patch adding some enhancements. > Bootstrapped on x86_64-suse-linux. Please apply this patch if OK for > trunk. > > Three points: > 1. For multiple-set register, it is not qualified to have a equiv > note once it is marked by no_equiv. The patch is updated with >this consideration. > 2. For the rtx_insn_list new interface, I noticed that the old > style XEXP accessor macros is still used in function no_equiv. >And I choose to the old style macros with this patch and should > come up with another patch to fix this issue, OK? > 3. For the conditions that an insn on the init_insns list which > did not have a note, I reconsider this and find that this can >never happens. So I replaced the check with a gcc assertion. > > > Index: gcc/ChangeLog > === > --- gcc/ChangeLog(revision 215550) > +++ gcc/ChangeLog(working copy) > @@ -1,3 +1,11 @@ > +2014-09-24 Felix Yang > + > +* ira.c (struct equivalence): Add no_equiv member. > +(no_equiv): Set no_equiv of struct equivalence if register is marked > +as having no known equivalence. > +(update_equiv_regs): Check all definitions for a multiple-set > +register to make sure that the RHS have the same value. > + > 2014-09-24 Jakub Jelinek > > PR sanitizer/63316 > Index: gcc/ira.c > === > --- gcc/ira.c(revision 215550) > +++ gcc/ira.c(working copy) > @@ -2900,6 +2900,8 @@ struct equivalence >/* Set when an attempt should be made to replace a register > with the associated src_p entry. */ >char replace; > + /* Set if this register has no known equivalence. */ > + char no_equiv; > }; > > /* reg_equiv[N] (where N is a pseudo reg number) is the equivalence > @@ -3247,6 +3249,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE >if (!REG_P (reg)) > return; >regno = REGNO (reg); > + reg_equiv[regno].no_equiv = 1; >list = reg_equiv[regno].init_insns; >if (list == const0_rtx) > return; > @@ -3258,7 +3261,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE > return; >ira_reg_equiv[regno].defined_p = false; >ira_reg_equiv[regno].init_insns = NULL; > - for (; list; list = XEXP (list, 1)) > + for (; list; list = XEXP (list, 1)) > { >rtx insn = XEXP (list, 0); >remove_note (insn, find_reg_note (insn, REG_EQUIV, NULL_RTX)); > @@ -3373,7 +3376,7 @@ update_equiv_regs (void) > >/* If this insn contains more (or less) than a single SET, > only mark all destinations as having no known equivalence. */ > - if (set == 0) > + if (set == NULL_RTX) > { >note_stores (PATTERN (insn), no_equiv, NULL); >continue; > @@ -3467,16 +3470,48 @@ update_equiv_regs (void) >if (note && GET_CODE (XEXP (note, 0)) == EXPR_LIST) > note = NULL_RTX; > > - if (DF_REG_DEF_COUNT (regno) != 1 > - && (! note > + if (DF_REG_DEF_COUNT (regno) != 1) > +{ > + rtx list; > + bool equal_p = true; > + > + /* Check if it is possible that this multiple-set register has > + a known equivalence. */ > + if (reg_equiv[regno].no_equiv) > +continue; > + > + if (! note >|| rtx_varies_p (XEXP (note, 0), 0) >|| (reg_equiv[regno].replacement >&& ! rtx_equal_p (XEXP (note, 0), > -reg_equiv[regno].replacement > -{ > - no_equiv (dest, set, NULL); > - continue; > +reg_equiv[regno].replacement))) > +{ > + no_equiv (dest, set, NULL); > + continue; > +} > + > + list = reg_equiv[regno].init_insns; > + for (; list; list = XEXP (list, 1)) > +{ > + rtx note_tmp, insn_tmp; > + > + insn_tmp = XEXP (list, 0); > + note_tmp = find_reg_note (insn_tmp, REG_EQUAL, NULL_RTX); > + gcc_assert (note_tmp); > + if (! rtx_equal_p (XEXP (note, 0), XEXP (note_tmp, 0))) > +{ > + equal_p = false; > + break; > +} > +} > + > + if (! equal_p) > +{ > + no_equiv (dest, set, NULL); > + continue; > +} > } > + >/* Record this insn as initializing
[PATCH] Fix typo in comments
Please apply this patch if OK for trunk. Thanks. Index: gcc/lra.c === --- gcc/lra.c(revision 215598) +++ gcc/lra.c(working copy) @@ -933,7 +933,7 @@ lra_set_insn_recog_data (rtx_insn *insn) nalt = 1; if (nop < 0) { - /* Its is a special insn like USE or CLOBBER. We should + /* It is a special insn like USE or CLOBBER. We should recognize any regular insn otherwise LRA can do nothing with this insn. */ gcc_assert (GET_CODE (PATTERN (insn)) == USE Index: gcc/genautomata.c === --- gcc/genautomata.c(revision 215598) +++ gcc/genautomata.c(working copy) @@ -6178,7 +6178,7 @@ merge_states (automaton_t automaton, vec alt_states = new_alt_state; } } - /* Its is important that alt states were sorted before and + /* It is important that alt states were sorted before and after merging to have the same querying results. */ new_state->component_states = uniq_sort_alt_states (alt_states); } Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 215598) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,8 @@ +2014-09-25 Felix Yang + +* lra.c (lra_set_insn_recog_data): Fix typo in comment. +* genautomata.c (merge_states): Ditto. + 2014-09-25 Alexander Ivchenko Maxim Kuznetsov Anna Tikhonova Cheers, Felix Index: gcc/lra.c === --- gcc/lra.c (revision 215598) +++ gcc/lra.c (working copy) @@ -933,7 +933,7 @@ lra_set_insn_recog_data (rtx_insn *insn) nalt = 1; if (nop < 0) { - /* Its is a special insn like USE or CLOBBER. We should + /* It is a special insn like USE or CLOBBER. We should recognize any regular insn otherwise LRA can do nothing with this insn. */ gcc_assert (GET_CODE (PATTERN (insn)) == USE Index: gcc/genautomata.c === --- gcc/genautomata.c (revision 215598) +++ gcc/genautomata.c (working copy) @@ -6178,7 +6178,7 @@ merge_states (automaton_t automaton, vec alt_states = new_alt_state; } } - /* Its is important that alt states were sorted before and + /* It is important that alt states were sorted before and after merging to have the same querying results. */ new_state->component_states = uniq_sort_alt_states (alt_states); } Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 215598) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-09-25 Felix Yang + + * lra.c (lra_set_insn_recog_data): Fix typo in comment. + * genautomata.c (merge_states): Ditto. + 2014-09-25 Alexander Ivchenko Maxim Kuznetsov Anna Tikhonova
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hi Jeff, Thanks for the suggestions. I updated the patch accordingly. 1. Both my employer(Huawei) and I have signed the copyright assignments with FSF. These assignments are already sent via post two days ago and hopefully should reach FSF in one week. Maybe it's OK to commit this patch now? 2. I am not turning member loop_depth of struct equivalence into short integer as GCC API such as bb_loop_depth returns a loop's depth as a 32-bit interger. 3. I find it's kind of difficult to use the new type and interfaces for list walking the init_insns list for this patch. The type of init_insns list is rtx, not rtl_insn_list *. Seems we need to change a lot in order to use the new interface. Not clear about the reason why it is not adjusted when we are transferring to the new interface. Anyway, I think it's better to have another patch fix that issue. OK? 4. This bug is only reproduceable with my local customized GCC version. So I don't have a testcase then. 5. This patch bootstrapped on x86_64-suse-linux and reg-tested, There are no regressions with this patch. Regression test summary with or without the patch: === gcc Summary === # of expected passes107986 # of unexpected failures348 # of unexpected successes33 # of expected failures262 # of unsupported tests2089 /home/yangfei/gcc-devel/gcc-build/gcc/xgcc version 5.0.0 20140924 (experimental) (GCC) -- === g++ Summary === # of expected passes87415 # of unexpected failures276 # of expected failures266 # of unsupported tests3203 /home/yangfei/gcc-devel/gcc-build/gcc/testsuite/g++/../../xg++ version 5.0.0 20140924 (experimental) (GCC) -- === libatomic Summary === # of expected passes54 === libgomp tests === Running target unix === libgomp Summary === # of expected passes693 === libitm tests === Running target unix === libitm Summary === # of expected passes26 # of expected failures3 # of unsupported tests1 === libstdc++ tests === +++ gcc/ChangeLog(working copy) @@ -1,3 +1,13 @@ +2014-09-26 Felix Yang +Jeff Law + +* ira.c (struct equivalence): Change member "is_arg_equivalence" and "replace" +into boolean bitfields; add new member "no_equiv" and "reserved". +(no_equiv): Set no_equiv of struct equivalence if register is marked +as having no known equivalence. +(update_equiv_regs): Check all definitions for a multiple-set +register to make sure that the RHS have the same value. + 2014-09-26 Martin Liska * cgraph.c (cgraph_node::release_body): New argument keep_arguments Index: gcc/ira.c === --- gcc/ira.c(revision 215640) +++ gcc/ira.c(working copy) @@ -2896,10 +2896,13 @@ struct equivalence to be present within the same loop (or in an inner loop). */ int loop_depth; /* Nonzero if this had a preexisting REG_EQUIV note. */ - int is_arg_equivalence; + unsigned char is_arg_equivalence : 1; /* Set when an attempt should be made to replace a register with the associated src_p entry. */ - char replace; + unsigned char replace : 1; + /* Set if this register has no known equivalence. */ + unsigned char no_equiv : 1; + unsigned char reserved : 5; }; /* reg_equiv[N] (where N is a pseudo reg number) is the equivalence @@ -3247,6 +3250,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE if (!REG_P (reg)) return; regno = REGNO (reg); + reg_equiv[regno].no_equiv = 1; list = reg_equiv[regno].init_insns; if (list == const0_rtx) return; @@ -3258,7 +3262,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE return; ira_reg_equiv[regno].defined_p = false; ira_reg_equiv[regno].init_insns = NULL; - for (; list; list = XEXP (list, 1)) + for (; list; list = XEXP (list, 1)) { rtx insn = XEXP (list, 0); remove_note (insn, find_reg_note (insn, REG_EQUIV, NULL_RTX)); @@ -3373,7 +3377,7 @@ update_equiv_regs (void) /* If this insn contains more (or less) than a single SET, only mark all destinations as having no known equivalence. */ - if (set == 0) + if (set == NULL_RTX) { note_stores (PATTERN (insn), no_equiv, NULL); continue; @@ -3467,16 +3471,48 @@ update_equiv_regs (void) if (note && GET_CODE (XEXP (note, 0)) == EXPR_LIST) note = NULL_RTX; - if (DF_REG_DEF_COUNT (regno) != 1 - && (! note + if (DF_REG_DEF_COUNT (regno) != 1) +{ + rtx list; + bool equal_p = true; + + /* If we have already processed this pseudo and determined it + can not have an equivalence, then honor that decision. */ +
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Thanks for the explaination. I have changed the loop_depth into a short interger hoping that we can save some memory :-) Attached please find the updated patch. Bootstrapped and reg-tested on x86_64-suse-linux. Please do a final revew once the assignment is ready. As for the new list walking interface, I choose the function "no_equiv" and tried the "checked cast" way. The bad news is that GCC failed to bootstrap with the following change: Index: ira.c === --- ira.c (revision 215536) +++ ira.c (working copy) @@ -3242,12 +3242,12 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE void *data ATTRIBUTE_UNUSED) { int regno; - rtx list; + rtx_insn_list *list; if (!REG_P (reg)) return; regno = REGNO (reg); - list = reg_equiv[regno].init_insns; + list = as_a (reg_equiv[regno].init_insns); if (list == const0_rtx) return; reg_equiv[regno].init_insns = const0_rtx; @@ -3258,9 +3258,9 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE return; ira_reg_equiv[regno].defined_p = false; ira_reg_equiv[regno].init_insns = NULL; - for (; list; list = XEXP (list, 1)) + for (; list; list = list->next ()) { - rtx insn = XEXP (list, 0); + rtx_insn *insn = list->insn (); remove_note (insn, find_reg_note (insn, REG_EQUIV, NULL_RTX)); } } Error message: ... checking for suffix of object files... configure: error: in `/home/yangfei/gcc-devel/build/x86_64-unknown-linux-gnu/libgcc': configure: error: cannot compute suffix of object files: cannot compile See `config.log' for more details. make[2]: *** [configure-stage1-target-libgcc] Error 1 make[2]: Leaving directory `/home/yangfei/gcc-devel/build' make[1]: *** [stage1-bubble] Error 2 make[1]: Leaving directory `/home/yangfei/gcc-devel/build' make: *** [all] Error 2 I think the code change is OK. Anything I missed? Cheers, Felix On Sat, Sep 27, 2014 at 5:03 AM, Jeff Law wrote: > On 09/26/14 07:57, Felix Yang wrote: >> >> Hi Jeff, >> >> Thanks for the suggestions. I updated the patch accordingly. >> >> 1. Both my employer(Huawei) and I have signed the copyright >> assignments with FSF. >> These assignments are already sent via post two days ago and >> hopefully should reach FSF in one week. >> Maybe it's OK to commit this patch now? > > Not really. It needs to be accepted by the FSF before we can include the > work. > >> >> 2. I am not turning member loop_depth of struct equivalence into >> short integer as GCC API such as bb_loop_depth >> returns a loop's depth as a 32-bit interger. > > There's already other places that assume loops don't nest that deep. Please > go ahead and change it. And no need to explicitly mark the unused bits. > That's just a maintenance nightmare in the long term anyway :-) > > >> >> 3. I find it's kind of difficult to use the new type and >> interfaces for list walking the init_insns list for this patch. >> The type of init_insns list is rtx, not rtl_insn_list *. Seems >> we need to change a lot in order to use the new interface. >> Not clear about the reason why it is not adjusted when we are >> transferring to the new interface. >> Anyway, I think it's better to have another patch fix that issue. >> OK? > > The right way to go is to add a checked cast when we have some code that is > using the old interface and other code using the new interface. It's > actually a pretty easy change. > > The checked casts effectively mark the limits of where we've been able to > push the RTL typesafety work. Long term as we push the typesafety work > further into the compiler many/most of the checked casts will go away. > > Unfortunately, that won't work in this case because other code wants to > store a (const0_rtx) into the insn list. (const0_rtx) isn't an INSN, so the > checked cast fails and we get a nice abort/ICE. > > Conceptually we just need another marker that is an INSN and we might as > well just convert the whole file to use the new interface at that point. > > Consider the request pulled. > > The const0-rtx problem may be why this wasn't converted in the first palce. > Or it may simply have been a time problem. David's done > 250 patches > around RTL typesafety, but he also has other work to be doing ;-) > >> >> 4. This bug is only reproduceable with my local customized GCC >> version. So I don't have a testcase then. > > OK. > > I'll do a final review when I get notice about the copyright assignment from > the FSF. > > je