On Tue, Jan 26, 2016 at 1:51 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Tue, Jan 26, 2016 at 12:56 PM, Bernd Schmidt <bernds_...@t-online.de> wrote:
>> On 01/26/2016 10:48 AM, Bin.Cheng wrote:
>>>
>>> Yes, I moved whole loop pass (also the pass_web) after combine and it
>>> worked.  A combine pass before loop-invariant can fix this problem.
>>> Below passes are currently between loop transform and combine:
>>>
>>>       NEXT_PASS (pass_web);
>>>       NEXT_PASS (pass_rtl_cprop);
>>>       NEXT_PASS (pass_cse2);
>>>       NEXT_PASS (pass_rtl_dse1);
>>>       NEXT_PASS (pass_rtl_fwprop_addr);
>>>       NEXT_PASS (pass_inc_dec);
>>>       NEXT_PASS (pass_initialize_regs);
>>>       NEXT_PASS (pass_ud_rtl_dce);
>>>
>>> I think pass_web needs to be after loop transform because it's used to
>>> handle unrolled register live range.
>>> pass_fwprop_addr and pass_inc_dec should stay where they are now.  And
>>> putting pass_inc_dec before loop unroll may be helpful to keep more
>>> auto increment addressing mode chosen by IVO.
>>> We should not need to duplicate pass_initialize_regs.
>>> So what's about pass_rtl_cprop, cse2 and dse1.  Should these pass be
>>> duplicated after loop transform thus loop transformed code can be
>>> cleaned up?
>>
>>
>> Hard to tell in advance - you might want to experiment a bit; set up a large
>> collection of input files and see what various approaches do to code
>> generation.  In any case, I suspect this is gcc-7 material (unfortunately).
>
> Quite surprising, there is only one new failure after moving loop
> transforms (only along with pass_web).  Yes, it is GCC7 change, will

For the record.  The only new failure is loop-9.c.  Given insn patterns as below:
Loop:
   23: r106:DF=[`*.LC0']
      REG_EQUAL 1.84241999999999990222931955941021442413330078125e+1
   24: [r101:DI]=r106:DF

insn 23 is merged into insn 24 by combine if combine is run before
loop_invariant, resulting in the insn below:

   23: NOTE_INSN_DELETED
   24: [r101:DI]=1.84241999999999990222931955941021442413330078125e+1

But we can't support the floating constant, so insn 24 will be split
anyway by reload:

   35: xmm0:DF=[`*.LC0']
   24: [di:DI]=xmm0:DF

This time both instructions are in the loop, so the constant load is
no longer hoisted, causing a regression.  Combine may need some change
if we run it before loop_invariant.

Thanks,
bin

> revisit this to see if it makes substantial code generation
> difference.  Anyway, for the bug itself, there might be simple fix in
> x86 backend according to newest update.
>
> Thanks,
> bin
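
For readers without the testsuite at hand, a loop of roughly the following
shape would be expected to give the insn pattern quoted above: one load of a
DF constant from the constant pool (insn 23) feeding a store through a pointer
(insn 24).  This is only a sketch and not a copy of loop-9.c; the function
name and the loop bound parameter are made up for illustration, and the
literal 18.4242 is assumed to be the source form of the value shown in the
REG_EQUAL note.

```c
/* Hypothetical reduction, not the testsuite file itself.
   The constant load is loop invariant, so loop_invariant can hoist it
   as long as it is still a separate insn when that pass runs.  */
void
store_const (double *a, int n)
{
  int i;

  for (i = 0; i < n; i++)
    a[i] = 18.4242;   /* assumed literal; see the REG_EQUAL note above */
}
```

Compiling something like this with the loop2_invariant RTL dump enabled, with
and without the pass reordering, should show whether the constant load stays
inside the loop.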