Re: GCC47 movmem breaks RA, GCC46 RA is fine
On Thu, Apr 26, 2012 at 6:16 PM, Paulo J. Matos wrote:
> Hi,
>
> I am facing a problem with the GCC47 register allocation and my movmemqi.
> GCC46 dealt very well with the problem but GCC47 keeps throwing at me
> register spill failures.
>
> My backend has very few registers. 3 chip registers in total (class
> CHIP_REGS), one of them (XL) is used for memory references (class
> ADDR_REGS) and the other two (AL, AH) are for normal use (DATA_REGS), so
> CHIP_REGS = ADDR_REGS U DATA_REGS.
>
> There are a couple of other memory mapped registers, but all loads and
> stores go through CHIP_REGS.
>
> My chip has a block copy instruction which needs the source address in
> XL, the destination address in AH and the count in AL. My movmemqi is
> similar to movmemsi in rx.
>
> (define_expand "movmemqi"
>   [(use (match_operand:BLK 0 "memory_operand"))
>    (use (match_operand:BLK 1 "memory_operand"))
>    (use (match_operand:QI 2 "general_operand"))
>    (use (match_operand:QI 3 "general_operand"))]
>   ""
> {
>   rtx dst_addr = XEXP (operands[0], 0);
>   rtx src_addr = XEXP (operands[1], 0);
>   rtx dst_reg = gen_rtx_REG (QImode, RAH);
>   rtx src_reg = gen_rtx_REG (QImode, RXL);
>   rtx cnt_reg = gen_rtx_REG (QImode, RAL);
>
>   emit_move_insn (cnt_reg, operands[2]);
>
>   if (GET_CODE (dst_addr) == PLUS)
>     {
>       emit_move_insn (dst_reg, XEXP (dst_addr, 0));
>       emit_insn (gen_addqi3 (dst_reg, dst_reg, XEXP (dst_addr, 1)));
>     }
>   else
>     emit_move_insn (dst_reg, dst_addr);
>
>   if (GET_CODE (src_addr) == PLUS)
>     {
>       emit_move_insn (src_reg, XEXP (src_addr, 0));
>       emit_insn (gen_addqi3 (src_reg, src_reg, XEXP (src_addr, 1)));
>     }
>   else
>     emit_move_insn (src_reg, src_addr);
>
>   emit_insn (gen_bc2 ());
>
>   DONE;
> })
>
> (define_insn "bc2"
>   [(set (reg:QI RAL) (const_int 0))
>    (set (mem:BLK (reg:QI RAH)) (mem:BLK (reg:QI RXL)))
>    (set (reg:QI RXL) (plus:QI (reg:QI RXL) (reg:QI RAL)))
>    (set (reg:QI RAH) (plus:QI (reg:QI RAH) (reg:QI RAL)))]
>   ""
>   "bc2")
>
> The parallel in bc2 describes what the bc2 chip instruction modifies: it
> copies the block at XL to AH, moves XL to point to the end of the source
> block, moves AH to point to the end of the destination block, and sets
> AL to 0.
>
> The C code
>
> int **
> t25 (int *d, int **s)
> {
>   memcpy (d, *s, 16);
>   return s;
> }
>
> turns into the following after asmcons (-Os passed in):
>
> (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>
> (insn 2 5 3 2 (parallel [
>             (set (reg/v/f:QI 22 [ d ])
>                 (reg:QI 1 AL [ d ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:3 6 {*movqi}
>      (expr_list:REG_DEAD (reg:QI 1 AL [ d ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (insn 3 2 4 2 (parallel [
>             (set (reg/v/f:QI 23 [ s ])
>                 (reg:QI 0 AH [ s ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:3 6 {*movqi}
>      (expr_list:REG_DEAD (reg:QI 0 AH [ s ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
>
> (insn 7 4 8 2 (parallel [
>             (set (reg/f:QI 24 [ *s_1(D) ])
>                 (mem/f:QI (reg/v/f:QI 23 [ s ]) [2 *s_1(D)+0 S1 A16]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:4 6 {*movqi}
>      (expr_list:REG_UNUSED (reg:CC 13 CC)
>         (nil)))
>
> (insn 8 7 9 2 (parallel [
>             (set (reg:QI 1 AL)
>                 (const_int 16 [0x10]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:4 6 {*movqi}
>      (expr_list:REG_UNUSED (reg:CC 13 CC)
>         (nil)))
>
> (insn 9 8 10 2 (parallel [
>             (set (reg:QI 0 AH)
>                 (reg/v/f:QI 22 [ d ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:4 6 {*movqi}
>      (expr_list:REG_DEAD (reg/v/f:QI 22 [ d ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (insn 10 9 11 2 (parallel [
>             (set (reg:QI 3 X)
>                 (reg/f:QI 24 [ *s_1(D) ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:4 6 {*movqi}
>      (expr_list:REG_DEAD (reg/f:QI 24 [ *s_1(D) ])
>         (expr_list:REG_UNUSED (reg:CC 13 CC)
>             (nil))))
>
> (insn 11 10 16 2 (parallel [
>             (set (reg:QI 1 AL)
>                 (const_int 0 [0]))
>             (set (mem:BLK (reg:QI 0 AH) [0 A16])
>                 (mem:BLK (reg:QI 3 X) [0 A16]))
>             (set (reg:QI 3 X)
>                 (plus:QI (reg:QI 3 X)
>                     (reg:QI 1 AL)))
>             (set (reg:QI 0 AH)
>                 (plus:QI (reg:QI 0 AH)
>                     (reg:QI 1 AL)))
>         ]) memcpy.i:4 21 {bc2}
>      (expr_list:REG_UNUSED (reg:QI 3 X)
>         (expr_list:REG_UNUSED (reg:QI 1 AL)
>             (expr_list:REG_UNUSED (reg:QI 0 AH)
>                 (nil)))))
>
> (insn 16 11 19 2 (parallel [
>             (set (reg/i:QI 1 AL)
>                 (reg/v/f:QI 23 [ s ]))
>             (clobber (reg:CC 13 CC))
>         ]) memcpy.i:6 6 {*movqi}
>      (expr_list:REG_DEAD (reg/v/f:QI 23 [ s ])
Re: locating unsigned type for non-standard precision
Richard Guenther wrote:
> [PR c/51527]
>
> I think the fix would be sth like
>
> Index: gcc/convert.c
> ===================================================================
> --- gcc/convert.c       (revision 186871)
> +++ gcc/convert.c       (working copy)
> @@ -769,6 +769,7 @@ convert_to_integer (tree type, tree expr
>            (Otherwise would recurse infinitely in convert.  */
>         if (TYPE_PRECISION (typex) != inprec)
>           {
> +           tree otypex = typex;
>             /* Don't do unsigned arithmetic where signed was wanted,
>                or vice versa.
>                Exception: if both of the original operands were
> @@ -806,10 +807,11 @@ convert_to_integer (tree type, tree expr
>               typex = unsigned_type_for (typex);
>             else
>               typex = signed_type_for (typex);
> -           return convert (type,
> -                           fold_build2 (ex_form, typex,
> -                                        convert (typex, arg0),
> -                                        convert (typex, arg1)));
> +           if (TYPE_PRECISION (otypex) == TYPE_PRECISION (typex))
> +             return convert (type,
> +                             fold_build2 (ex_form, typex,
> +                                          convert (typex, arg0),
> +                                          convert (typex, arg1)));
>           }
>       }
> }

Thanks for the patch.

I bootstrapped and regression-tested on i686-pc-linux-gnu.

If it's ok with you I'd go ahead and install it.

And maybe Peter could tell if it also fixes the issue on his platform.

Johann
Re: locating unsigned type for non-standard precision
On Fri, Apr 27, 2012 at 11:29 AM, Georg-Johann Lay wrote:
> Richard Guenther wrote:
>> [PR c/51527]
>>
>> I think the fix would be sth like
>>
>> Index: gcc/convert.c
>> ===================================================================
>> --- gcc/convert.c       (revision 186871)
>> +++ gcc/convert.c       (working copy)
>> @@ -769,6 +769,7 @@ convert_to_integer (tree type, tree expr
>>             (Otherwise would recurse infinitely in convert.  */
>>          if (TYPE_PRECISION (typex) != inprec)
>>            {
>> +            tree otypex = typex;
>>              /* Don't do unsigned arithmetic where signed was wanted,
>>                 or vice versa.
>>                 Exception: if both of the original operands were
>> @@ -806,10 +807,11 @@ convert_to_integer (tree type, tree expr
>>                typex = unsigned_type_for (typex);
>>              else
>>                typex = signed_type_for (typex);
>> -            return convert (type,
>> -                            fold_build2 (ex_form, typex,
>> -                                         convert (typex, arg0),
>> -                                         convert (typex, arg1)));
>> +            if (TYPE_PRECISION (otypex) == TYPE_PRECISION (typex))
>> +              return convert (type,
>> +                              fold_build2 (ex_form, typex,
>> +                                           convert (typex, arg0),
>> +                                           convert (typex, arg1)));
>>            }
>>        }
>> }
>
> Thanks for the patch.
>
> I bootstrapped and regression-tested on i686-pc-linux-gnu.
>
> If it's ok with you I'd go ahead and install it.
>
> And maybe Peter could tell if it also fixes the issue on his platform.

It's not necessary on the trunk btw, but it's ok for the 4.7 branch.

Thanks,
Richard.

> Johann
Re: GCC47 movmem breaks RA, GCC46 RA is fine
On 27/04/12 09:21, Richard Guenther wrote:
>> This differs from what GCC47 does and seems to work better.
>> I would like help on how to best handle this situation under GCC47.
>
> Not provide movmem which looks like open-coded and not in any way
> "optimized"?

Thanks Richard, however I don't understand your comment.

GCC46 outputs for this problem:

$t25:
        enterl  #H'0002
        st      AL,@H'fff9
        st      AH,@H'fff8
        ld      X,@$XAP_AH
        ld      X,@(0,X)
        ld      AL,#H'0010
        ld      AH,@H'fff9
        bc2
        ld      AL,@H'fff8
        leavel  #H'0002

and GCC47, once movmemqi and setmemqi are disabled:

$t25:
        enterl  #H'0005
        st      AH,@(H'0001,Y)
        ld      X,@$XAP_AH
        ld      X,@(0,X)
        ld      AH,#H'0010
        st      AH,@(0,Y)
        ld      AH,@$XAP_UXL
        bsr     $memcpy
        ld      AL,@(H'0001,Y)
        leavel  #H'0005

It feels to me that the GCC46 version is better:
* no branch to subroutine memcpy;
* less stack usage (argument to enterl).

So, using our block copy (bc2) instruction is an optimisation, don't you
think?

-- 
PMatos
Re: GCC47 movmem breaks RA, GCC46 RA is fine
On Fri, Apr 27, 2012 at 12:00 PM, Paulo J. Matos wrote:
> On 27/04/12 09:21, Richard Guenther wrote:
>>> This differs from what GCC47 does and seems to work better.
>>> I would like help on how to best handle this situation under GCC47.
>>
>> Not provide movmem which looks like open-coded and not in any way
>> "optimized"?
>
> Thanks Richard, however I don't understand your comment.
>
> GCC46 outputs for this problem:
>
> $t25:
>         enterl  #H'0002
>         st      AL,@H'fff9
>         st      AH,@H'fff8
>         ld      X,@$XAP_AH
>         ld      X,@(0,X)
>         ld      AL,#H'0010
>         ld      AH,@H'fff9
>         bc2
>         ld      AL,@H'fff8
>         leavel  #H'0002
>
> and GCC47, once movmemqi and setmemqi are disabled:
>
> $t25:
>         enterl  #H'0005
>         st      AH,@(H'0001,Y)
>         ld      X,@$XAP_AH
>         ld      X,@(0,X)
>         ld      AH,#H'0010
>         st      AH,@(0,Y)
>         ld      AH,@$XAP_UXL
>         bsr     $memcpy
>         ld      AL,@(H'0001,Y)
>         leavel  #H'0005
>
> It feels to me that the GCC46 version is better:
> * no branch to subroutine memcpy;
> * less stack usage (argument to enterl).
>
> So, using our block copy (bc2) instruction is an optimisation, don't you
> think?

Yes, it inlines it.  You may want to look at s390 which I believe has a
similar block-copy operation.

Richard.

> --
> PMatos
>
Re: GCC47 movmem breaks RA, GCC46 RA is fine
On 27/04/12 11:49, Richard Guenther wrote:
>> It feels to me that the GCC46 version is better:
>> * no branch to subroutine memcpy;
>> * less stack usage (argument to enterl).
>>
>> So, using our block copy (bc2) instruction is an optimisation, don't
>> you think?
>
> Yes, it inlines it.  You may want to look at s390 which I believe has a
> similar block-copy operation.

I am not sure I understood your comment. GCC46 generates the bc2 call due
to my implementation of movmemqi. If I remove it, as you suggested, GCC47
will always call memcpy and will be worse off.

I will be looking at s390 for inspiration. Thanks for the suggestion.

-- 
PMatos
Re: locating unsigned type for non-standard precision
On Fri, Apr 27, 2012 at 4:29 AM, Georg-Johann Lay wrote:
> Richard Guenther wrote:
>> [PR c/51527]
>>
>> I think the fix would be sth like
>>
>> Index: gcc/convert.c
>> ===================================================================
>> --- gcc/convert.c       (revision 186871)
>> +++ gcc/convert.c       (working copy)
>> @@ -769,6 +769,7 @@ convert_to_integer (tree type, tree expr
>>             (Otherwise would recurse infinitely in convert.  */
>>          if (TYPE_PRECISION (typex) != inprec)
>>            {
>> +            tree otypex = typex;
>>              /* Don't do unsigned arithmetic where signed was wanted,
>>                 or vice versa.
>>                 Exception: if both of the original operands were
>> @@ -806,10 +807,11 @@ convert_to_integer (tree type, tree expr
>>                typex = unsigned_type_for (typex);
>>              else
>>                typex = signed_type_for (typex);
>> -            return convert (type,
>> -                            fold_build2 (ex_form, typex,
>> -                                         convert (typex, arg0),
>> -                                         convert (typex, arg1)));
>> +            if (TYPE_PRECISION (otypex) == TYPE_PRECISION (typex))
>> +              return convert (type,
>> +                              fold_build2 (ex_form, typex,
>> +                                           convert (typex, arg0),
>> +                                           convert (typex, arg1)));
>>            }
>>        }
>> }
>
> Thanks for the patch.
>
> I bootstrapped and regression-tested on i686-pc-linux-gnu.
>
> If it's ok with you I'd go ahead and install it.
>
> And maybe Peter could tell if it also fixes the issue on his platform.

It does (though so did moving the original test down past the last change
to typex).  I've switched mspgcc to use the version you committed.

Peter

>
> Johann
Re: GCC47 movmem breaks RA, GCC46 RA is fine
On 27/04/12 11:49, Richard Guenther wrote:
> Yes, it inlines it.  You may want to look at s390 which I believe has a
> similar block-copy operation.
>
> Richard.

I looked at s390 and even though the block copy instruction seems similar,
ours is much more restrictive since it expects values in specific
registers, instead of allowing the register numbers to be passed to the
instruction (which is the case with the s390 mvcle insn).

I decided to try not to hardcode the registers in the instruction, but
since the instruction requires specific registers as operands I had to
create a class per register (with a single register in it) and then a
register constraint for each of the classes. This turned out not to work:
RA breaks even earlier than before. Here's what I did:

(define_expand "movmemqi"
  [(set (match_operand:BLK 0 "memory_operand")   ; destination
        (match_operand:BLK 1 "memory_operand"))  ; source
   (use (match_operand:QI 2 "general_operand"))] ; count
  "!TARGET_NO_BLOCK_COPY && !reload_completed"
{
  rtx dst_addr = XEXP (operands[0], 0);
  rtx src_addr = XEXP (operands[1], 0);
  rtx dst_reg = gen_reg_rtx (QImode); /* will be forced into AH */
  rtx src_reg = gen_reg_rtx (QImode); /* will be forced into XL */
  rtx cnt_reg = gen_reg_rtx (QImode); /* will be forced into AL */

  emit_move_insn (cnt_reg, operands[2]);

  if (GET_CODE (dst_addr) == PLUS)
    {
      emit_move_insn (dst_reg, XEXP (dst_addr, 0));
      emit_insn (gen_addqi3 (dst_reg, dst_reg, XEXP (dst_addr, 1)));
    }
  else
    emit_move_insn (dst_reg, dst_addr);

  if (GET_CODE (src_addr) == PLUS)
    {
      emit_move_insn (src_reg, XEXP (src_addr, 0));
      emit_insn (gen_addqi3 (src_reg, src_reg, XEXP (src_addr, 1)));
    }
  else
    emit_move_insn (src_reg, src_addr);

  emit_insn (gen_bc2 (dst_reg, src_reg, cnt_reg));

  DONE;
})

(define_insn "bc2"
  [(set (match_operand:QI 0 "register_operand" "=l")
        (const_int 0))
   (set (mem:BLK (match_operand:QI 1 "register_operand" "=h"))
        (mem:BLK (match_operand:QI 2 "register_operand" "=x")))
   (set (match_dup 2) (plus:QI (match_dup 2) (match_dup 0)))
   (set (match_dup 1) (plus:QI (match_dup 1) (match_dup 0)))]
  "!TARGET_NO_BLOCK_COPY"
  "bc2")

Constraints l, h and x correspond to singleton classes for registers AL,
AH and XL respectively. I think the problem here is the RA's inability to
deal with such a constrained register set.

Since I want to be able to use our block copy instruction instead of
disabling movmemqi and setmemqi (and therefore branching to memcpy), is
there anything I can try to tune the RA?

-- 
PMatos
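[One knob worth checking with singleton classes like these: IRA only
treats very small classes conservatively if the port reports them as
likely spilled. A minimal sketch of that hook follows, assuming
hypothetical class names behind the "l", "h" and "x" constraints; these
identifiers are illustrative, not taken from the actual port.]

/* Sketch only: AL_REG, AH_REG and XL_REG are hypothetical names for
   the singleton classes above.  Reporting them as likely spilled tells
   IRA not to keep pseudos in them across long live ranges, which can
   reduce spill failures around insns demanding specific hard regs.  */
static bool
xap_class_likely_spilled_p (reg_class_t rclass)
{
  return rclass == AL_REG || rclass == AH_REG || rclass == XL_REG;
}

#undef TARGET_CLASS_LIKELY_SPILLED_P
#define TARGET_CLASS_LIKELY_SPILLED_P xap_class_likely_spilled_p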
Fwd: Using movw/movt rather than minipools in ARM gcc
Hello All,

We are using gcc trunk as of 4/27/12, and are attempting to add support
to the ARM gcc compiler for Native Client. We are trying to get gcc
-march=armv7-a to use movw/movt consistently instead of minipools. The
motivation is a new target variant where armv7-a is the minimum supported
and non-code in .text is never allowed (per Native Client rules). But the
current behavior looks like a generically poor optimization for
-march=armv7-a. (Surely memory loads are slower than movw/movt, and no
space is saved in many cases.)

For further details, this seems to only happen with -O2 or higher. -O1
generates movw/movt, seemingly because cprop is folding away a
LO_SUM/HIGH pair. Another data point to note is that "Ubuntu/Linaro
4.5.2-8ubuntu3" does produce movw/movt for this test case, but we haven't
tried stock 4.5.

I have enabled TARGET_USE_MOVT, which should force a large fraction of
constant materialization to use movw/movt rather than pc-relative loads.
However, I am still seeing pc-relative loads for the following example
case and am looking for help from the experts here.

int a[1000], b[1000], c[1000];

void foo(int n) {
  int i;
  for (i = 0; i < n; ++i) {
    a[i] = b[i] + c[i];
  }
}

When I compile this I get:

foo:
        ...
        ldr     r3, .L7
        ldr     r1, .L7+4
        ldr     r2, .L7+8
        ...
.L7:
        .word   b
        .word   c
        .word   a
        .size   foo, .-foo
        .comm   c,4000,4
        .comm   b,4000,4
        .comm   a,4000,4

From some investigation, it seems I need to add a define_split to convert
SYMBOL_REFs to LO_SUM/HIGH pairs. There is already a function called
arm_split_constant that seems to do this, but no rule seems to be firing
to cause it to get invoked. Before I dive into writing the define_split,
am I missing something obvious?

Cheers,
David
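[For reference, the define_split being contemplated would look roughly
like the sketch below. The guard condition and predicate choices are
assumptions about what the real arm.md rule would need, not a tested
rule from any tree.]

;; Sketch only: split a symbolic constant move into a movw/movt
;; (HIGH/LO_SUM) pair.  The !flag_pic guard and the predicates are
;; assumptions; PIC symbols need a different sequence.
(define_split
  [(set (match_operand:SI 0 "s_register_operand" "")
        (match_operand:SI 1 "general_operand" ""))]
  "TARGET_USE_MOVT && !flag_pic
   && GET_CODE (operands[1]) == SYMBOL_REF"
  [(set (match_dup 0) (high:SI (match_dup 1)))
   (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))]
  "")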
IRA and two-phase load/store
I'm working on a port that does loads & stores in two phases. Every
load/store is funneled through the intermediate registers "ld" and "st"
standing between memory and the rest of the register file.

Example:

	ld=4(rB)
	...
	...
	rC=ld

	st=rD
	8(rB)=st

rB is a base address register, rC and rD are data regs. The ... represents
load delay cycles.

The CPU has only a single instance of "ld", but the machine description
defines five in order to allow overlapping live ranges to pipeline loads.

My mov insn patterns have constraints so that a memory destination pairs
with the "st" register source, and a memory source pairs with the "ld"
destination reg. The trouble is that register allocation doesn't
understand the constraint, so it loads/stores from/to random data
registers.

Is there a way to confine register allocation to the "ld" and "st"
classes, or is it better to let IRA do what it wants, then fix up after
reload with splits to turn the single insn rC=MEM into the insn pair
ld=MEM ... rC=ld?

Greg
Re: IRA and two-phase load/store
On 04/27/12 14:31, Greg McGary wrote:
> I'm working on a port that does loads & stores in two phases. Every
> load/store is funneled through the intermediate registers "ld" and "st"
> standing between memory and the rest of the register file.
>
> Example:
>
>	ld=4(rB)
>	...
>	...
>	rC=ld
>
>	st=rD
>	8(rB)=st
>
> rB is a base address register, rC and rD are data regs. The ...
> represents load delay cycles.
>
> The CPU has only a single instance of "ld", but the machine description
> defines five in order to allow overlapping live ranges to pipeline
> loads.
>
> My mov insn patterns have constraints so that a memory destination pairs
> with the "st" register source, and a memory source pairs with the "ld"
> destination reg. The trouble is that register allocation doesn't
> understand the constraint, so it loads/stores from/to random data
> registers.

Clarification: I understand that IRA will do this, but I also thought that
reload was supposed to notice that the insn didn't match its constraints
and emit reg copies in order to fix up. It doesn't do that for
me--postreload just asserts, complaining that the insn doesn't match its
constraints.

> Is there a way to confine register allocation to the "ld" and "st"
> classes, or is it better to let IRA do what it wants, then fix up after
> reload with splits to turn the single insn rC=MEM into the insn pair
> ld=MEM ... rC=ld?
>
> Greg
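[The post-reload fixup mentioned above would be expressed as a
machine-description split, along these lines. The mode, the
"data_reg_operand" predicate and LD_REGNUM are hypothetical placeholders,
not names from the actual port.]

;; Sketch only: after reload, rewrite a direct memory -> data-register
;; load as a load staged through the "ld" register.
(define_split
  [(set (match_operand:SI 0 "data_reg_operand" "")
        (match_operand:SI 1 "memory_operand" ""))]
  "reload_completed"
  [(set (match_dup 2) (match_dup 1))
   (set (match_dup 0) (match_dup 2))]
{
  /* LD_REGNUM stands in for the hard register number of "ld".  */
  operands[2] = gen_rtx_REG (SImode, LD_REGNUM);
})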
gcc-4.6-20120427 is now available
Snapshot gcc-4.6-20120427 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20120427/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch revision 186921

You'll find:

 gcc-4.6-20120427.tar.bz2             Complete GCC

  MD5=7bece036826df08d82ce04693ae343e2
  SHA1=17160e039f905ffdf57e0887501425f1240e3ca3

Diffs from 4.6-20120420 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: IRA and two-phase load/store
I think this is what secondary reload is for. Check the internals manual.

Something like this shows up in the pdp11 port, where float registers f4
and f5 can't be loaded/stored directly. You can see in that port how this
is handled; it seems to work.

	paul

On Apr 27, 2012, at 5:31 PM, Greg McGary wrote:

> I'm working on a port that does loads & stores in two phases. Every
> load/store is funneled through the intermediate registers "ld" and "st"
> standing between memory and the rest of the register file.
>
> Example:
>
>    ld=4(rB)
>    ...
>    ...
>    rC=ld
>
>    st=rD
>    8(rB)=st
>
> rB is a base address register, rC and rD are data regs. The ...
> represents load delay cycles.
>
> The CPU has only a single instance of "ld", but the machine description
> defines five in order to allow overlapping live ranges to pipeline
> loads.
>
> My mov insn patterns have constraints so that a memory destination pairs
> with the "st" register source, and a memory source pairs with the "ld"
> destination reg. The trouble is that register allocation doesn't
> understand the constraint, so it loads/stores from/to random data
> registers.
>
> Is there a way to confine register allocation to the "ld" and "st"
> classes, or is it better to let IRA do what it wants, then fix up after
> reload with splits to turn the single insn rC=MEM into the insn pair
> ld=MEM ... rC=ld?
>
> Greg
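[A minimal sketch of the secondary-reload route Paul suggests, assuming
hypothetical class names (DATA_REGS, LD_REGS, ST_REGS) for the port in
question; the hook signature is the GCC 4.7-era TARGET_SECONDARY_RELOAD.]

/* Sketch only: route any memory <-> data-register copy through the
   "ld" (input side) or "st" (output side) class.  DATA_REGS, LD_REGS
   and ST_REGS are placeholders for the port's real classes.  */
static reg_class_t
my_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
                     enum machine_mode mode ATTRIBUTE_UNUSED,
                     secondary_reload_info *sri ATTRIBUTE_UNUSED)
{
  if (MEM_P (x) && rclass == DATA_REGS)
    return in_p ? LD_REGS : ST_REGS;
  return NO_REGS;
}

#undef TARGET_SECONDARY_RELOAD
#define TARGET_SECONDARY_RELOAD my_secondary_reload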
conflicts between combine and pre global passes?
Hi,

I noticed that global passes before combine, like
loop-invariant/cprop/cse2, sometimes conflict with combine. The combine
pass only operates within a basic block, while these global passes move
insns across basic blocks and leave no description info behind.

For example, a case I encountered:

(insn 77 75 78 8 (set (reg:SI 171)
        (const_int 9999 [0x270f])) diffmeasure/verify.c:157 176 {*thumb1_movsi_insn}
     (nil))

(insn 78 77 79 8 (set (reg:SI 172)
        (const_int 0 [0])) diffmeasure/verify.c:157 176 {*thumb1_movsi_insn}
     (nil))

(insn 79 78 80 8 (set (reg:SI 170)
        (plus:SI (plus:SI (reg:SI 172)
                (reg:SI 172))
            (geu:SI (reg:SI 171)
                (reg/v:SI 138 [ num ])))) diffmeasure/verify.c:157 226 {thumb1_addsi3_addgeu}
     (expr_list:REG_DEAD (reg:SI 172)
        (expr_list:REG_DEAD (reg:SI 171)
            (expr_list:REG_EQUAL (geu:SI (const_int 9999 [0x270f])
                    (reg/v:SI 138 [ num ]))
                (nil)))))

(insn 80 79 81 8 (set (reg:QI 169)
        (subreg:QI (reg:SI 170) 0)) diffmeasure/verify.c:157 187 {*thumb1_movqi_insn}
     (expr_list:REG_DEAD (reg:SI 170)
        (nil)))

(insn 81 80 82 8 (set (reg:SI 173)
        (zero_extend:SI (reg:QI 169))) diffmeasure/verify.c:157 158 {*thumb1_zero_extendqisi2_v6}
     (expr_list:REG_DEAD (reg:QI 169)
        (nil)))

(jump_insn 82 81 124 8 (set (pc)
        (if_then_else (eq (reg:SI 173)
                (const_int 0 [0]))
            (label_ref:SI 129)
            (pc))) diffmeasure/verify.c:157 196 {cbranchsi4_insn}
     (expr_list:REG_DEAD (reg:SI 173)
        (expr_list:REG_BR_PROB (const_int 900 [0x384])
            (nil)))
 -> 129)

can be combined into:

(insn 77 75 78 8 (set (reg:SI 171)
        (const_int 9999 [0x270f])) diffmeasure/verify.c:157 176 {*thumb1_movsi_insn}
     (nil))

(note 78 77 79 8 NOTE_INSN_DELETED)
(note 79 78 80 8 NOTE_INSN_DELETED)
(note 80 79 81 8 NOTE_INSN_DELETED)
(note 81 80 82 8 NOTE_INSN_DELETED)

(jump_insn 82 81 124 8 (set (pc)
        (if_then_else (ltu (reg:SI 171)
                (reg/v:SI 138 [ num ]))
            (label_ref:SI 129)
            (pc))) diffmeasure/verify.c:157 196 {cbranchsi4_insn}
     (expr_list:REG_DEAD (reg:SI 171)
        (expr_list:REG_BR_PROB (const_int 900 [0x384])
            (nil)))
 -> 129)

BUT, if pre-combine passes propagate register 172 in insn 79 and delete
insn 78, the resulting instructions will not be combined.

I am not sure how to handle this, so any advice would be very appreciated.
Thanks.

-- 
Best Regards.