Prologue:

The purpose of this enhancement request is not only to resolve a missed optimization in the avr port. It also aims to show ways of resolving a general difficulty that the RTL optimizers presently have with double-set insns: the difficulty of dealing with side effects of double-set RTL insns that produce unexpected but useful results. IMO this is a general issue that will not only show up for the divmod4 patterns, but will probably also be very important for the efficiency of code generated after the CCmode transition for targets where

1.) arithmetic or logic operations leave the CC register with useful information, and
2.) arithmetic operations on SImode or DImode objects need to be lowered to a sequence of subreg operations.

At the end of this bug report I ask for an early CSE pass or a small change to the jump2 pass.

Text body:

For a function like

    uint32_t d, m;

    void foo (uint32_t z, uint32_t n)
    {
      d = z / n;
      m = z % n;
    }

the avr port presently does not recognize that it needs to call the __udivmodsi4 function only once. The generated assembly code thus reads

    movw r14,r22
    movw r16,r24
    movw r10,r18
    movw r12,r20
    call __udivmodsi4
    sts d,r18
    sts (d)+1,r19
    sts (d)+2,r20
    sts (d)+3,r21
    movw r24,r16
    movw r22,r14
    movw r20,r12
    movw r18,r10
    call __udivmodsi4
    sts m,r22
    sts (m)+1,r23
    sts (m)+2,r24
    sts (m)+3,r25

(function prologue/epilogue stripped).

The issue is that the CSE passes are presently unable to realize that the same result is calculated twice. The optimizers thus do not recognize that the first divmod4 call had a useful side effect (namely, that it also calculates the mod). This could be fixed by attaching appropriate REG_EQUAL notes to the insns that copy the results of the library call from the hard registers to the target pseudos.
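To illustrate why a single call is sufficient for foo above, here is a minimal C sketch of a combined divide/modulo routine. It is not the actual __udivmodsi4 implementation from libgcc: the names udivmod32_t and udivmod32 are hypothetical, and the plain shift-subtract algorithm is only an assumption chosen for brevity. The point is merely that quotient and remainder fall out of the same computation, so the second call in the assembly above recomputes values that were already available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: quotient and remainder of one division. */
typedef struct
{
  uint32_t quot;
  uint32_t rem;
} udivmod32_t;

/* Illustrative shift-subtract division (NOT the libgcc routine).
   Both results are produced by the same loop.  */
static udivmod32_t
udivmod32 (uint32_t z, uint32_t n)
{
  udivmod32_t r = { 0, 0 };

  assert (n != 0);  /* Division by zero is not handled in this sketch.  */

  for (int i = 31; i >= 0; i--)
    {
      /* Remember the bit that is shifted out of the partial remainder.  */
      uint32_t carry = r.rem >> 31;

      /* Bring down the next dividend bit.  */
      r.rem = (r.rem << 1) | ((z >> i) & 1u);

      /* Subtract the divisor if it fits (carry means it certainly fits).  */
      if (carry || r.rem >= n)
        {
          r.rem -= n;
          r.quot |= (uint32_t) 1 << i;
        }
    }
  return r;
}

uint32_t d, m;

/* Same effect as foo in the report, but with one divide/modulo call.  */
void
foo (uint32_t z, uint32_t n)
{
  udivmod32_t r = udivmod32 (z, n);
  d = r.quot;
  m = r.rem;
}
```

For example, after calling foo (1000, 7), d holds 142 and m holds 6.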
The present implementation of the avr port does not generate such notes, since it emits the calls explicitly rather than through optabs.c (because the implementation knows exactly which registers will be clobbered by the library call). I will attach a patch that adds the required REG_EQUAL notes. The important point is that the REG_EQUAL note needs to refer to the original pseudos that were used as input parameters for the library call. Otherwise CSE will not realize the optimization opportunity.

With the attached patch, i.e. with the appropriate REG_EQUAL notes added to the RTL, I verified that CSE is able to realize the optimization opportunity, i.e. one ends up with assembly output reading

    call __udivmodsi4
    sts d,r18
    sts (d)+1,r19
    sts (d)+2,r20
    sts (d)+3,r21
    sts m,r22
    sts (m)+1,r23
    sts (m)+2,r24
    sts (m)+3,r25

The only difficulty is that, with the present optimization setup, the jump2 pass removes the insns that carry the useful REG_EQUAL notes for the unexpected useful side effect, because it correctly recognizes that they are "trivially dead". For this reason, I had to place the first CSE pass before the jump2 pass in "passes.c" in order to get the above results, i.e. my "passes.c" excerpt reads

    NEXT_PASS (pass_instantiate_virtual_regs);
    NEXT_PASS (pass_cse);
    NEXT_PASS (pass_jump2);

instead of

    NEXT_PASS (pass_instantiate_virtual_regs);
    NEXT_PASS (pass_jump2);
    NEXT_PASS (pass_cse);

Epilogue:

For this reason, IMO the attached patch can resolve this issue only if one of the following solutions is realized:

1.) The jump2 pass is changed such that it refrains from deleting trivially dead insns if they carry REG_EQUAL notes, because it may be necessary for them to be seen at least once by a CSE pass.
2.) An earlier CSE pass is added to the optimizer flow, possibly controlled by a target-dependent flag.

Yours,
Bjoern
--
Summary: Missed optimizations for divmod
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P1
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: bjoern dot m dot haase at web dot de
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: unknown-x86_64-linux
GCC host triplet: unknown-x86_64-linux
GCC target triplet: avr-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23726