http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-15 09:33:22 UTC ---
(In reply to comment #20)
> (In reply to comment #12)
> > --- by-val-O3.s.orig    2013-02-14 18:06:56.000000000 +0100
> > +++ by-val-O3.s 2013-02-14 18:07:23.000000000 +0100
> > @@ -357,9 +357,8 @@
> >         shrq    $32, %rdi
> >         cmpq    %r8, %rdx
> >         cmovbe  %r11, %rdi
> > -       addq    $1, %rax
> > -       cmpq    %r8, %rdx
> >         cmovbe  %rdx, %rcx
> > +       addq    $1, %rax
> >         cmpq    %rbp, %rax
> >         movq    %rcx, -8(%rsi,%rax,8)
> >         jne     .L50
> >
> > unmodified: Took 14.31 seconds total.
> > modified: Took 13.04 seconds total.
> >
> > So re. comment #9: it's not the problem but it'd be a small improvement.
>
> FWIW this comes from not eliminating the condition expression in
> the conditional moves that ifcvt creates:
>
>   tmp_97 = tmp_93 > 4294967295 ? tmp_95 : tmp_93;
>   carry_105 = tmp_93 > 4294967295 ? carry_94 : 0;
>
> I'm surprised this form is allowed at all, I'd expect we only allow
> is_gimple_reg() for a COND_EXPR_COND in a RHS context.

Yeah, it's on my list (even with partial patches available ...).  Note
that then vectorizing a COND_EXPR is not different from being able to
vectorize a comparison statement.

  pred_2 = tmp_93 > 4294967295;
  tmp_97 = pred_2 ? tmp_95 : tmp_93;
  carry_105 = pred_2 ? carry_94 : 0;

this form, both vectorized and not vectorized has issues when doing
initial instruction selection during expand (with multiple uses of
pred_2 we don't TER it).  So it might be that we present combine and
other RTL optimizers with initial code they will not be able to handle
as well as what we do right now.

> Anyway -- separate problem.

Indeed.
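
For readers less familiar with GIMPLE, the two forms discussed above can be sketched roughly in C. This is only an illustration of the source-level meaning, not of what GCC emits: the function names `select_repeated`, `select_shared` and `forms_agree` are hypothetical, and the variable names are simply borrowed from the GIMPLE dump quoted in the comment.

```c
#include <stdint.h>

/* Hypothetical analogue of the form ifcvt creates today: the comparison
   is embedded (and therefore repeated) in each conditional expression. */
static void select_repeated(uint64_t tmp_93, uint64_t tmp_95, uint64_t carry_94,
                            uint64_t *tmp_97, uint64_t *carry_105)
{
    *tmp_97    = tmp_93 > UINT64_C(4294967295) ? tmp_95 : tmp_93;
    *carry_105 = tmp_93 > UINT64_C(4294967295) ? carry_94 : 0;
}

/* Hypothetical analogue of the proposed form: the comparison is computed
   once into a predicate, which both selects then consume. */
static void select_shared(uint64_t tmp_93, uint64_t tmp_95, uint64_t carry_94,
                          uint64_t *tmp_97, uint64_t *carry_105)
{
    int pred_2 = tmp_93 > UINT64_C(4294967295);
    *tmp_97    = pred_2 ? tmp_95 : tmp_93;
    *carry_105 = pred_2 ? carry_94 : 0;
}

/* Returns 1 when both forms produce identical results for the inputs. */
static int forms_agree(uint64_t tmp_93, uint64_t tmp_95, uint64_t carry_94)
{
    uint64_t a, b, c, d;
    select_repeated(tmp_93, tmp_95, carry_94, &a, &b);
    select_shared(tmp_93, tmp_95, carry_94, &c, &d);
    return a == c && b == d;
}
```

Both functions compute the same values; the difference that matters in the discussion is purely how the condition is represented to later passes (one shared predicate with two uses versus a comparison repeated in each select).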