------- Comment #4 from jakub at gcc dot gnu dot org  2008-03-08 07:48 -------
The reason why the old code without the right shift almost worked is that
for the 4 byte aligned 16-bit vars each loop was executed usually twice.

.L6:
        lha 0,0(27)
        lhz 8,2(26)
        .align 4
.L4:
        sync
        add 9,8,0
        rlwinm 10,0,0,0xffff
        rlwinm 9,9,0,0xffff
        slw 11,10,31
        slw 9,9,31
.L11:
        lwarx 7,0,29
        and 0,7,28
        cmpw 0,0,11
        bne- 0,.L12
        andc 7,7,28
        or 7,7,9
        stwcx. 7,0,29
        bne- 0,.L11
        isync
.L12:
!       srw 0,0,31              ! This insn was added by this patch
        rlwinm 0,0,0,0xffff
        cmpw 7,0,10
        extsh 0,0
        bne 7,.L4

The first time usually the atomic instruction succeeded, but r0 after rlwinm
was 0, so most often different from r10.  This means the code then jumped to
.L4, with r0 = 0 as the expected value of e[0].  r10 then becomes 0 as new
expected value, lwarx reads the new actual value of e[0], which will be
different from the expected 0.  So it jumps to .L12, r0 now contains the e[0]
value in upper half and 0 in lower half and r10 is 0, so in the second big
loop nothing is changed and the loop exits.  This is what happens if there is
no contention.

If there is contention though, the first loop doesn't compare and swap
anything and as shown above, the second loop iteration won't change anything
unless e[0] is 0.
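
To make the two cases above easier to follow, here is a rough C11 model of the
loop. It is only a sketch, not the code GCC emits: the helper names
cas16_via_word and add16, the apply_shift flag, and the use of <stdatomic.h>
in place of lwarx/stwcx. are all assumptions made for illustration. shift = 16
stands for a 4 byte aligned halfword sitting in the upper half of the word, as
on big-endian PowerPC.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the inner loop (.L11/.L12): a 16-bit
 * compare-and-swap emulated on the containing aligned 32-bit word.
 * apply_shift = 1 models the patched code (with srw),
 * apply_shift = 0 models the old code (without it).
 * Returns the "old value" exactly as the outer loop would see it. */
static uint16_t cas16_via_word(_Atomic uint32_t *word, unsigned shift,
                               uint16_t expected, uint16_t desired,
                               int apply_shift)
{
    assert(shift == 0 || shift == 16);
    const uint32_t mask = (uint32_t)0xffff << shift;
    const uint32_t exp_field = (uint32_t)expected << shift; /* slw 11,10,31 */
    const uint32_t new_field = (uint32_t)desired << shift;  /* slw 9,9,31 */
    uint32_t old = atomic_load(word);
    uint32_t field;

    for (;;) {
        field = old & mask;                     /* and 0,7,28 */
        if (field != exp_field)                 /* bne- 0,.L12 */
            break;
        uint32_t updated = (old & ~mask) | new_field;
        /* On failure, old is refreshed with the current word. */
        if (atomic_compare_exchange_weak(word, &old, updated))
            break;
    }
    if (apply_shift)
        field >>= shift;                        /* srw 0,0,31 */
    return (uint16_t)(field & 0xffff);          /* rlwinm 0,0,0,0xffff */
}

/* Hypothetical model of the outer loop (.L4): retry until the value
 * returned by the CAS equals the value we expected.  first_guess plays
 * the role of the plain load at .L6; passing a stale guess imitates
 * another thread changing e[0] in between.  Returns the pass count. */
static unsigned add16(_Atomic uint32_t *word, unsigned shift,
                      uint16_t first_guess, uint16_t addend,
                      int apply_shift)
{
    unsigned passes = 0;
    uint16_t expected = first_guess;

    for (;;) {
        passes++;
        uint16_t seen = cas16_via_word(word, shift, expected,
                                       (uint16_t)(expected + addend),
                                       apply_shift);
        if (seen == expected)                   /* cmpw 7,0,10 */
            break;
        expected = seen;                        /* bne 7,.L4 */
    }
    return passes;
}
```

In this model, with e[0] = 5 in the upper half and an addend of 3, the variant
without the shift takes two passes and still leaves 8 in e[0]. If the first
guess is stale (5 while the word really holds 7), the variant without the
shift exits after two passes with e[0] still 7, so the addition is lost, while
the variant with the shift ends with 10.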
-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35498