------- Comment #4 from jakub at gcc dot gnu dot org  2008-03-08 07:48 -------
The reason the old code without the right shift almost worked is that, for
4-byte-aligned 16-bit variables, each loop was usually executed twice:
.L6:
        lha 0,0(27)
        lhz 8,2(26)
        .align 4
.L4:
        sync
        add 9,8,0
        rlwinm 10,0,0,0xffff
        rlwinm 9,9,0,0xffff
        slw 11,10,31
        slw 9,9,31
.L11:
        lwarx 7,0,29
        and 0,7,28
        cmpw 0,0,11
        bne- 0,.L12
        andc 7,7,28
        or 7,7,9
        stwcx. 7,0,29
        bne- 0,.L11
        isync
.L12:
!       srw 0,0,31      ! This insn was added by this patch
        rlwinm 0,0,0,0xffff
        cmpw 7,0,10
        extsh 0,0
        bne 7,.L4
The first time through, the atomic instruction usually succeeded, but r0 after
the rlwinm was 0, so most often different from r10.  The code therefore jumped
back to .L4 with r0 = 0 as the expected value of e[0].  r10 then becomes 0 as
the new expected value, and lwarx reads the new actual value of e[0], which
will be different from the expected 0.  So it jumps to .L12; r0 now contains
the e[0] value in the upper half and 0 in the lower half, so the rlwinm again
leaves 0, which matches r10 = 0.  In the second big loop nothing is therefore
changed and the loop exits.  This is what happens if there is no contention.
If there is contention though, the first loop iteration doesn't compare and
swap anything and, as shown above, the second loop iteration won't change
anything unless e[0] is 0.
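
The walkthrough above can be checked with a small single-threaded C model.
This is only a sketch under assumptions, not the actual GCC expansion: the
names cas16_model, add16_model and demo are made up for illustration, the
lwarx/stwcx. reservation is replaced by a plain load and store, the sign
extension done by lha/extsh is ignored, and contention is simulated by passing
a stale initial value.  The shift of 16 corresponds to a 4-byte-aligned
halfword on big-endian PowerPC, and with_fix selects whether the added srw is
modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the inner loop (.L11 through the rlwinm after .L12).
   word is the aligned 32-bit container, shift the bit offset of the 16-bit
   field inside it.  Returns the "old value" as the outer loop sees it. */
static uint16_t cas16_model(uint32_t *word, unsigned shift,
                            uint16_t expected, uint16_t desired, int with_fix)
{
    uint32_t mask = (uint32_t)0xffff << shift;
    uint32_t old = *word & mask;                  /* lwarx + and */

    if (old == ((uint32_t)expected << shift))     /* cmpw 0,0,11 */
        *word = (*word & ~mask) | ((uint32_t)desired << shift);

    if (with_fix)
        old >>= shift;                            /* the added srw 0,0,31 */
    return (uint16_t)(old & 0xffff);              /* rlwinm 0,0,0,0xffff */
}

/* Hypothetical model of the outer loop (.L4): add addend to the field,
   starting from a possibly stale copy of its value. */
static void add16_model(uint32_t *word, unsigned shift, uint16_t stale,
                        uint16_t addend, int with_fix)
{
    uint16_t expected = stale;

    for (;;) {
        uint16_t desired = (uint16_t)(expected + addend);
        uint16_t seen = cas16_model(word, shift, expected, desired, with_fix);
        if (seen == expected)                     /* cmpw 7,0,10 */
            break;
        expected = seen;                          /* retry from .L4 */
    }
}

/* The three cases from the comment; 0x1234 is an unrelated neighbour value
   in the lower half of the word. */
static void demo(void)
{
    uint32_t w;

    /* No contention, no srw: the right result, after two iterations. */
    w = (5u << 16) | 0x1234u;
    add16_model(&w, 16, 5, 3, 0);
    assert(w == ((8u << 16) | 0x1234u));

    /* Contention (field is 7, stale copy is 5), no srw: update is lost. */
    w = (7u << 16) | 0x1234u;
    add16_model(&w, 16, 5, 3, 0);
    assert(w == ((7u << 16) | 0x1234u));

    /* Same contention with the srw: the retry sees 7 and stores 10. */
    w = (7u << 16) | 0x1234u;
    add16_model(&w, 16, 5, 3, 1);
    assert(w == ((10u << 16) | 0x1234u));
}
```

Calling demo() runs all three cases.  The second one is the failure described
above: without the shift the loop exits having changed nothing.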


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35498
