On Tue, 26 Apr 2011, Michael Hope wrote:

> Hi Barry.  I think the toolchain is operating correctly here.  The
> current version recognises a divide followed by a modulo and optimises
> this into a call to the standard EABI function __aeabi_uldivmod().
> Note the code:
> 
>         do_div(Kpart, source);
> 
>         K = Kpart & 0xFFFFFFFF;
> 
>         /* Check if we need to round */
>         if ((K % 10) >= 5)
>                 K += 5;
> 
> This function is provided by libgcc for normal applications. The
> kernel provides its own versions in arch/arm/lib/lib1funcs.S but is
> missing __aeabi_uldivmod (note the 'l' for 64 bit).

The kernel omits this function on purpose.  The idea is to prevent 
people from ever using 64-bit by 64-bit divisions, since they are always 
costly and avoidable.

This is why the kernel provides a do_div() macro: it divides a 64-bit 
dividend by a divisor of only 32 bits. This stems from the fact that gcc 
has no (or used not to have) patterns to match a division with a 64-bit 
dividend and a 32-bit divisor, hence it promotes the divisor to a 64-bit 
value and performs the costly division that the kernel wants to avoid.
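
To make the do_div() contract concrete, here is a minimal user-space
sketch: the 64-bit value is divided in place by a 32-bit divisor and the
remainder is returned. The names do_div_model() and do_div_model_demo()
are hypothetical, and the bit-by-bit loop is only an illustration of
avoiding a 64-by-64 division, not the kernel's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for do_div(n, base): replaces *n with the
 * quotient and returns the remainder. base must be non-zero.
 * Plain shift-and-subtract long division, so no 64-by-64 divide
 * helper is needed.
 */
static uint32_t do_div_model(uint64_t *n, uint32_t base)
{
        uint64_t rem = 0;       /* stays below 2^33, so 64 bits suffice */
        uint64_t quot = 0;
        int i;

        for (i = 63; i >= 0; i--) {
                /* bring down the next dividend bit */
                rem = (rem << 1) | ((*n >> i) & 1);
                if (rem >= base) {
                        rem -= base;
                        quot |= (uint64_t)1 << i;
                }
        }
        *n = quot;
        return (uint32_t)rem;
}

/* Small usage example: 123456789012345 / 10000. */
static void do_div_model_demo(void)
{
        uint64_t n = 123456789012345ULL;
        uint32_t rem = do_div_model(&n, 10000);

        assert(n == 12345678901ULL);
        assert(rem == 2345);
}
```

Note that the quotient lands in the first argument and only the
remainder is returned, which differs from how the / and % operators are
normally used.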

Worse, gcc isn't smart enough to optimize the operation even when the 
divisor is constant, which is quite a common operation in the kernel.  
This is why many years ago I wrote the code for the do_div() version you 
can find in arch/arm/include/asm/div64.h where the division is turned 
into a reciprocal multiplication. For example, despite the amount of 
added C code, do_div(x, 10000) now produces the following assembly code 
(where x is assigned to r0-r1):

        adr     r4, .L0
        ldmia   r4, {r4-r5}
        umull   r2, r3, r4, r0
        mov     r2, #0
        umlal   r3, r2, r5, r0
        umlal   r3, r2, r4, r1
        mov     r3, #0
        umlal   r2, r3, r5, r1
        mov     r0, r2, lsr #11
        orr     r0, r0, r3, lsl #21
        mov     r1, r3, lsr #11
        ...
.L0:
        .word   948328779
        .word   879609302

But I digress.  This is just to say that gcc shouldn't be pulling in 
__aeabi_uldivmod here because:

1) the division and the modulus are not performed on the same operands;

2) the modulus is performed on a 32-bit variable;

3) the do_div() implementation looks like nothing that gcc could 
   recognize as being a division.

Therefore I don't see how the right pattern could have been matched.


Nicolas
