On 2 March 2016 at 11:35, Edward Nevill <edward.nev...@linaro.org> wrote:
>         cmp     x2, 8   <<< (1)
>
> (1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a
> 32 bit unsigned.
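As a rough illustration of point (1), the two compares can be modelled in C. This is a sketch and not code from the thread: the names le8_x, le8_w and check_examples are hypothetical, and it assumes the comparison is followed by an unsigned "lower or same" branch. It suggests that "cmp x2, 8" and "cmp w2, 8" agree whenever the upper 32 bits of the register are zero, and can disagree otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Models "cmp x2, 8" with an unsigned <= test: all 64 bits take part. */
static int le8_x(uint64_t count) { return count <= 8; }

/* Models "cmp w2, 8": only the low 32 bits of the register take part. */
static int le8_w(uint64_t count) { return (uint32_t)count <= 8; }

/* Hypothetical self-check of the cases discussed. */
static void check_examples(void) {
    /* Small values: both forms agree. */
    assert(le8_x(5) && le8_w(5));
    assert(!le8_x(9) && !le8_w(9));
    /* Upper bits set: the 64-bit form says "greater", the 32-bit form
       sees only the low word (1) and says "less or equal". */
    assert(!le8_x(0x100000001ULL) && le8_w(0x100000001ULL));
}
```

So the 32-bit form is only a safe substitute if the compiler can prove the upper half is zero, for example when count was zero-extended from a 32-bit value.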
You mean to use "cmp w2, 8" instead? Is there any difference?

> (2) Nowhere in the function does it store anything on the stack, so why
> drop and restore the stack every time. Also, minor quibble in the
> disass, why does sub use #64 whereas add uses just '64' (appreciate this
> is probably binutils, not gcc).

My reading of the AAPCS64 is that it's not necessary to have a frame
at all, only that if you do, it must be quad-word aligned.

Clang/LLVM doesn't seem to bother with the push and pop, but it also
uses "cmp x".

> .L15:
>         adrp    x3, .L4
>         add     x3, x3, :lo12:.L4
>         ldr     x2, [x3, x2, lsl #3]
>         br      x2

Hum, this is *exactly* what Clang generates... :)

> (4) Seems to be something wrong with the load scheduler here? Why not
> move the stp x2, x3 to the end. It does this repeatedly.

Again, Clang seems to do what you want...

Have you tried building OpenJDK with Clang?

cheers,
--renato
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain