Paul Eggert wrote:
> the remaining patches are for other instances of this idiom in Gnulib.
These other instances use 'int', not 'long long'. The machine code is similar:
===============================================================================
int sec1 (int t)
{
  return (t < 0 ? (t + 1) / 10000 - 1 : t / 10000);
}

int sec2 (int t)
{
  return t / 10000 - (t % 10000 < 0);
}

int sec3 (int t)
{
  return (t + (t < 0)) / 10000 - (t < 0);
}
===============================================================================
produces (with gcc-9.2.0)

sec1:
        testl   %edi, %edi
        js      .L5
        movl    %edi, %eax
        movl    $1759218605, %edx
        sarl    $31, %edi
        imull   %edx
        movl    %edx, %eax
        sarl    $12, %eax
        subl    %edi, %eax
        ret
.L5:
        addl    $1, %edi
        movl    $1759218605, %edx
        movl    %edi, %eax
        sarl    $31, %edi
        imull   %edx
        sarl    $12, %edx
        subl    %edi, %edx
        leal    -1(%rdx), %eax
        ret

sec2:
        movl    %edi, %eax
        movl    $1759218605, %edx
        imull   %edx
        movl    %edx, %eax
        movl    %edi, %edx
        sarl    $31, %edx
        sarl    $12, %eax
        subl    %edx, %eax
        imull   $10000, %eax, %edx
        subl    %edx, %edi
        shrl    $31, %edi
        subl    %edi, %eax
        ret

sec3:
        movl    %edi, %ecx
        movl    $1759218605, %edx
        shrl    $31, %ecx
        addl    %ecx, %edi
        movl    %edi, %eax
        sarl    $31, %edi
        imull   %edx
        movl    %edx, %eax
        sarl    $12, %eax
        subl    %edi, %eax
        subl    %ecx, %eax
        ret

And the benchmark:
===============================================================================
#include <stdlib.h>

static inline int sec1 (int t)
{
  return (t < 0 ?
          (t + 1) / 1000 - 1 : t / 1000);
}

static inline int sec2 (int t)
{
  return t / 1000 - (t % 1000 < 0);
}

static inline int sec3 (int t)
{
  return (t + (t < 0)) / 1000 - (t < 0);
}

volatile int t = 347913194;
volatile int x;

int main (int argc, char *argv[])
{
  int repeat = atoi (argv[1]);
  int i;
  for (i = repeat; i > 0; i--)
    x = sec1 (t); // or sec2 (t) or sec3 (t)
}
===============================================================================

On an Intel Core m3 CPU:
          gcc       clang
  sec1    1.25 ns   1.14 ns
  sec2    1.78 ns   1.63 ns
  sec3    1.68 ns   1.73 ns

And on sparc64:
          gcc
  sec1    7.24 ns
  sec2    7.51 ns
  sec3    7.24 ns

And on aarch64:
          gcc
  sec1    3.54 ns
  sec2    5.00 ns
  sec3    4.59 ns

Interesting observations here:
  * While on x86_64 and sparc64 the 32-bit division takes approximately as
    much time as the 64-bit division, on aarch64 it is 6 to 11 times faster!
  * On x86_64, clang optimizes sec2 better than sec3. That's a bit
    paradoxical, because sec2 has an imulq and an imull instruction, whereas
    sec3 has only an imulq instruction.

Regarding your fourth patch:
> - (corr_quad + (corr_quad < 0)) / 25 - (corr_quad < 0)

Shouldn't that be parenthesized differently?
- ((corr_quad + (corr_quad < 0)) / 25 - (corr_quad < 0))

Bruno
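That sec1, sec2 and sec3 all compute floor (t / 10000), and the parenthesization question about the fourth patch, can be checked with a small self-contained C sketch. It is not part of the original patches. The names floor_div_ref, count_mismatches, edge_cases_ok, unparenthesized and parenthesized, and the operands x and q, are hypothetical; the last two functions assume the intent of the patched line is to subtract floor (corr_quad / 25) from a preceding term x, which the quoted fragment does not show.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* The three variants from the message, with divisor 10000.  */
static int sec1 (int t) { return (t < 0 ? (t + 1) / 10000 - 1 : t / 10000); }
static int sec2 (int t) { return t / 10000 - (t % 10000 < 0); }
static int sec3 (int t) { return (t + (t < 0)) / 10000 - (t < 0); }

/* Hypothetical reference: C division truncates toward zero, so for a
   negative t with a nonzero remainder the floor is one less.  */
static int floor_div_ref (int t)
{
  int q = t / 10000;
  if (t % 10000 != 0 && t < 0)
    q--;
  return q;
}

/* Count inputs in [lo, hi] where any variant disagrees with the reference.
   The loop counter is long long so that hi == INT_MAX cannot overflow.  */
static long count_mismatches (int lo, int hi)
{
  assert (lo <= hi);
  long bad = 0;
  for (long long t = lo; t <= hi; t++)
    {
      int r = floor_div_ref ((int) t);
      if (sec1 ((int) t) != r || sec2 ((int) t) != r || sec3 ((int) t) != r)
        bad++;
    }
  return bad;
}

/* Check the extremes of the int range and values around multiples of 10000. */
static bool edge_cases_ok (void)
{
  static const int cases[] = { INT_MIN, INT_MIN + 1, -10001, -10000, -9999,
                               -1, 0, 1, 9999, 10000, INT_MAX };
  for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
    {
      int t = cases[i];
      int r = floor_div_ref (t);
      if (sec1 (t) != r || sec2 (t) != r || sec3 (t) != r)
        return false;
    }
  return true;
}

/* Hypothetical stand-ins for the line in the fourth patch: x minus the
   floor-division idiom, without and with the outer parentheses.  */
static int unparenthesized (int x, int q)
{
  return x - (q + (q < 0)) / 25 - (q < 0);
}

static int parenthesized (int x, int q)
{
  return x - ((q + (q < 0)) / 25 - (q < 0));
}
```

For example, with x = 100 and q = -1, unparenthesized yields 99 while parenthesized yields 101, which is 100 - floor (-1 / 25). For non-negative q the two forms agree, so the difference only shows up for negative corr_quad.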