https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93902
Bug ID: 93902
Summary: conversion from 64-bit long or unsigned long to double
         prevents simple optimization
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: vincent-gcc at vinc17 dot net
Target Milestone: ---

Optimizations that are done with conversions from 32-bit unsigned int to
double are no longer done with conversions from 64-bit unsigned long to
double. For the 64-bit long to double case, it depends. Example:

void bar (void);

void foo1 (unsigned int a, unsigned int b)
{
  if (a == b)
    {
      if ((double) a != (double) b)
        bar ();
    }
}

void foo2 (long a, long b)
{
  if (a == b)
    {
      if ((double) a != (double) b)
        bar ();
    }
}

void foo3 (unsigned long a, unsigned long b)
{
  if (a == b)
    {
      if ((double) a != (double) b)
        bar ();
    }
}

Tests done on x86_64 with:
gcc-10 (Debian 10-20200222-1) 10.0.1 20200222 (experimental) [master
revision 01af7e0a0c2:487fe13f218:e99b18cf7101f205bfdd9f0f29ed51caaec52779]

First, using only -O3 gives:

* For foo1, just a "ret", i.e. everything has been optimized away.

* For foo2:

        .cfi_startproc
        cmpq    %rsi, %rdi
        je      .L7
.L3:
        ret
        .p2align 4,,10
        .p2align 3
.L7:
        pxor    %xmm0, %xmm0
        cvtsi2sdq       %rdi, %xmm0
        ucomisd %xmm0, %xmm0
        jnp     .L3
        jmp     bar@PLT
        .cfi_endproc

I assume that this might differ from foo1 because the conversion can yield
a rounding error (a 64-bit long has more value bits than double's 53-bit
significand). However, both operands are rounded in the same way, so the
converted values remain equal. The only thing that could prevent the
optimization is the side effect introduced by the inexact operation, which
raises the inexact flag. But GCC ignores it by default (it assumes that the
STDC FENV_ACCESS pragma is off). And GCC knows how to optimize this case
(see the other test below).
* For foo3, the generated code is much more complicated, even though the C
code seems simpler (the integers can take only non-negative values):

        .cfi_startproc
        cmpq    %rsi, %rdi
        je      .L16
.L8:
        ret
        .p2align 4,,10
        .p2align 3
.L16:
        testq   %rsi, %rsi
        js      .L10
        pxor    %xmm1, %xmm1
        cvtsi2sdq       %rsi, %xmm1
.L11:
        testq   %rsi, %rsi
        js      .L12
        pxor    %xmm0, %xmm0
        cvtsi2sdq       %rsi, %xmm0
.L13:
        ucomisd %xmm0, %xmm1
        jp      .L15
        comisd  %xmm0, %xmm1
        je      .L8
.L15:
        jmp     bar@PLT
        .p2align 4,,10
        .p2align 3
.L12:
        movq    %rsi, %rax
        andl    $1, %esi
        pxor    %xmm0, %xmm0
        shrq    %rax
        orq     %rsi, %rax
        cvtsi2sdq       %rax, %xmm0
        addsd   %xmm0, %xmm0
        jmp     .L13
        .p2align 4,,10
        .p2align 3
.L10:
        movq    %rsi, %rax
        movq    %rsi, %rdx
        pxor    %xmm1, %xmm1
        shrq    %rax
        andl    $1, %edx
        orq     %rdx, %rax
        cvtsi2sdq       %rax, %xmm1
        addsd   %xmm1, %xmm1
        jmp     .L11
        .cfi_endproc

Now, let's add the -ffinite-math-only option, i.e. "-O3 -ffinite-math-only".
Since integers cannot convert to NaN or Inf, and overflow in the conversion
to double is not possible, this should not change anything.

* foo1 is still optimized. Good.

* foo2 is now optimized like foo1, i.e. to just a "ret". This is good, but
surprising compared to the foo2 case without -ffinite-math-only, and to
foo3 below.

* foo3 is still not optimized and is almost as complicated, with just 2
instructions removed, probably due to -ffinite-math-only (in case GCC
thought that special values were possible).