https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93902

            Bug ID: 93902
           Summary: conversion from 64-bit long or unsigned long to double
                    prevents simple optimization
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincent-gcc at vinc17 dot net
  Target Milestone: ---

Optimizations that are done with conversions from 32-bit unsigned int to double
are no longer done with conversions from 64-bit unsigned long to double. For
the case 64-bit signed long to double, whether the optimization is done depends
on the options (see below).

Example:

void bar (void);

void foo1 (unsigned int a, unsigned int b)
{
  if (a == b)
    {
      if ((double) a != (double) b)
        bar ();
    }
}

void foo2 (long a, long b)
{
  if (a == b)
    {
      if ((double) a != (double) b)
        bar ();
    }
}

void foo3 (unsigned long a, unsigned long b)
{
  if (a == b)
    {
      if ((double) a != (double) b)
        bar ();
    }
}

Tests done on x86_64 with: gcc-10 (Debian 10-20200222-1) 10.0.1 20200222
(experimental) [master revision
01af7e0a0c2:487fe13f218:e99b18cf7101f205bfdd9f0f29ed51caaec52779]

First, using only -O3 gives:

* For foo1, just a "ret", i.e. everything has been optimized away.

* For foo2:

        .cfi_startproc
        cmpq    %rsi, %rdi
        je      .L7
.L3:
        ret
        .p2align 4,,10
        .p2align 3
.L7:
        pxor    %xmm0, %xmm0
        cvtsi2sdq       %rdi, %xmm0
        ucomisd %xmm0, %xmm0
        jnp     .L3
        jmp     bar@PLT
        .cfi_endproc

I assume that this might be different from foo1 because the conversion can
yield a rounding error (a 64-bit integer does not always fit in the 53-bit
significand of a double). However, both conversions round the same value in
the same way, so the two doubles must compare equal. The only thing that could
prevent the optimization is the side effect introduced by the inexact
operation, which raises the inexact flag. But GCC ignores it by default (it
assumes that the STDC FENV_ACCESS pragma is off). And GCC knows how to
optimize this case (see the other test below).

* For foo3, the generated code is much more complicated, even though the C
code seems simpler (the integers can take only non-negative values):

        .cfi_startproc
        cmpq    %rsi, %rdi
        je      .L16
.L8:
        ret
        .p2align 4,,10
        .p2align 3
.L16:
        testq   %rsi, %rsi
        js      .L10
        pxor    %xmm1, %xmm1
        cvtsi2sdq       %rsi, %xmm1
.L11:
        testq   %rsi, %rsi
        js      .L12
        pxor    %xmm0, %xmm0
        cvtsi2sdq       %rsi, %xmm0
.L13:
        ucomisd %xmm0, %xmm1
        jp      .L15
        comisd  %xmm0, %xmm1
        je      .L8
.L15:
        jmp     bar@PLT
        .p2align 4,,10
        .p2align 3
.L12:
        movq    %rsi, %rax
        andl    $1, %esi
        pxor    %xmm0, %xmm0
        shrq    %rax
        orq     %rsi, %rax
        cvtsi2sdq       %rax, %xmm0
        addsd   %xmm0, %xmm0
        jmp     .L13
        .p2align 4,,10
        .p2align 3
.L10:
        movq    %rsi, %rax
        movq    %rsi, %rdx
        pxor    %xmm1, %xmm1
        shrq    %rax
        andl    $1, %edx
        orq     %rdx, %rax
        cvtsi2sdq       %rax, %xmm1
        addsd   %xmm1, %xmm1
        jmp     .L11
        .cfi_endproc

Now, let's add the -ffinite-math-only option, i.e.: "-O3 -ffinite-math-only".
Since integers cannot be NaN or Inf, and overflow in the conversion to double
is not possible, this should not change anything.

* foo1 is still optimized. Good.

* foo2 is now optimized like foo1, i.e. to just a "ret". This is good, but
surprising compared to the foo2 case without -ffinite-math-only, and to foo3
below.

* foo3 is still not optimized and remains almost as complicated: only 2
instructions have been removed, probably thanks to -ffinite-math-only (in case
GCC thought that special values were possible).
