https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121570

--- Comment #4 from kargls at comcast dot net ---
> 
>         movq    %rbx, %rdi
>         movq    %r14, %rsi
>         callq   __for_ieee_next_after_k4_@PLT
>         movss   %xmm0, 12(%rsp)
>         ucomiss 16(%rsp), %xmm0
>         jne     .LBB0_1
>         jp      .LBB0_1

We don't know what Intel is doing within __for_ieee_next_after_k4_.
I've quoted the Fortran standard about requirements:

1) On entry to a procedure, save current exceptions
2) Quiet all exceptions
3) Execute procedure
4) Restore exceptions from entry into function
5) Update exceptions that may have occurred during execution

Those requirements force

> whereas gfortran does
> 
>         movq    %rbx, %rdi
>         movss   %xmm0, 8(%rsp)
>         call    _gfortran_ieee_procedure_entry

this call ...

>         movss   8(%rsp), %xmm0
>         pxor    %xmm1, %xmm1
>         movss   %xmm1, 12(%rsp)
>         call    nextafterf
>         movq    %rbx, %rdi
>         movss   %xmm0, 20(%rsp)
>         movss   %xmm0, 8(%rsp)
>         call    _gfortran_ieee_procedure_exit

and this call.  But, see below ...

>         movss   8(%rsp), %xmm0
>         ucomiss 12(%rsp), %xmm0
> 
> I cannot look at what ifx's __for_ieee_next_after_k4_ does, but
> a separate, more optimized implementation for ieee_next_after might
> be faster also for gfortran.

The only thing that one might be able to do is in-line the _entry
and _exit procedures to avoid function call overhead.  Intel has the
luxury that it deals with only Intel/AMD CPUs.  gfortran has seven
different config files: fpu-387.h, fpu-aix.h, fpu-generic.h, fpu-glibc.h,
fpu-sysv.h, fpu-aarch64.h, and fpu-macppc.h.

> For example, it could check its argument
> if the operation will raise an exception, and branch in that event
> (which could be marked as unlikely, and after a few iterations, would
> be marked as unlikely to be taken by the CPU).
> 
> Confirmed as an enhancement request.

... here.  If it can be assumed that ieee_next_after, which is
mapped to nextafter (on x86_64-*-freebsd), is already IEEE-754
compliant, then the calls to _entry and _exit are redundant,
so gfortran need not emit them.  That is,

subroutine foo(x, y)
   use ieee_arithmetic
   real x
   x = ieee_next_after(x, 10.)
end subroutine foo

would be translated to

__attribute__((fn spec (". w w ")))
void foo (real(kind=4) & restrict x, real(kind=4) & restrict y)
{
  c_char fpstate.0[33];

  try
    {
      // Needed for 1 and 2 above on entry into foo
      _gfortran_ieee_procedure_entry ((void *) &fpstate.0);
      {
         // This is 3 above, i.e., execution of procedure
         *x = __builtin_nextafterf (*x, 1.0e+1);
      }
    }
  finally
    {
      // Needed for 4 and 5 above on exit from foo
      _gfortran_ieee_procedure_exit ((void *) &fpstate.0);
    }
}
