https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38959

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
We can maybe close this as fixed (if -march=i386 didn't exist/work at the time)
or invalid.  Or maybe we want to add some CPU-level awareness to code-gen for
__builtin_ia32_rdtsc / rdpmc / rdtscp.

The cmov / fcomi / fcomi proposed switches are already supported as part of
-march=pentium -mtune=generic or lower, e.g. -march=i386.  (The 32-bit default
is something like arch=i686 and tune=generic, with it being possible to
configure gcc so SSE2 is on by default in 32-bit code.)

Those are the important ones, because they're emitted automatically by the
compiler's back-end.  The other options would just be trying to save you from
yourself, e.g. rejecting source that contains __rdtsc() /
__builtin_ia32_rdtsc()

----

I'm not sure what the situation is with long NOPs.  GCC doesn't (normally?)
emit them, just using .p2align directives for the assembler.  In 32-bit mode,
GAS appears to avoid long NOPs, using either 2-byte xchg ax,ax or pseudo-nops
like   LEA esi,[esi+eiz*1+0x0] that add a cycle of latency to the dep chain
involving ESI.

Even with -march=haswell, gcc+gas fail to use more efficient long NOPs for
padding between functions.


---

I'm not sure if CPUID is ever emitted by gcc's back-end directly, only from
inline asm.  i386/cpuid.h uses inline asm.  But __get_cpuid_max() checks if
CPUID is even supported in a 386-compatible way, checking if a bit in EFLAGS is
sticky or not.  If your source code is written safely, you won't have a problem
unless possibly __builtin_cpu_init runs CPUID without checking, in programs
that use __builtin_cpu_supports() or _is().


__builtin_ia32_rdpmc() and __rdtsc() do *not* check -march= before emitting
rdpmc and rdtsc.  Neither does __rdtscp(), which is interesting because that
instruction is new enough that some still-relevant CPUs don't support it.

__rdpmc() isn't "volatile", though, so stop-start optimizes to 0.  (I found
this bug looking for existing reports of that issue.)



Test cases:  https://godbolt.org/z/hqPdza

FCMOV and CMOV are also handled correctly, but I didn't write functions for
them.

int fcomi(double x, double y) {
    return x<y;
}

gcc8.2 on Godbolt: 
-m32 -O3 -march=haswell code-gen
    (-mfpmath=387 is the default; SSE2 cmpltsd / movd / AND would have done it
nicely.  Or comisd / seta is maybe even better, and gcc uses that.  I wonder if
tune=haswell should imply fpmath=sse)

fcomi(double, double):   # haswell
        fld     QWORD PTR [esp+4]
        fld     QWORD PTR [esp+12]
        xor     eax, eax
        fcomip  st, st(1)      # only available with register operands, and not
in pp form
        fstp    st(0)
        seta    al
        ret

-m32 -O3 -march=i486 code-gen
fcomi(double, double):   # i486
        fld     QWORD PTR [esp+12]
        fcomp   QWORD PTR [esp+4]
        fnstsw  ax
        test    ah, 69           # check above and not unordered, couldn't just
shift/and
        sete    al
        and     eax, 255
        ret

Adding -mtune=generic, we get the expected movzx instead of AND EAX, imm32, so
I guess movzx was slow on 486.





(In reply to Mark Hobley from comment #0)
> Proposed switches:
> 
> --nocpuid  This option causes the compiler to not generate cpuid opcodes
> --nocmov   This option causes the compiler to not generate cmov opcodes
> --nofcmov  This option causes the compiler to not generate fcmov opcodes
> --nofcomi  This option causes the compiler to not generate fcomi opcodes
> --nonopl   This option causes the compiler to not generate fcomi opcodes
> --nordpmc  This option causes the compiler to not generate rdpmc opcodes
> --nordtsc  This option causes the compiler to not generate rdtsc opcodes
> 
> Possibly a general switch that is equivalent to all of the above
> 
> --nosupplementaryinstructions
> 
> Rationale
> 
> It is possible that a developer still wants to compile for a particular
> architecture (for example the i486), but does not wish to generate code with
> supplementary instructions (such as cpuid), that may be present on that
> architecture.

Reply via email to