https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38959
Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
We can maybe close this as fixed (if -march=i386 didn't exist/work at the time) or invalid. Or maybe we want to add some CPU-level awareness to code-gen for __builtin_ia32_rdtsc / rdpmc / rdtscp.

The cmov / fcmov / fcomi proposed switches are already supported as part of -march=pentium -mtune=generic or lower, e.g. -march=i386. (The 32-bit default is something like arch=i686 and tune=generic, with it being possible to configure gcc so SSE2 is on by default in 32-bit code.)

Those are the important ones, because they're emitted automatically by the compiler's back-end. The other options would just be trying to save you from yourself, e.g. rejecting source that contains __rdtsc() / __builtin_ia32_rdtsc().

----

I'm not sure what the situation is with long NOPs. GCC doesn't (normally?) emit them, just using .p2align directives for the assembler. In 32-bit mode, GAS appears to avoid long NOPs, using either 2-byte xchg ax,ax or pseudo-nops like LEA esi,[esi+eiz*1+0x0] that add a cycle of latency to the dep chain involving ESI.

Even with -march=haswell, gcc+gas fail to use more efficient long NOPs for padding between functions.

---

I'm not sure CPUID is ever emitted by gcc's back-end directly; it may only come from inline asm. i386/cpuid.h uses inline asm. But __get_cpuid_max() checks if CPUID is even supported in a 386-compatible way, checking if a bit in EFLAGS is sticky or not. If your source code is written safely, you won't have a problem unless possibly __builtin_cpu_init runs CPUID without checking, in programs that use __builtin_cpu_supports() or __builtin_cpu_is().

__builtin_ia32_rdpmc() and __rdtsc() do *not* check -march= before emitting rdpmc and rdtsc.
Neither does __rdtscp(), which is interesting because that instruction is new enough that some still-relevant CPUs don't support it. __rdpmc() isn't "volatile", though, so stop-start optimizes to 0. (I found this bug looking for existing reports of that issue.)

Test cases: https://godbolt.org/z/hqPdza

FCMOV and CMOV are also handled correctly, but I didn't write functions for them.

    int fcomi(double x, double y) { return x<y; }

gcc8.2 on Godbolt: -m32 -O3 -march=haswell code-gen (-mfpmath=387 is the default; SSE2 cmpltsd / movd / AND would have done it nicely. Or comisd / seta is maybe even better, and gcc uses that. I wonder if tune=haswell should imply fpmath=sse)

fcomi(double, double):    # haswell
        fld     QWORD PTR [esp+4]
        fld     QWORD PTR [esp+12]
        xor     eax, eax
        fcomip  st, st(1)      # only available with register operands, and not in pp form
        fstp    st(0)
        seta    al
        ret

-m32 -O3 -march=i486 code-gen

fcomi(double, double):    # i486
        fld     QWORD PTR [esp+12]
        fcomp   QWORD PTR [esp+4]
        fnstsw  ax
        test    ah, 69         # check above and not unordered, couldn't just shift/and
        sete    al
        and     eax, 255
        ret

Adding -mtune=generic, we get the expected movzx instead of AND EAX, imm32, so I guess movzx was slow on 486.
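
The CMOV and FCMOV cases mentioned above could be probed with functions along the following lines. This is only a sketch: select_int and select_double are hypothetical names that are not part of the Godbolt link, and whether a given gcc version actually picks cmov or fcmov for them is an untested assumption. The idea is just to compare the -march=i486 output against -march=i686 or later.

```c
/* Hypothetical test functions; the names are illustrative only. */

/* Integer select: a candidate for CMOV when -march allows it.
   The compiler may still choose a branch or another sequence. */
int select_int(int cond, int a, int b)
{
    return cond ? a : b;
}

/* x87 select: with -mfpmath=387 this is a candidate for FCMOV. */
double select_double(int cond, double a, double b)
{
    return cond ? a : b;
}

/* Same comparison as the fcomi() test case above. */
int fcomi(double x, double y)
{
    return x < y;
}
```

Compiling with something like gcc -m32 -O3 -march=i486 -S, then again with -march=i686, and diffing the asm should show whether the conditional-move instructions appear only where the target supports them.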
(In reply to Mark Hobley from comment #0)
> Proposed switches:
> 
> --nocpuid This option causes the compiler to not generate cpuid opcodes
> --nocmov This option causes the compiler to not generate cmov opcodes
> --nofcmov This option causes the compiler to not generate fcmov opcodes
> --nofcomi This option causes the compiler to not generate fcomi opcodes
> --nonopl This option causes the compiler to not generate fcomi opcodes
> --nordpmc This option causes the compiler to not generate rdpmc opcodes
> --nordtsc This option causes the compiler to not generate rdtsc opcodes
> 
> Possibly a general switch that is equivalent to all of the above
> 
> --nosupplementaryinstructions
> 
> Rationale
> 
> It is possible that a developer still wants to compile for a particular
> architecture (for example the i486), but does not wish to generate code with
> supplementary instructions (such as cpuid), that may be present on that
> architecture.
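
For the source-level cases (rdtsc / rdtscp from intrinsics, which the compiler doesn't gate on -march=), a runtime guard in the source is one option. A minimal sketch, assuming GCC's <cpuid.h> and <x86intrin.h>: cpu_has_tsc, cpu_has_rdtscp and read_tsc_or_zero are hypothetical helper names, and the feature-bit positions (leaf 1 EDX bit 4 for TSC, leaf 0x80000001 EDX bit 27 for RDTSCP) come from the vendor manuals, not from this report. On non-x86 targets the helpers simply report "unsupported".

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>      /* __get_cpuid() */
#include <x86intrin.h>  /* __rdtsc() */
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Hypothetical helper: 1 if CPUID leaf 1 reports the TSC feature bit. */
int cpu_has_tsc(void)
{
#if HAVE_X86
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid() returns 0 when the leaf is unavailable; it consults
       __get_cpuid_max() first, which is where the EFLAGS check lives. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 4) & 1u);   /* EDX bit 4: TSC (assumed from manuals) */
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if extended leaf 0x80000001 reports RDTSCP. */
int cpu_has_rdtscp(void)
{
#if HAVE_X86
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 27) & 1u);  /* EDX bit 27: RDTSCP (assumed from manuals) */
#else
    return 0;
#endif
}

/* Hypothetical helper: read the TSC only if CPUID says it exists, else 0. */
unsigned long long read_tsc_or_zero(void)
{
#if HAVE_X86
    if (cpu_has_tsc())
        return __rdtsc();
#endif
    return 0;
}
```

This only protects the call sites that go through the helpers; it does nothing about instructions the back-end picks on its own, which is what -march= is for.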