[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

Rich Felker changed:
           What    |Removed |Added
                 CC|        |bugdal at aerifal dot cx

--- Comment #10 from Rich Felker ---
This is a rather huge bug to have been fixed silently. Could someone who knows
the commit that fixed it, and which versions are affected, attach that
information to the tracker here? And ideally some information on working
around it for older GCCs?

From what I can tell experimenting so far, adding a dummy "0"(r0) constraint,
or using + instead of =, makes the problem go away, but potentially has other
ill effects from use of an uninitialized object..?
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #12 from Rich Felker ---
> You can work around it on older GCC by simply not using a register var
> for more than one asm operand, I think?

Nope. Making a syscall inherently requires binding specific registers for all
of the inputs/outputs, unless you want to spill everything to an explicit
structure in memory and load them all explicitly in the asm block. So it
really is a big deal.

In particular, all mips variants need an earlyclobber constraint for the
output register $2, because the old Linux kernel syscall contract was that,
when restartable syscalls were interrupted, the syscall number passed in
through $2 was lost; the kernel returned to $pc-8 and expected a userspace
instruction there to reload $2 with the syscall number from an immediate or
another register. If the input to load into $2 were itself passed in $2
(possible without earlyclobber), the reload would be ineffective and the
restarted syscall would execute the wrong syscall.

The original mips port of musl had undocumented and seemingly useless "0"(r2)
input constraints that were suppressing this bug, using the input to bind the
register where the earlyclobber output failed to do so. After some recent
changes broke compatibility with older kernels requiring the above contract,
I manually reverted them (due to intervening conflicting diffs) and omitted
the seemingly useless constraint, and it broke horribly. Eventually I found
this bug searching the tracker.

My plan for now is just to add back the "0"(r2) constraint, but since r2 is
uninitialized, it's not clear that having it as an input constraint is even
well-defined. Is this the best thing to do?
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #16 from Rich Felker ---
> I didn't say this very well... The only issue is using the same hard
> register for two different operands. You don't need to do this for
> syscalls (and you do not *need* that *ever*, of course).

I hit the bug without using the same hard register for two operands. At least
I'm pretty sure it's the same bug, because the behavior matches and it's
present in 6.3.0 but not 9.2.0.

> Can you post some code that fails? If you think this is a GCC bug (in
> some older branch?) that we should fix, please open a new PR for it.

Here's the relevant code extracted out of musl:

#define SYSCALL_CLOBBERLIST \
	"$1", "$3", "$11", "$12", "$13", \
	"$14", "$15", "$24", "$25", "hi", "lo", "memory"

long syscall6(long n, long a, long b, long c, long d, long e, long f)
{
	register long r4 __asm__("$4") = a;
	register long r5 __asm__("$5") = b;
	register long r6 __asm__("$6") = c;
	register long r7 __asm__("$7") = d;
	register long r8 __asm__("$8") = e;
	register long r9 __asm__("$9") = f;
	register long r2 __asm__("$2");
	__asm__ __volatile__ (
		"subu $sp,$sp,32 ; sw $8,16($sp) ; sw $9,20($sp) ; "
		"addu $2,$0,%4 ; syscall ;"
		"addu $sp,$sp,32"
		: "=&r"(r2), "+r"(r7), "+r"(r8), "+r"(r9)
		: "ir"(n), "r"(r4), "r"(r5), "r"(r6)
		: SYSCALL_CLOBBERLIST, "$10");
	return r7 && r2>0 ? -r2 : r2;
}

Built with gcc 6.3.0, %4 ends up expanding to $2, violating the earlyclobber,
and %0 gets bound to $16 rather than $2 (which is why the violation is
allowed, it seems). With "0"(r2) added to the input constraints, the bug goes
away.

I don't particularly think this bug is something that needs to be fixed in
older branches, especially if doing so is hard, but I do think it's something
we need a solid, reliable workaround for.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #19 from Rich Felker ---
> This looks like bad inline asm. You seem to be using $2, $8, $9 and $sp
> explicitly and not letting the compiler know you are using them.

$2, $8, and $9 are all explicitly outputs. All changes to $sp are reversed
before the asm ends, and there are no memory operands which could be sp-based
and thereby invalidated by temporary changes to it.

> I think you want to change those to %0, %2 and %3 and adding one for $sp?

All that does is make the code harder to read and more fragile against
changes to the order the constraints are written in.

> ...and "n" is an argument register, so why use "ir" for n's constraint?
> Shouldn't that just be "r"? Maybe that is confusing IRA/LRA/reload?

The code has been reduced to a standalone example that still reproduces the
bug, from a static inline function that was inlined into a function with
exactly the same signature. The static inline has a constant n after constant
propagation in almost all places it gets inlined, so the "ir" constraint
makes sense there. However, removing the "i" does not make the problem go
away anyway.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #22 from Rich Felker --- What should I call the new bug? The description sounds the same as this one, and it's fixed in gcc 9.x, just not earlier versions, so it seems to be the same bug.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #24 from Rich Felker ---
The reasons I was hesitant to force n into a particular register through an
extra register __asm__ temp var were that I was unsure how it would interact
with the "i" constraint (maybe prevent it from being used?), and that this is
code that needs to be inlined all over the place, and adding more
specific-register constraints usually hurts register allocation in all
functions where it's used.

If the "0"(r2) input constraint seems unsafe to rely on with r2 being
uninitialized (is this a real concern I should have?), just writing 0 or n to
r2 before the asm would only waste one instruction and shouldn't really hurt.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733 --- Comment #26 from Rich Felker --- Indeed, I just confirmed that binding the n input to a particular register prevents the "i" part of the "ir" alternative from working.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #27 from Rich Felker ---
Also just realized:

> Rich, forcing "n" to be in "$r10" seems to do the trick? Is that a
> reasonable solution for you?

It doesn't even work, because the syscall clobbers basically all
call-clobbered registers. Current kernels preserve at least $25 (t9) and $28
(gp) and the syscall argument registers, so $25 may be usable, but that was
deemed not clear back in 2012.

I'm looking back through musl git history, and this is actually why the "i"
alternative was wanted -- in basically all uses, "i" is satisfiable, and it
avoids needing to set up a stack frame and spill a call-saved register to the
stack in order to use it to hold the syscall number to reload on restart.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #28 from Rich Felker ---
And it looks like I actually hit this exact bug back in 2012 but
misattributed it:

https://git.musl-libc.org/cgit/musl/commit/?id=4221f154ff29ab0d6be1e7beaa5ea2d1731bc58e

I assumed things went haywire from using two separate "r" constraints, rather
than "r" and "0", to bind the same register, but it seems the real problem
was that the "=&r"(r2) was not binding at all, and the "0"(r2) served to fix
that.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #30 from Rich Felker ---
> You need to make $r10 not a clobber but an inout, of course. And not

That's not a correct constraint, because $10 is clobbered by the kernel
between the first execution of the syscall instruction and the second
execution of the addu instruction after the kernel returns to restart it.
$10 absolutely needs to be a clobber because the kernel clobbers it. The asm
block can't use any registers the kernel clobbers.

> allowing the "i" just costs one more register move, not so bad imo.
> So you do have a workaround now. Of course we should see if this can
> actually be fixed instead ;-)

I don't follow. As long as the "i" gets chosen, the asm inlines nicely. If
not, it forces a gratuitous stack frame to spill a non-clobberlisted register
to use as the input. The code has been working for the past 8 years with the
"0"(r2) input constraint added, and would clearly be valid if r2 were
pre-initialized with something.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #33 from Rich Felker ---
> An asm clobber just means "may be an output", and no operand will be
> assigned a register mentioned in a clobber. There is no magic.

This, plus the compiler cannot assume the value in any of the clobbered
registers is preserved across the asm statement.

> This is inlined just fine?

It produces *wrong code*, so it doesn't matter that it inlines fine. $10 is
modified by the kernel in the event the syscall is restarted, so the wrong
value will be loaded on restart.
[Bug inline-asm/87733] local register variable not honored with earlyclobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #35 from Rich Felker ---
> Oh, your real code is different, and $10 doesn't work for that? I see.

No, the real code is exactly that. What you're missing is that the kernel,
entered through syscall, jumps back to the addu, after having clobbered all
the registers in the clobberlist, if the syscall is interrupted and needs to
be restarted.
[Bug tree-optimization/14441] [tree-ssa] missed sib calling when types change
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14441

Rich Felker changed:
           What    |Removed |Added
                 CC|        |bugdal at aerifal dot cx

--- Comment #11 from Rich Felker ---
I've hit what seems to be this same issue on x86_64 with a minimal test case:

long g(void);
int f(void)
{
	return g();
}

It's actually really annoying, because it causes all of the intended
tail-call handling of syscall returns in musl to be non-tail calls:
__syscall_ret returns long (needed for a few syscalls) but most thin
syscall-wrapper functions return int.

If the x86_64 version is not this same issue but something separate, I can
open a new bug for it.
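To illustrate the musl pattern described above (the names and wrapper shape here are only modeled on musl's code, not taken from it), a minimal sketch of the return-type narrowing that defeats the sibcall:

```c
#include <errno.h>

/* Modeled loosely on musl's __syscall_ret: raw return values in
 * (-4096, 0) encode a negated errno.  It returns long because a few
 * syscalls need the full width. */
static long fake_syscall_ret(unsigned long r)
{
	if (r > -4096UL) {
		errno = -(long)r;
		return -1;
	}
	return (long)r;
}

/* A thin wrapper returning int: the int/long return-type mismatch is
 * what keeps GCC from emitting this as a tail call (jmp), even though
 * the truncation could be left to the caller. */
static int fake_wrapper(unsigned long r)
{
	return fake_syscall_ret(r);
}
```

With matching return types (both long), GCC does emit the sibcall; it is only the int/long difference that blocks it.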
[Bug c/94631] New: Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

Bug ID: 94631
Summary: Wrong codegen for arithmetic on bitfields
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: bugdal at aerifal dot cx
Target Milestone: ---

Test case:

struct foo {
	unsigned long long low:12, hi:52;
};
unsigned long long bar(struct foo *p)
{
	return p->hi*4096;
}

This should generate only a mask off of the low bits, but gcc generates code
to mask off the low 12 bits and the high 12 bits (reducing the result to 52
bits). Presumably GCC is interpreting the expression p->hi as having a
phantom type that's only 52 bits wide, rather than having type unsigned long
long. clang/LLVM compiles it correctly. I don't believe there's any language
in the standard supporting what GCC is doing here.
[Bug c/94631] Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631 --- Comment #2 from Rich Felker --- So basically the outcome of DR120 was allowing the GCC behavior? It still seems like a bad thing, not required, and likely to produce exploitable bugs (due to truncation of arithmetic) as well as very poor-performance code (due to constant masking).
[Bug c/94631] Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #5 from Rich Felker ---
No, GCC's treatment also seems to mess up bitfields smaller than int that are
fully governed by the standard (no implementation-defined use of non-int
types):

struct foo {
	unsigned x:31;
};
struct foo bar = {0};

bar.x-1 should yield UINT_MAX but yields -1 (same representation but
different type), because it behaves as a promotion from a phantom type
unsigned:31 to int rather than as having type unsigned to begin with. This
can of course be observed by comparing it against 0. It's subtle and
dangerous because it may also trigger optimization around UB of signed
overflow when the correct behavior would be well-defined modular arithmetic.
[Bug c/94631] Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631 --- Comment #7 from Rich Felker --- Can you provide a citation for that?
[Bug c/94631] Wrong codegen for arithmetic on bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #8 from Rich Felker ---
OK, I think it's in 6.3.1.1 Boolean, characters, and integers, ¶2, but
somewhat poorly worded:

"The following may be used in an expression wherever an int or unsigned int
may be used:

- An object or expression with an integer type (other than int or unsigned
  int) whose integer conversion rank is less than or equal to the rank of
  int and unsigned int.

- A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by
the width, for a bit-field), the value is converted to an int; otherwise, it
is converted to an unsigned int. These are called the integer promotions."

The first sentence with the second bullet point suggests it should behave as
unsigned int, but the "as restricted by the width, for a bit-field" in the
paragraph after the bulleted list seems to confirm your interpretation.
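A small standalone demonstration of the promotion rule under this interpretation (helper names are mine): int can represent all values of a 31-bit unsigned bit-field, so the bit-field promotes to int and the subtraction is done in int, while a plain unsigned object stays unsigned.

```c
#include <limits.h>

struct foo { unsigned x:31; };

/* bar.x promotes to int per 6.3.1.1p2 "as restricted by the width":
 * int can represent all of [0, 2^31-1], so bar.x - 1 is int -1. */
static int bitfield_goes_negative(void)
{
	struct foo bar = {0};
	return bar.x - 1 < 0;   /* 1: the result is the int -1 */
}

/* A plain unsigned object does not narrow-promote; 0u - 1 wraps. */
static int plain_unsigned_wraps(void)
{
	unsigned u = 0;
	return u - 1 == UINT_MAX;   /* 1: well-defined modular wrap */
}
```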
[Bug target/94643] New: [x86_64] gratuitous sign extension of nonnegative value from 32 to 64 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94643

Bug ID: 94643
Summary: [x86_64] gratuitous sign extension of nonnegative value from 32 to 64 bits
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bugdal at aerifal dot cx
Target Milestone: ---

Test case:

#include <stdint.h>
uint16_t a[];
uint64_t f(int i)
{
	return a[i]*16;
}

Produces:

	movslq %edi, %rdi
	movzwl a(%rdi,%rdi), %eax
	sall $4, %eax
	cltq
	ret

The value is necessarily in the range [0,1M) (in particular, nonnegative),
and the operation on eax has already cleared the upper bits of rax, so the
cltq is completely gratuitous. I've observed the same in nontrivial examples
where movslq gets used.
[Bug target/94646] New: [arm] invalid codegen for conversion from 64-bit int to double hardfloat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94646 Bug ID: 94646 Summary: [arm] invalid codegen for conversion from 64-bit int to double hardfloat Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- GCC emits a call to __aeabi_l2d to convert from long long to double. This is invalid for hardfloat ABI because it does not honor rounding modes or raise exception flags. That in turn causes the implementation of fma in musl libc to produce wrong results for non-default rounding modes.
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

--- Comment #12 from Rich Felker ---
There's some awful hand-written asm in libgcc/config/arm/ieee754-df.S
replacing the standard libgcc2.c versions; that's the problem. But in order
to use the latter it would need to be compiled with -mfloat-abi=softfp, since
the __aeabi_l2d function (and all the __aeabi_* apparently) use the standard
soft-float EABI even on EABIHF targets.

I'm not sure why you want a library function to be called for this on
hardfloat targets anyway. Inlining the hi*0x1p32+lo is almost surely smaller
than the function call, counting spills and conversion of the result back
from GP registers to an FP register. It seems like GCC should be able to
inline this idiom at a high level for *all* targets that lack a floatdidf
operation but have floatsidf.

Of course a high level fix is going to be hell to backport, and this really
needs a backportable fix or workaround (maintained in mcm not upstream gcc)
from the musl perspective. Maybe the easiest way to do that is just to hack
the right preprocessor conditions for a hardfloat implementation into
ieee754-df.S...
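The hi*0x1p32+lo idiom mentioned above can be sketched in portable C (the function name is mine, not libgcc's). Both partial conversions are exact — the high half times 2^32 is a 31-bit integer scaled by a power of two, and the low 32 bits fit a double exactly — so the single addition is the only rounding step, and the result is correctly rounded in the current rounding mode:

```c
#include <stdint.h>

/* Sketch of 64-bit int -> double via two exact 32-bit conversions.
 * (double)hi * 0x1p32 is exact (<= 31 significant bits times 2^32),
 * (double)lo is exact (32 bits), so the "+" performs the one and only
 * rounding, honoring the dynamic rounding mode. */
static double i64_to_double(int64_t x)
{
	int32_t hi = (int32_t)(x >> 32);
	uint32_t lo = (uint32_t)x;
	return (double)hi * 0x1p32 + (double)lo;
}
```

This matches the hardware int64-to-double conversion on targets that have one, including for values that are not exactly representable.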
[Bug tree-optimization/95097] New: Missed optimization with bitfield value ranges
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95097

Bug ID: 95097
Summary: Missed optimization with bitfield value ranges
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: bugdal at aerifal dot cx
Target Milestone: ---

#include <stdint.h>
struct foo {
	uint32_t x:20;
};
int bar(struct foo f)
{
	if (f.x) {
		uint32_t y = (uint32_t)f.x*4096;
		if (y<200) return 1;
		else return 2;
	}
	return 3;
}

Here, truth of the condition f.x implies y>=4096, but GCC does not DCE the
y<200 test and the return 1 codepath.

I actually had this come up in real-world code, where I was considering use
of an inline function with nontrivial handling of small-size cases when a
"page count" bitfield is zero. I expected these nontrivial cases to be
optimized out based on having already tested that the page count is nonzero,
but GCC was unable to do it. LLVM/clang does it.
[Bug middle-end/95249] New: Stack protector runtime has to waste one byte on null terminator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95249

Bug ID: 95249
Summary: Stack protector runtime has to waste one byte on null terminator
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: bugdal at aerifal dot cx
Target Milestone: ---

At least glibc presently stores a null byte in the first byte of the stack
protector canary value, so that string-based read overflows can't leak the
canary value. On 32-bit targets, this wastes a significant portion of the
randomness, making it possible that massive-scale attacks (e.g. against
millions of mobile or IoT devices) will have a decent chance of some success
bypassing stack protector. musl presently does not zero the first byte, but I
received a suggestion that we should do so, and got to thinking about the
tradeoffs involved.

If GCC would skip one byte below the canary, the full range of values could
be used by the stack protector runtime without the risk of string-read-based
disclosure. This should be inexpensive in space and time: just a single 0
byte stored on the stack.
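A sketch of the glibc-style "terminator canary" being discussed (the function name is mine, and the little-endian layout is an assumption): the first byte of the canary in memory is its least significant byte on little-endian targets, so masking it to zero stops string reads at the canary — at the cost of 8 bits of randomness, a quarter of the total on a 32-bit target.

```c
#include <stdint.h>

/* Hypothetical helper: turn raw randomness into a terminator canary by
 * zeroing the low byte (the first byte in memory on little-endian).
 * A strcpy/strlen-style overread stops at this NUL instead of leaking
 * the canary, but 8 bits of entropy are given up. */
static uintptr_t terminator_canary(uintptr_t random)
{
	return random & ~(uintptr_t)0xff;
}
```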
[Bug middle-end/95249] Stack protector runtime has to waste one byte on null terminator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95249 --- Comment #2 from Rich Felker --- Indeed, using an extra zero pad byte could bump the stack frame size by 4 or 8 or 16 bytes, or could leave it unchanged, depending on alignment prior to adding the byte and the alignment requirements of the target.
[Bug middle-end/95558] New: Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

Bug ID: 95558
Summary: Invalid IPA optimizations based on weak definition
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: bugdal at aerifal dot cx
Target Milestone: ---

Created attachment 48689
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48689&action=edit
test case

Here is a case that came up in WIP code on musl libc, where I wanted to
provide a weak dummy definition for functionality that would optionally be
replaced by a strong definition elsewhere at ld time. I've been looking for
some plausible explanation aside from an IPA bug, like interaction with UB,
but I can't find any.

In the near-minimal test case here, the function reclaim() still has all of
the logic it should, but reclaim_gaps gets optimized down to a nop. What
seems to be happening is that the dummy weak definition does not leak into
its direct caller via IPA optimizations, but does leak to the caller's
caller.
[Bug ipa/95558] Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #2 from Rich Felker ---
Wow. It's interesting that we've never seen this lead to incorrect codegen
before, though. All weak dummies should be affected, but only in some cases
does the pure get used to optimize out the external call. This suggests
there's a major missed optimization around pure functions too, in addition to
the wrong application of pure (transferring it from the weak definition to
the external declaration) that's the actual bug.
[Bug ipa/95558] Invalid IPA optimizations based on weak definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #3 from Rich Felker ---
In addition to a fix, this is going to need a workaround as well. Do you have
ideas for a clean one? A dummy asm in the dummy function to kill pureness is
certainly a big hammer that would work, but it precludes LTO optimization if
the weak definition doesn't actually get replaced, so I don't like that.

One idea I think would work, but I'm not sure: make an external
__weak_dummy_tail function that all the weak dummies tail-call. This should
only take a few bytes more than just returning, and it precludes pureness
analysis in the TU it's in, while still allowing DCE at LTO time when the
definition of __weak_dummy_tail becomes available. Is my reasoning correct
here?
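A sketch of that proposed workaround (the reclaim_gaps signature is simplified, and for the demo both symbols live in one file; in real use __weak_dummy_tail would be defined in a separate translation unit so the compiler cannot analyze the weak dummy as pure):

```c
/* Externally visible tail target; defined here only so the example
 * links and runs.  In the real scheme this lives in another TU. */
int __weak_dummy_calls;
void __weak_dummy_tail(void)
{
	__weak_dummy_calls++;
}

/* Weak dummy whose only body is a call to an external function, which
 * defeats IPA pureness analysis of the dummy itself. */
__attribute__((weak)) void reclaim_gaps(void)
{
	__weak_dummy_tail();
}
```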
[Bug target/95921] New: [m68k] invalid codegen for __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

Bug ID: 95921
Summary: [m68k] invalid codegen for __builtin_sqrt
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bugdal at aerifal dot cx
Target Milestone: ---

On ISA levels below 68040, __builtin_sqrt expands to code that performs an
extended-precision sqrt operation rather than a double-precision one. Not
only does this give the wrong result; it enables further cascadingly-wrong
optimization a la bug 93806 and related bugs, because the compiler thinks the
value in the output register is a double, but it's not.

I think the right fix is making the rtl in m68k.md allow only long double
operands unless the ISA level is at least 68040, in which case the
correctly-rounding instruction can be used. Then the standard function will
be used instead of a builtin definition, and it can patch up the result
accordingly.
[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

--- Comment #1 from Rich Felker ---
I wonder if the fact that GCC thinks the output of the insn is already double
suggests other similar bugs in the m68k backend, though... If extended
precision were working correctly, I'd think it would at least expect the
result to have extended precision and be trying to drop the excess precision
separately. But it's not; it's just returning.

Here's my test case:

double my_sqrt(double x)
{
	return __builtin_sqrt(x);
}

with -O2 -std=c11 -fno-math-errno -fomit-frame-pointer. The last 2 options
are non-critical (GCC still uses the inline insn even with -fmath-errno, and
branches only for the exceptional case) but clean up the output so it's more
clear what's going on.
[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

--- Comment #3 from Rich Felker ---
Yes, I'm aware m68k has FLT_EVAL_METHOD=2. That's not license for *functions*
to return excess precision. The language specification is very clear about
where excess precision is and isn't kept, and here it must not be. All
results are deterministic even with excess precision. Moreover, if there's
excess precision where gcc's middle end didn't expect it, it will turn into
cascadingly wrong optimization, possibly even making pure integer results
wrong.
[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921 --- Comment #4 from Rich Felker --- The related issue I meant to link to is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93681 which is for x87, but the equivalent happens on m68k due to FLT_EVAL_METHOD being 2 here as well.
[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

Rich Felker changed:
           What    |Removed |Added
                 CC|        |bugdal at aerifal dot cx

--- Comment #3 from Rich Felker ---
This answer does not seem satisfactory. Whether it will be optimized is not
the question; the question is just whether it's semantically defined. That
should either be universally true on GCC versions that offer the builtin (via
a libgcc function if nothing else is available) or target-specific (which is
known at preprocessing time).
[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

--- Comment #2 from Rich Felker ---
Rather than #if defined(SYS_futex_time64), I think it should be made:

#if defined(SYS_futex_time64) && SYS_futex_time64 != SYS_futex

This is in consideration of support for riscv32 and future archs without
legacy syscalls. It's my intent in musl to accept the riscv32 port with
SYS_futex defined to be equal to SYS_futex_time64; otherwise all software
making use of SYS_futex gratuitously breaks.
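A sketch (my own, not futex.cc's actual code; error handling minimal, Linux with kernel uapi headers assumed) of how the guarded time64 fallback might look. On a 32-bit arch with both syscalls, SYS_futex_time64 is tried first with SYS_futex as the ENOSYS fallback; on an arch where the two are defined equal, as proposed for riscv32, the #if leaves only the one call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Wake up to one waiter on *addr.  A wake on a private word with no
 * waiters is harmless and returns 0. */
static long futex_wake_one(int *addr)
{
#if defined(SYS_futex_time64) && SYS_futex_time64 != SYS_futex
	/* 32-bit arch with both syscalls: try the time64 one first. */
	long r = syscall(SYS_futex_time64, addr, FUTEX_WAKE, 1, 0, 0, 0);
	if (r != -1 || errno != ENOSYS)
		return r;
#endif
	return syscall(SYS_futex, addr, FUTEX_WAKE, 1, 0, 0, 0);
}
```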
[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421 --- Comment #4 from Rich Felker --- Actually I didn't see it, I just saw Florian added to CC and it reminded me of the issue, which reminded me I needed to check this for riscv32 issues with the riscv32 port pending merge. :-)
[Bug target/12306] GOT pointer (r12) reloaded unnecessarily
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12306

Rich Felker changed:
           What    |Removed |Added
                 CC|        |bugdal at aerifal dot cx

--- Comment #8 from Rich Felker ---
I think this should be closed as not a bug. There is no contract that, on
function entry, the r12 register contains the callee's GOT pointer. Rather,
it contains the caller's GOT pointer, and the two will only be equal if both
reside in the same DSO. (Note that the PowerPC64 ELFv2 ABI goes to great
lengths to optimize this case with "local entry points" and a fancy ABI
contract for how the GOT pointer save/load can be elided. I'm not sure the
benefits are well-documented though.)
[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

Rich Felker changed:
           What    |Removed |Added
                 CC|        |bugdal at aerifal dot cx

--- Comment #6 from Rich Felker ---
> Only if the int is out of float's range.

float's range is [-INF,INF] (endpoints included). There is no such thing as
"out of float's range".
[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 --- Comment #41 from Rich Felker --- > Josef Wolf mentioned that he ran into this on the gcc-help mailing list here: > https://gcc.gnu.org/ml/gcc-help/2019-10/msg00079.html I don't think that's an instance of this issue. It's normal/expected that __builtin_foo compiles to a call to foo in the absence of factors that lead to it being optimized to something simpler. The idiom of using __builtin_foo to get the compiler to emit an optimized implementation of foo for you, to serve as the public definition of foo, is simply not valid. That's kinda a shame because it would be nice to be able to do it for lots of math library functions, but of course in order for this to be able to work gcc would have to promise it can generate code for the operation for all targets, which is unlikely to be reasonable.
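A small sketch contrasting the valid and invalid uses of the builtin described above (function name mine): a differently-named wrapper may use __builtin_memcpy freely, because the compiler is always allowed to expand it as a call to the real memcpy; using that same body as the definition of memcpy itself is what can recurse.

```c
#include <stddef.h>

/* Valid: the compiler may inline this or emit a call to memcpy; either
 * way my_copy behaves correctly.  Invalid would be naming this function
 * memcpy, since the builtin falling back to a memcpy call would then
 * recurse into itself. */
void *my_copy(void *dst, const void *src, size_t n)
{
	return __builtin_memcpy(dst, src, n);
}
```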
[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540 --- Comment #8 from Rich Felker --- > Floating point types are not guaranteed to support infinity by the C standard Annex F (IEEE 754 alignment) does guarantee it, and GCC aims to implement this. This issue report is specific to target sh*-*-* which uses either softfloat with IEEE types and semantics or SH4 hardfloat which has IEEE types and semantics. So arguments about generality to non-Annex-F C environments are not relevant to the topic here.
[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540 --- Comment #10 from Rich Felker --- GCC can choose the behavior for any undefined behavior it wants, and GCC absolutely can make transformations based on behaviors it guarantees or that Annex F guarantees on targets for which it implements the requirements of Annex F. On this particular target, and on every target of any modern relevance, (float)16777217 has well-defined behavior. On ones with floating point environment (most/all hardfloat), it has side effects (inexact), so can't be elided without the flags to make gcc ignore those side effects.
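The conversion at issue can be pinned down concretely (function name mine): 16777217 = 2^24+1 is the smallest positive integer single precision cannot represent, so under the default round-to-nearest mode the conversion yields 2^24 and raises the inexact flag — the side effect that prevents eliding it.

```c
/* (float)16777217 is inexact: 2^24+1 needs 25 significand bits and
 * float has 24, so round-to-nearest(-even) maps it to 2^24 exactly,
 * raising FE_INEXACT as a side effect. */
static float convert_2to24_plus_1(void)
{
	return (float)16777217;
}
```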
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #3 from Rich Felker --- The affected code is in musl and I'd like to get this resolved. Are there different constraints we should be using instead here, or is this a bug that will be fixed on the GCC side?
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #8 from Rich Felker ---
> Then LLVM has more to fix. Constraints never look at types. A register
> constraint (like "wa") simply says what registers are valid.

This is blatantly false. For x86:

int foo(int x)
{
	__asm__("" : "+f"(x));
	return x;
}

yields "error: inconsistent operand constraints in an 'asm'".

> For many w* using it in inline asm is plain wrong; for the rest of the
> register constraints it is useless, plain "wa" should be used; and there
> are some special ones that are so far GCC implementation detail that you
> probably wouldn't even consider using them.

The asm register constraints are a public interface of "GNU C" for the
particular target architecture. Randomly removing them is a breaking change
in the language. There is no documented or even reliable way to detect which
ones work correctly for a particular compiler version, so change or removal
of semantics is particularly problematic.

> The maintenance cost for all the constraints we keep around because some
> important projects used them is considerable, fwiw.

One line in a table to preserve stability of the language is not what I call
"maintenance cost".
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #9 from Rich Felker --- And ok, to be more productive rather than just angry about the regression, if you really think the "ws" constraint should be removed, what is the proper preprocessor/configure-time check to determine the right constraint and asm form to use without special-casing specific compiler names and versions? Short of an answer to that, the only solution I can see to this on our side is just disabling the asm if a configure check determines that the current code doesn't compile, and that would be rather bleh.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #13 from Rich Felker --- > That does not look at types. It complains that "x" lives in memory, > while the constraint requires a register (a floating point register). That does not sound accurate. An expression (here an lvalue, since it's also an output, but for input-only operands non-lvalues as well) does not "live in memory"; that is not the problem here. Unless the address has leaked outside of the current scope of transformations, objects may not live in memory at all, only in registers, or a mix of registers and temporary spills, etc. The asm constraints guide the compiler's choices of where to put it (and may *force* it to be moved back/forth, e.g. if you use an "m" constraint for a variable that could otherwise be kept in a register, or a register constraint for one that otherwise would only be accessed via memory), not the other way around. The problem here is that GCC has no way to bind an integer expression to a floating point register. It *is* a matter of type. There are probably subtleties to this that I don't understand, but it's not about "living in memory". > No, they are not. The constraints are an implementation detail. And > they *have* to be, or we could never again improve anything. If they are in the documentation, they're not implementation details. They're interfaces necessary to be able to use inline asm, which is a documented and important feature of the "GNU C" language. In particular, "ws" is documented for the purpose we're using it for: https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Machine-Constraints.html > Unfortunately we currently document most of them in the user manual as > well. It's on my list of things to change, for GCC 10. Most targets > still have this problem, fwiw. If you intend to remove documented functionality on an even larger scale, that is a breaking change, and one which I will (loudly) oppose. 
If there are legitimate reasons for such changes to be made internally, a layer should be put in place so that code using the constraints continues to work without imposing on the backend implementation. > What I am talking about is that people rely on implementation details > no matter what we do, and then prevent us from changing them. That may be true, but it's not related to this bug report and I have not seen evidence of it happening. I'll gladly fix it if we're doing that anywhere.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #14 from Rich Felker --- > So, if "ws" has been documented in the user documentation, perhaps just > (define_register_constraint "ws" "rs6000_constraints[RS6000_CONSTRAINT_wa]" > "Compatibility alias to wa") > could be added? If it has not been documented, it is fine to remove it. It is clearly documented here: https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Machine-Constraints.html Whoever removed it in gcc 10 was aware of this because they explicitly deleted it from the documentation: https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html This should not be a permitted change, at least not without major discussion to reach consensus.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #16 from Rich Felker --- > Using "ws" in inline asm never made sense. It was always the same as > "wa", for all cases where either could be used in inline asm at all. It made sense insomuch as it was documented and was the most clearly-documented as matching the intended usage case, and still makes sense in that the other widely-used compiler did not properly (according to your interpretation) implement "wa", so that "ws" was the only constraint that worked with both.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #18 from Rich Felker --- > So use "wa" instead of "ws" in the two files you use it, and can we get > on with our lives? Translation: Introduce a regression on all existing versions of clang because GCC broke a documented public interface. How about no? > The places where in the past we put old internal constraints (and output > modifiers) back are places where for example glibc used it in installed > headers. That takes a decade or more to fix. These are not old internal constraints. They're publicly documented ones.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #20 from Rich Felker --- > After both musl and LLVM are fixed, if you then *still* feel you > need "ws", then we can talk of course. Deal? No, it's not a deal. Your proposal is *breaking all currently-working versions* of clang because GCC wants to remove a documented public interface. I don't make users of one tool suffer because the maintainers of another tool broke things. That would not be responsible maintenance on my part. If GCC is committed to breaking this, I'll make a configure check to fallback to the C implementation if "ws" does not work, and ship patches in musl-cross-make to fix the GCC regression so that users who get the patch won't be affected.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #22 from Rich Felker --- And to be clear, pretty much all gcc versions from 3.4.6 to present, and all clang/LLVM versions since they fixed some critical bugs (like -ffreestanding not working, which was a show-stopper), are supported compilers for targets they support. We do not drop support for some existing supported compilers because the latest version made an incompatible change.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #24 from Rich Felker --- > Sure, and I'll do that, *if there are users*, *after they fix their stuff*. Nothing is broken on our side here. We are using the documented functionality from gcc 9 going all the way back to whatever version first added this stuff. > I will not add back all constraints I removed. I *cannot* add back many of > those constraints, not with similar semantics anyway. > > Oh, and there were 24 I removed for 10, if I count correctly. All were > internal. That they were documented was a bug; How many others were actually-internal vs having this ridiculous excuse of "it was a bug that we documented it so we can retroactively blame programmers for using it rather than taking responsibility for the contract we documented"? Are any of the "*cannot* add back" ones things that were documented? If not, then you can add back all the ones that were documented with no harm done to anything. If there really are technical reasons that some of the ones removed are difficult to recreate, please say so. I would still strongly disagree with the choice to make such a regression, but at least it would have some reasonable motivation rather than the only motivation I've seen so far, which seems to be your desire to break things on a whim. > that one was actually used > by any program was unexpected (and it took literally half a year before this > was found out, or reported here at least). At least "ws" and "ww" are used, for fmax, fmaxf, fmin, and fminf. The reason it was not found for "literally half a year" is because the regression is not present in any release. Users generally do not use unstable GCC; my understanding is that it's not recommended to do so. > The point is that we will never get to a good state if we cannot fix > up any historical mistakes. That's an extreme exaggeration. 
Nothing about keeping two aliases for "wa" in order to preserve documented historical behavior holds back a "good state".
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #27 from Rich Felker --- > Have I already mentioned that if any program "in the wild" will use "ws" with > GCC 10, then of course we can add an alias (to "wa") for it? No program > should > use "ws" in inline assembler, ever, but if some programs cannot fix that, we > can fix it for them, sure. I would very much appreciate it if the "ws" (and "ww") aliases could be added. I hope you can appreciate how clang users would respond when linked to this BZ ticket after musl broke for them, if we just changed it to use "wa". Even if (rather, "even though" - I believe you that they're wrong) clang is wrong to reject "wa" here, it would come across to them as completely unreasonable that we broke the constraints that previously worked fine on all compilers. I'm not sure whether you would rather have us do some sort of configure-time check here. Maybe one can be devised that doesn't risk wrong semantics when we can only measure whether the compiler accepts it, not whether it generates the wrong code, but I don't know how (and what's to guarantee that someday someone won't, seeing the combinations "ws" and "ww" as unused, invent a new meaning for one of them?), and even if it is possible, I would find such a configure check to be really ugly long-term maintenance cost in relation to a simple alias to preserve the long-documented behavior.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #29 from Rich Felker --- For reference, here are the affected functions in musl (fmax, fmaxf, fmin, fminf): https://git.musl-libc.org/cgit/musl/tree/src/math/powerpc64?id=v1.1.24
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #34 from Rich Felker --- > Does musl not support BE? There is nothing about this that is LE-only > as far as I can see. For powerpc64, musl supports both BE and LE, and uses "elfv2" ABI for both (since it was not present as a target for musl before the new ABI existed). Per the IBM docs, LE/elfv2 (which they confusingly equate) "require" power8+, but there are not actually any constraints in the ABI that impose such a requirement (e.g. argument-passing in registers that previous ISA levels didn't have), and we don't impose it. I believe there are people using musl on pre-power8 powerpc64 systems, at least in BE mode and possibly also in LE mode.
[Bug c/92571] New: gcc erroneously rejects , operator in array dimensions as syntax error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92571 Bug ID: 92571 Summary: gcc erroneously rejects , operator in array dimensions as syntax error Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- void foo() { int a[1,1]; } produces: error: expected ']' before ',' token despite the declaration being a valid (variable length, since the comma operator cannot participate in an integer constant expression) array declaration. I found this while testing for whether -Wvla would catch such "gratuitously variable-length" arrays due to comma operator. Obviously this should be caught by both that and whatever warning is appropriate for "you probably meant multi-dimensional array". But it is valid C and should not be rejected as a syntax error.
[Bug c/92571] gcc erroneously rejects , operator in array dimensions as syntax error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92571 --- Comment #1 from Rich Felker --- Note that I put it in a function just because VLA is invalid at file scope, and I wanted to be clear that this bug is independent of the invalidity of VLA at file scope.
[Bug c/92571] gcc erroneously rejects , operator in array dimensions as syntax error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92571 Rich Felker changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #2 from Rich Felker --- Sorry for the noise. Per 6.7.6 Declarators, the expression in [] is assignment-expression, defined in 6.5.16 Assignment operators, which does not include the comma operator. I'm not sure whether there's still an element to this report that the error message could be more useful, but it seems it's not a bug but a quirk in the language spec.
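For illustration (this snippet is not from the bug report): wrapping the comma expression in parentheses makes it a valid array size, since the size must be an assignment-expression. The result is a VLA, because a comma expression is never an integer constant expression:

```c
#include <stddef.h>

/* Inside a function body, a parenthesized comma expression is a valid
 * array size; the resulting array is variable-length, because constant
 * expressions may not contain evaluated comma operators (C11 6.6 p3). */
size_t comma_dim_demo(void)
{
    int a[(1, 3)];              /* accepted as a VLA; int a[1,3] is a syntax error */
    return sizeof a / sizeof a[0];
}
```

GCC accepts this form (warning about it under -Wvla), consistent with the resolution above.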
[Bug c/61579] -Wwrite-strings does not behave as a warning option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61579 --- Comment #6 from Rich Felker --- Ping.
[Bug libstdc++/93325] New: libstdc++ wrongly uses direct clock_gettime syscall on non-glibc, breaks time64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93325 Bug ID: 93325 Summary: libstdc++ wrongly uses direct clock_gettime syscall on non-glibc, breaks time64 Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- The configure logic for libstdc++ is choosing to make direct clock_gettime syscalls (via syscall()) rather than using the clock_gettime function except on glibc 2.17 or later (when it was moved from librt to libc). This is incompatible with time64 (because struct timespec mismatches the form the old clock_gettime syscall uses) and also undesirable because it can't take advantage of vdso. The hard-coded glibc version dependency is a configure anti-pattern and should be removed; the right way to test this would be just probing for the clock_gettime function without adding any libs (like -lrt).
[Bug libstdc++/93421] New: futex.cc use of futex syscall is not time64-compatible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421 Bug ID: 93421 Summary: futex.cc use of futex syscall is not time64-compatible Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Created attachment 47704 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47704&action=edit simple fix, not necessarily right for upstream This code directly passes a userspace timespec struct to the SYS_futex syscall, which does not work if the userspace type is 64-bit but the syscall expects legacy 32-bit timespec. I'm attaching the patch we're using in musl-cross-make to fix this. It does not attempt to use the SYS_futex_time64 syscall, since that would require fallback logic with cost tradeoffs for which to try first, and since the timeout is relative and therefore doesn't even need to be 64-bit. Instead it just uses the existence of SYS_futex_time64 to infer that the plain SYS_futex uses a pair of longs, and converts the relative timestamp into that. This assumes that any system where the libc timespec type has been changed for time64 will also have had its headers updated to define SYS_futex_time64. Error handling for extreme out-of-bound values should probably be added.
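The conversion described above might be sketched like this (a reconstruction of the idea, not the attached patch itself): when the libc defines SYS_futex_time64, plain SYS_futex is inferred to take the legacy pair-of-longs timespec, so the relative timeout is narrowed, clamping values that don't fit:

```c
#include <time.h>

/* Sketch of narrowing a possibly-64-bit struct timespec into the
 * pair-of-longs layout legacy SYS_futex expects.  Since the timeout is
 * relative, an out-of-range value can be clamped to the largest
 * representable wait rather than treated as an error. */
void narrow_futex_timeout(const struct timespec *ts, long out[2])
{
    if (sizeof(time_t) > sizeof(long) && ts->tv_sec > 0x7fffffffL) {
        out[0] = 0x7fffffffL;   /* ~68 years: effectively "forever" */
        out[1] = 0;
    } else {
        out[0] = (long)ts->tv_sec;
        out[1] = (long)ts->tv_nsec;
    }
}
```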
[Bug middle-end/93509] New: Stack protector should offer trap-only handling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93509 Bug ID: 93509 Summary: Stack protector should offer trap-only handling Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Presently stack protector functionality depends on making a call to __stack_chk_fail (possibly via __stack_chk_fail_local to avoid PLT-call-ABI constraint in the caller). This is less secure than it could be, since it depends on the ability to make function calls (and possibly operate on global data and make syscalls in the callee) in a process whose state is compromised. For example the GOT entries used by PLT could be clobbered or %gs:0x10 (i386 syscall vector) could be clobbered by the same stack-based overflow that caused the stack protector event in the first place. In https://gcc.gnu.org/ml/gcc/2020-01/msg00483.html where the topic is being discussed for other reasons (contract between gcc and libc for where these symbols are provided), I proposed that GCC should offer an option to emit a trapping instruction directly, instead of making a function call, analogous to -fsanitize-undefined-trap-on-error for UBSan. This would work well on all targets where __builtin_trap is defined, but would regress (requiring PLT call) on targets where it uses the default abort() definition (are there any relevant ones?). Segher Boessenkool then requested I file this here on the GCC tracker. Note: I'm filing this for middle-end because that was my best guess of where GCC handles it, but it's possible all this logic is repeated in each target or takes place somewhere else entirely; if so please reassign to appropriate component.
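At the source level the proposal amounts to something like the following sketch (names are hypothetical; the real check would be emitted by the compiler in the function epilogue):

```c
/* Illustration of trap-on-mismatch stack protector handling: compare the
 * saved canary against the reference and, on mismatch, execute a trapping
 * instruction directly (e.g. ud2 on x86) rather than calling
 * __stack_chk_fail through possibly-clobbered GOT/PLT state. */
static unsigned long canary_ref = 0x2f3c5a91UL;

int epilogue_check(unsigned long saved_canary)
{
    if (saved_canary != canary_ref)
        __builtin_trap();       /* trap path makes no call and uses no PLT */
    return 0;                   /* canary intact: normal return */
}
```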
[Bug target/65249] unable to find a register to spill in class 'R0_REGS' when compiling protobuf on sh4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65249 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #27 from Rich Felker --- We've hit what seems like almost the exact same issue on gcc 8.3.0 with this minimized testcase: void fg(int *); int get_response(int a) { int b; if (a) fg(&b); return 0; } compiled with -O -c -fstack-protector-strong for sh2eb-linux-muslfdpic. With gcc 9.2.0 it compiles successfully. I looked for a record of such a fix having been made, but couldn't find one. Was it a known issue that was fixed silently, or might it be a lurking bug that's just no longer being hit?
[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #5 from Rich Felker --- My understanding is that C2x is fixing this underspecification and will require the library functions to drop excess precision as if they used a return statement. So this really should be fixed in glibc if it's still an issue; if they accept fixing that I don't think GCC needs any action on this. I just fixed it in musl.
[Bug c++/93620] New: Floating point is broken in C++ on targets with excess precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93620 Bug ID: 93620 Summary: Floating point is broken in C++ on targets with excess precision Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx Target Milestone: --- Attempting to use -fexcess-precision=standard with g++ produces: cc1plus: sorry, unimplemented: '-fexcess-precision=standard' for C++ In light of eldritch horrors like pr 85957 this means floating point is essentially catastrophically broken on i386 and m68k. This came to my attention while analyzing https://github.com/OSGeo/PROJ/issues/1906. Most of the problems are g++ incorrectly handling excess precision, and they're having to put awful hacks with volatile objects in place to work around it.
[Bug middle-end/323] optimized code gives strange floating point results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 --- Comment #210 from Rich Felker --- If new reports are going to be marked as duplicates of this, then can it please be moved from SUSPENDED status to REOPENED? The situation is far worse than what seems to have been realized the last time this was worked on, as evidenced by pr 85957. These issues just came up again breaking real-world software in https://github.com/OSGeo/PROJ/issues/1906
[Bug middle-end/323] optimized code gives strange floating point results
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 --- Comment #214 from Rich Felker --- I'm not particular in terms of the path it takes as long as this gets back to a status where it's on the radar for fixing.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #12 from Rich Felker --- Note that -fexcess-precision=standard is not available in C++ mode to fix this. However, -ffloat-store, which is available in C++ mode, should also ensure consistency to the optimizer (necessary to prevent this bug, and other variants of it, from happening), at the expense of severe performance and code size costs and of making the floating point results even more semantically incorrect (double-rounding all over the place, mismatching FLT_EVAL_METHOD==2). Despite all these nasty effects, it may be a suitable workaround, and at least it avoids letting the optimizer prove 0==1, thereby effectively treating any affected code as if it contained UB. Note that in code written to be excess-precision-aware, making use of float_t and double_t for intermediate operands and only using float and double for in-memory storage, -ffloat-store should yield behavior equivalent to -fexcess-precision=standard.
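The last point can be illustrated with a sketch of excess-precision-aware code (illustrative, not from the bug report):

```c
#include <math.h>

/* Excess-precision-aware accumulation: double_t matches the evaluation
 * format (long double when FLT_EVAL_METHOD == 2), so stores forced by
 * -ffloat-store land in that same format and add no extra roundings;
 * only the final conversion back to double rounds to the nominal type. */
double sum_aware(const double *a, int n)
{
    double_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += a[i];
    return (double)acc;
}
```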
[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318 --- Comment #7 from Rich Felker --- I'll inquire about it. Note that F.6 already requires this for C functions; the loophole is just that the implementation itself does not inherently have to consist of C functions. If it's determined that C won't require the library functions not bound to IEEE operations to return values representable in their nominal type, then GCC needs to be aware of whether the target libc can be expected to do so, and if not, it needs to, as a special case, assume there might be excess precision in the return value, so that (double)retval==retval can't be assumed to be true in the optimizer. Note that such an option would be nice to have anyway, for arbitrary functions, since it's necessary for being able to call code that was compiled with -fexcess-precision=fast from code that can't accept the non-conforming/optimizer-unsafe behavior and safely use the return value. It should probably be an attribute, with a flag to set the global default. For example, __attribute__((__returns_excess_precision__)).
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #14 from Rich Felker --- > No problems: FLT_EVAL_METHOD==2 means "evaluate all operations and constants > to the range and precision of the long double type", which is what really > occurs. The consequence is indeed double rounding when storing in memory, but > this can happen at *any* time even without -ffloat-store (due to spilling), > because you are never sure that registers are still available; see some > reports in bug 323. It sounds like you misunderstand the standard's requirements on, and GCC's implementation of, FLT_EVAL_METHOD==2/excess-precision. The availability of registers does not in any way affect the result, because when expressions are evaluated with excess precision, any spills must take place in the format of float_t or double_t (long double) and are thereby transparent to the application. The buggy behavior prior to -fexcess-precision=standard (and now produced with -fexcess-precision=fast which is default in "gnu" modes) spills in the nominal type, producing nondeterministic results that depend on the compiler's transformations and that lead to situations like this bug (where the optimizer has been lied to that two expressions are equal, but they're not). > Double rounding can be a problem with some codes, but this just means that > the code is not compatible with FLT_EVAL_METHOD==2. For some floating-point > algorithms, double rounding is not a problem at all, while keeping a result > in extended precision will make them fail. With standards-conforming behavior, the rounding of an operation and of storage to an object of float/double type are discrete roundings and you can observe and handle the intermediate value between them. With -ffloat-store, every operation inherently has a double-rounding attached to it. This behavior is non-conforming but at least deterministic, and is what I was referring to in my previous comment. 
But I think this is largely a distraction from the issue at hand; I was only pointing out that -ffloat-store is a workaround, but one with its own (often severe) problems.
[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318 --- Comment #9 from Rich Felker --- Indeed, I don't think the ABI says anything about this; a bug against the psABI should probably be opened.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #16 from Rich Felker --- > And GCC does not do spills in this format, as seen in bug 323. In my experience it seems to (assuming -fexcess-precision=standard), though I have not done extensive testing. I'll check and follow up. > This is conforming as there is no requirement to keep intermediate results in > excess precision and range. Such behavior absolutely is non-conforming. The standard reads (5.2.4.2.2 ¶9): "Except for assignment and cast (which remove all extra range and precision), the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type" Note "are evaluated", not "may be evaluated depending on what spills the compiler chooses to perform".
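The quoted rule in miniature (an illustrative sketch, not from the report): assignment and cast each round to the nominal type, so the two results below are required by the standard to agree under any FLT_EVAL_METHOD, while the bare expression x * y may carry excess range and precision:

```c
/* 5.2.4.2.2 p9 in miniature: assignment and cast both remove excess
 * range and precision, so these two results must agree whether
 * FLT_EVAL_METHOD is 0 (e.g. SSE) or 2 (x87).  Only the unassigned,
 * uncast expression x * y may be held to wider precision. */
int assignment_and_cast_agree(float x, float y)
{
    float assigned = x * y;           /* rounded to float on assignment */
    float casted   = (float)(x * y);  /* rounded to float by the cast   */
    return assigned == casted;
}
```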
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #17 from Rich Felker --- And indeed you're right that GCC does it wrong. This can be seen from a minimal example: double g(),h(); double f() { return g()+h(); } where gcc emits fstpl/fldl (spilling in double format) around the second call rather than fstpt/fldt. So this is all even more broken than I thought. It looks like the only way to get deterministic behavior from GCC right now is to get the wrong deterministic behavior via -ffloat-store. Note that libfirm/cparser gets the right result, emitting fstpt/fldt.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #18 from Rich Felker --- It was just pointed out to me that this might be an invalid test since GCC assumes (correctly or not) that the return value of a function does not have excess precision. I'll see if I can make a better test.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #19 from Rich Felker --- Test case provided by Szabolcs Nagy showing that GCC does seem to spill right if it can't assume there's no excess precision to begin with: double h(); double ff(double x, double y) { return x+y+h(); } In theory this doesn't force a spill, but GCC seems to choose to do one, I guess to avoid having to preserve two incoming values (although they're already in stack slots that would be naturally preserved). Here GCC 9.2 with -fexcess-precision=standard -O3 emits fstpt/fldt.
[Bug tree-optimization/93682] Wrong optimization: on x87 -fexcess-precision=standard is incompatible with -mpc64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93682 --- Comment #2 from Rich Felker --- I think the underlying issue here is that -mpc64 (along with -mpc32) is just hopelessly broken and should be documented as such. It could probably be made to work, but there are all sorts of issues like float.h being wrong, math library code breaking, etc. On a more fundamental level (but seemingly unrelated to the mechanism of breakage here), the underlying x87 precision control modes are also hopelessly broken. They're not actually single/double precision modes, but single/double mantissa with ld80 exponent. So I don't think it's possible to make the optimizer aware of them without making it aware of two new floating point formats that it doesn't presently know about. If you just pretended they were single/double, the same sort of issue would arise again as soon as someone uses small or large values that should be denormal/underflow/overflow but which retain their full-precision values by virtue of the excess exponent precision.
[Bug c/85957] i686: Integers appear to be different, but compare as equal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957 --- Comment #25 from Rich Felker --- I think standards-conforming excess precision should be forced on, and added to C++; there are just too many dangerous ways things can break as it is now. If you really think this is a platform of dwindling relevance (though I question that; due to the way patent lifetimes work, the first viable open-hardware x86 clones will almost surely lack sse, no?) then we should not have dangerous hacks for the sake of marginal performance gains, with too few people spending the time to deal with their fallout. I'd be fine with an option to change the behavior of constants, and have it set by default for -std=gnu* as long as the unsafe behavior is removed from -std=gnu*.
[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806 --- Comment #10 from Rich Felker --- I don't think it's at all clear that -fno-signed-zeros is supposed to mean the programmer is promising that their code has behavior independent of the sign of zeros, and that any construct which would be influenced by the sign of a zero has undefined behavior. I've always read it as a license to optimize in ways that disregard the sign of a zero or change the sign of a zero, but with internal consistency of the program preserved. If -fno-signed-zeros is really supposed to be an option that vastly expands the scope of what's undefined behavior, rather than just removing part of Annex F and allowing the unspecified quality of floating point results that C otherwise allows in the absence of Annex F, it really needs a much much bigger warning in its documentation!
[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806 --- Comment #12 from Rich Felker --- To me the meaning of internal consistency is very clear: that the semantics of the C language specification are honored and that the only valid transformations are those that follow the "as-if rule". Since C without Annex F allows arbitrarily awful floating point results, your example in comment 11 is fine. Each instance of 1/a can evaluate to a different value. They could even evaluate to random values. However, if you had written: int b = 1/a == 1/0.; int c = b; return b == c; then the function must necessarily return 1, because the single instance of 1/a==1/0. in the abstract machine has a single value, either 0 or 1, and in the abstract machine that value is stored to b, then copied to c, and b and c necessarily have the same value. While I don't think it's likely that GCC would mess up this specific example, it seems that it currently _can_ make transformations such that a more elaborate version of the same idea would be broken.
[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806 --- Comment #14 from Rich Felker --- Indeed, without Annex F, division by zero is UB, so it's fine to do anything if the program performs division by zero. So we need examples without division by zero.
[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806 --- Comment #32 from Rich Felker --- > A slightly modified version of the example, showing the issue with GCC 5 to 7 > (as the noipa attribute directive has been added in GCC 8): Note that __attribute__((__weak__)) necessarily has the effect of noipa and works in basically all GCC versions, so you can use it where you want this kind of example for older GCC.
[Bug target/55431] Invalid auxv search in ppc linux-unwind code.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55431 --- Comment #6 from Rich Felker 2013-02-12 07:08:14 UTC --- That sounds highly doubtful. The sigcontext is (necessarily) on the stack, so the only way accessing past the end of sigcontext could fault is if the access were so far beyond the end as to go completely off the stack. The only way this might be plausible is under sigaltstack. In any case, why would this code be reading beyond the end? Does the kernel use different incompatible sigcontext structures based on which vector registers exist on the cpu?
[Bug target/55431] Invalid auxv search in ppc linux-unwind code.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55431 --- Comment #8 from Rich Felker 2013-02-12 15:27:58 UTC --- Is there nothing internal in the sigcontext structure that distinguishes the version? Making the reference to __libc_stack_end weak won't help. If the symbol is undefined, the code in libgcc would crash or malfunction; if it's defined but does not point exactly to the argc/argv start (which, since it's not defined in the ABI, seems to be something that could happen in the future even with glibc), the code will also badly malfunction. If you want to keep using __libc_stack_end, I think it should be conditional at runtime on old/broken kernel and libc versions, and auxv should be ignored otherwise.
[Bug target/54232] New: For x86 PIC code, ebx should be spillable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54232 Bug #: 54232 Summary: For x86 PIC code, ebx should be spillable Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: bug...@aerifal.cx When generating x86 position-independent code, GCC permanently reserves EBX as the GOT register. Even in functions that make no use of global data, EBX cannot be used as a general-purpose register. This both slows down code that's under register pressure and forces inline asm that needs an argument in EBX (e.g. syscalls) to use ugly temp register shuffling to make gcc happy. My proposal, and I understand this may be difficult but I still think it's worth stating, is that the GOT register EBX should be considered spillable like any other register. In particular, the following consequences should result:
- If a function is not using the GOT (not accessing global or file-local static symbols or making non-hidden function calls), all GP registers can be used just like in non-PIC code.
- If a function is only using a "GOT register" for PC-relative data access, it should not go to the trouble of actually adjusting the PC obtained to point to the GOT. Instead it should generate addressing relative to the PC address that gets loaded into the register.
- In a function that's not making calls through the PLT (i.e. a leaf function or a function that only calls hidden/protected functions), the "GOT register" need not be EBX. Any register could be used, and in fact in some trivial functions, using a call-clobbered register would avoid having to save/restore EBX on the stack.
- In any function where EBX or any other register is being used to store the GOT address, it should be spillable (either pushed to stack, or simply discarded and reloaded with the standard load sequence when it's needed again later) just like a register caching any other data, so that under register pressure or inline asm constraints, the register becomes temporarily available for another use.
It seems like all of these very positive consequences would fall out of just treating GOT and GOT-relative addressing as address expressions based on the GOT address, which could be cached in registers just like any other expression, instead of hard-coding the GOT register as a special reserved register. The only remaining special-case/hard-coding would be treating the need for EBX to contain the GOT address when making calls through the PLT as an extra constraint of the function call ABI.
[Bug target/54232] For x86 PIC code, ebx should be spillable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54232 --- Comment #1 from Rich Felker 2012-08-12 04:57:07 UTC --- By the way, the code that inspired this report is crypt_blowfish.c and the corresponding asm by Solar Designer. We've been experimenting with performance characteristics while integrating it into musl libc, and I found that the C code is just as fast as the hand-optimized asm on the machine I was testing it on when using static libraries without -fPIC, but takes over 30% more runtime when built with -fPIC due to running out of registers.
[Bug target/54232] For x86 PIC code, ebx should be spillable
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54232 --- Comment #3 from Rich Felker 2012-08-13 13:59:17 UTC --- > I think the GOT is introduced too late to do any fancy analysis > on whether we need it or not. This may be true, but if so, it's a highly suboptimal design that's hurting performance badly. 30% on the cryptographic code I looked at, and from working on FFmpeg in the past, I remember quite a few cases where PIC was hurting performance by significant measurable amounts like that too. If there's any way the changes I describe could be targeted even just in the long term, I think it would make a big difference for a lot of software. > I also think that for outgoing function calls the ABI > relies on a properly setup GOT, even for those that bind > locally and thus do not go through the PLT. The extern function call ABI on x86 does not allow the caller to depend on EBX containing the GOT address. This is because the callee has no way of knowing whether it was called by the same DSO it resides in. If not, the GOT address will be invalid for it. For static functions whose addresses never leak out of the translation unit they're defined in, the calling convention is up to GCC. Ideally it would assume the GOT register is already loaded in such functions (as long as all the callees use the GOT), but in reality it rarely does. This is a separate code generation QoI issue that should perhaps be addressed as its own bug.
[Bug debug/54395] New: DWARF tables should go in non-mapped section unless exceptions are enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54395 Bug #: 54395 Summary: DWARF tables should go in non-mapped section unless exceptions are enabled Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug AssignedTo: unassig...@gcc.gnu.org ReportedBy: bug...@aerifal.cx On systems where GCC uses DWARF for debugging information, the DWARF tables are stored in the .eh_frame section, which the linker maps into the LOAD segment for the program, and which is not safely strippable with the strip command (because it messes up section numbering). This is of course necessary if exceptions are enabled (for languages that require them, or for -fexceptions in "GNU C" code), but it's harmful when they're not wanted. It would be nice if GCC had a way to store a "purely for debugging" version of the tables in a separate section that could safely be stripped, that would not get loaded in a LOAD segment, and that would not artificially inflate the size(1) of the object files (which is frustrating when trying to measure relative improvements in optimizing the size of object files). At present, -fno-asynchronous-unwind-tables -fno-unwind-tables can eliminate the problem, but it also conflicts with debugging; it seems impossible to generate object files that are debugging-enabled but that don't push (part of) the debugging data into the mapped-at-runtime part of the program.
[Bug debug/54395] DWARF tables should go in non-mapped section unless exceptions are enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54395 --- Comment #2 from Rich Felker 2012-08-28 23:38:49 UTC --- I can see the argument that some users would want/need that, and perhaps even that you want backtrace() to be available in the default configuration, but I still think there should be a configuration where debugging works without adding unstrippable tables in sections that will be mapped at runtime.
[Bug debug/54395] DWARF tables should go in non-mapped section unless exceptions are enabled
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54395 --- Comment #4 from Rich Felker 2012-08-28 23:52:24 UTC --- Would you care to elaborate on how it would break anything? They're already easily removable with -fno-asynchronous-unwind-tables -fno-unwind-tables. The problem is just that it's impossible to remove them and still have working debugging (unless you want to revert to using stabs or something and adding back a frame pointer...). My request is that it be possible to move them to a strippable/non-mapped section to use them purely as debugging information and not treat them as "part of the program".
[Bug debug/54395] GCC should be able to put DWARF tables in a non-mapped/strippable section for debug-only use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54395 --- Comment #6 from Rich Felker 2012-08-29 12:43:23 UTC --- I seem to remember gcc -g -fno-asynchronous-unwind-tables -fno-unwind-tables producing a warning that these options are incompatible and that debugging will not work, but at the moment it seems to be doing the right thing. Was I imagining things or are there some gcc versions where the combination is problematic? I'd like to investigate the situation/behavior a bit longer before closing this bug, but it seems like you may have provided a solution. If this solution does work, however, I still think the documentation is lacking; it's not clear that these options would not remove the tables in a way that interferes with debugging.
[Bug target/55012] New: Protected visibility wrongly uses GOT-relative addresses
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55012 Bug #: 55012 Summary: Protected visibility wrongly uses GOT-relative addresses Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: bug...@aerifal.cx Consider the shared library code: int a __attribute__((visibility("protected"))); int f() { return a; } For this (on i386 at least), gcc generates a GOT-relative (@GOTOFF) reference to "a" rather than a GOT lookup. This will then reference the wrong location if "a" is moved into the main program's data via a copy relocation, which will happen if the main program makes any references to "a". The issue is a subtlety in the semantics of protected visibility. As I understand it and as it's documented, it's supposed to convey the semantic that the definition will not be overridden in the sense of the abstract machine. Copy relocations are not a case of overriding the definition in the abstract machine, but an implementation detail used to support data objects in shared libraries when the main program is non-PIC. With the current behavior, GCC is requiring library authors using visibility to be aware of this implementation detail (which only applies on some targets) and avoid using visibility on these specific targets. That, in my mind, is unreasonable and buggy behavior. Note that where this came up is when trying to use #pragma to set visibility globally in a shared library; doing so broke global objects accessed from the main application, but otherwise behaved as expected.
[Bug target/55012] Protected visibility wrongly uses GOT-relative addresses
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55012 --- Comment #1 from Rich Felker 2012-10-21 22:06:39 UTC --- I'm not sure whether the fix should be in gcc/varasm.c, default_binds_local_p_1(), or in the config/i386/predicates.md, local_symbolic_operand predicate. In the former, all non-default-visibility symbols are considered "local". In the latter, this "local" flag is used to determine that a got-relative offset would be allowed. If varasm.c is modified, it should be to make protected symbols considered non-local. I don't know if this would hurt code generation on other archs that don't use copy relocations, however. If predicates.md is to be modified, I don't see a good way, since the information on visibility seems to be lost at this point. Hidden symbols must continue to be considered local, but protected ones should not.
[Bug target/31798] lib1funcs.asm:1000: undefined reference to `raise'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31798 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #2 from Rich Felker --- This does seem to be real, so please reopen it. The problem is that the final command line to the linker looks like: ... $(objs) -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed $(endfiles) Assuming the main program itself does not do any division or call raise, the first -lgcc does not pull in __div0, and the -lc does not pull in raise. However, if any function from libc which does get pulled in performs division, then a reference to __div0 is generated, and the second -lgcc pulls in __div0, which contains a reference to raise. This reference is never resolved. It seems the intent is that link_gcc_c_sequence uses --start-group and --end-group to avoid this problem when -static is used. However, this does not cover the case where no libc.so exists at all, and libc.a is all that's available. I wonder why the --start-group logic is only used for static linking and not unconditionally, since it should be a no-op for shared libraries anyway. FYI, I have received new reports of this bug from musl users, one who wanted to have libc.so be used but who installed it in the wrong location causing libc.a to get used instead, but the rest were users doing purely static-linked systems with no shared libraries at all.
[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #19 from Rich Felker --- We are not presently experiencing this issue in musl libc, probably because the current C memcpy code is sufficiently overcomplicated to avoid getting detected by the optimizer as memcpy. However, I'm trying to switch to a new simpler implementation that's much faster when compiled with GCC 4.7.1 (on ARM), but hit this bug when testing on another system using GCC 4.6.1 (ARM). On the latter, even -fno-tree-loop-distribute-patterns does not make any difference. Unless there's a reliable workaround for this bug or at least a known blacklist of bad GCC versions where this bug can't be worked around, I'm afraid we're going to have to resort to generating the asm for each supported arch using a known-good GCC and including that asm in the distribution. This is EXTREMELY frustrating.
[Bug middle-end/58245] New: -fstack-protector[-all] does not protect functions that call noreturn functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245 Bug ID: 58245 Summary: -fstack-protector[-all] does not protect functions that call noreturn functions Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bugdal at aerifal dot cx This issue is almost identical to bug #23221, but affects functions whose execution ends with a call to a noreturn function rather than a tail call. The simplest example is: #include <stdlib.h> int main() { exit(0); } When compiled with -fstack-protector-all, the function prologue will read and store the canary, but no check will be made before passing execution to exit. This is actually a major practical problem for some users of musl libc, because code like the above appears in many configure scripts, and musl libc uses weak symbols so as to avoid initializing stack-protector (and thereby avoid initializing the TLS register) if there is no reference to __stack_chk_fail. Due to this issue, the above code generates thread-pointer-relative (e.g. %fs-based on x86_64) accesses to read the canary, but no reference to __stack_chk_fail, and then crashes when run, leading to spurious configure failures. For the time being, I have informed users who wish to use -fstack-protector-all that they can add -fno-builtin-exit -D__noreturn__= to their CFLAGS, but this is an ugly workaround. It should be noted that this issue happens even at -O0. I think using noreturn for dead code removal at -O0 is highly undesirable; for instance, it would preclude proper debugging of issues caused by a function erroneously being marked noreturn and actually returning. However that matter probably deserves its own bug report...
[Bug middle-end/58245] -fstack-protector[-all] does not protect functions that call noreturn functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245 --- Comment #1 from Rich Felker --- One more thing: I would be happy with either of two solutions, either: (1) Checking the canary before calling a noreturn function, just like performing a check before a tail-call, or (2) Eliminating the dead-code-removal of the function epilogue at -O0, and for non-zero -O levels, adding an optimization to remove the canary loading from the prologue if no epilogue to check the canary is to be generated.
[Bug middle-end/58245] -fstack-protector[-all] does not protect functions that call noreturn functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245 --- Comment #3 from Rich Felker --- We already do that; the patch is in the musl-cross repo here: https://bitbucket.org/GregorR/musl-cross or https://github.com/GregorR/musl-cross However, we want the stack-protector behavior for GCC with musl to be the same as with glibc, using the TLS canary and __stack_chk_fail function in libc rather than a separate libssp. In all real-world, nontrivial code, everything works fine. The only failures are with empty programs like the above which just call exit and which, when combined with -fstack-protector-all, crash. In any case, the failure of configure scripts with musl is just one symptom of the problem: useless loads of the canary without a corresponding check of the canary. From a security standpoint, I feel like checking the canary before calling a function that won't return would be the best possible behavior, so that every function gets a check. However, if doing this isn't deemed worthwhile, I think the canary load, which is dead code without a subsequent check, should be optimized out.
[Bug target/58446] Support for musl libc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58446 Rich Felker changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #9 from Rich Felker --- I don't know what the maintenance policy is for non-latest releases, but it would be wonderful if we could get these into the 4.7 series before it's closed, too. Bootstrapping new toolchains/systems with a different libc than the host system's libc, it's much easier to start with a GCC that doesn't need C++, and it would be a big help if our users could just start with GCC 4.7.x and have it work out of the box, rather than needing to apply third-party patches. (Speaking from the standpoint of musl maintainer.)
[Bug driver/50470] New: gcc does not respect -nostdlib with regard to search paths
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50470 Bug #: 50470 Summary: gcc does not respect -nostdlib with regard to search paths Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver AssignedTo: unassig...@gcc.gnu.org ReportedBy: bug...@aerifal.cx Even with -nostdlib, gcc leaves the default library paths in the search path, including /usr/lib (in the form of /usr/lib/gcc/targetstring/version/../../..). This makes -nostdlib basically useless for its only foreseeable purpose, building programs against a completely alternate library ecosystem(*). The only workaround I've found is installing a wrapper script with -wrapper to remove the unwanted paths. (*) Leaving default paths in the search path after the custom ones is not acceptable because configure scripts will find and attempt to use libraries in the default paths if the corresponding library does not exist in the custom path.
[Bug driver/50470] gcc does not respect -nostdlib with regard to search paths
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50470 --- Comment #2 from Rich Felker 2011-09-21 01:34:29 UTC --- The sysroot features may be nice but they're not a substitute for being able to eliminate the default library search path. For example, when using sysroot, -L/new/path will prepend the sysroot to /new/path.
[Bug target/53134] Request for option to disable excess precision on i387
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53134 --- Comment #8 from Rich Felker 2012-04-28 23:14:57 UTC --- I agree, sadly, that WONTFIX is probably the most appropriate action. At least, like Andrew said, we're getting to the point where assuming it's okay to build with -msse2 and -mfpmath=sse is reasonable.
[Bug target/52593] Builtin sqrt on x86 is not correctly rounded
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52593 --- Comment #7 from Rich Felker 2012-04-28 23:21:51 UTC --- This bug seems to have been fixed with the addition of the -fexcess-precision=standard feature, which is now set by default with -std=c99 or c11, and which disables the builtin sqrt based on 387 fsqrt. So apparently it had already been fixed at the time I reported this, but I was unaware of the right options to enable the fix and did not even think to try just using -std=c99. Note that for buggy libm (including glibc's), the fact that gcc has fixed the issue will not fix the incorrect results, since the code in libm makes exactly the same mistake gcc was making. But at least it's possible to fix it there.