[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-13 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #10 from Rich Felker  ---
This is a rather huge bug to have been fixed silently. Could someone who knows
the commit that fixed it and information on what versions are affected attach
the info to the tracker here? And ideally some information on working around it
for older GCCs?

From what I can tell experimenting so far, adding a dummy "0"(r0) constraint,
or using + instead of =, makes the problem go away, but potentially has other
ill effects from use of an uninitialized object..?

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #12 from Rich Felker  ---
> You can work around it on older GCC by simply not using a register var
> for more than one asm operand, I think?

Nope. Making a syscall inherently requires binding specific registers for all
of the inputs/outputs, unless you want to spill everything to an explicit
structure in memory and load them all explicitly in the asm block. So it really
is a big deal.

In particular, all mips variants need an earlyclobber constraint for the output
register $2 because the old Linux kernel syscall contract was that, when
restartable syscalls were interrupted, the syscall number passed in through $2
was lost, and the kernel returned to $pc-8 and expected a userspace instruction
to reload $2 with the syscall number from an immediate or another register. If
the input to load into $2 were itself passed in $2 (possible without
earlyclobber), the reloading would be ineffective and restarting syscalls would
execute the wrong syscall.

The original mips port of musl had undocumented and seemingly useless "0"(r2)
input constraints that were suppressing this bug, using the input to bind the
register where the earlyclobber output failed to do so. After some recent
changes broke compatibility with older kernels requiring the above contract, I
manually reverted them (due to intervening conflicting diffs) and omitted the
seemingly useless constraint, and it broke horribly. Eventually I found this
bug searching the tracker. My plan for now is just to add back the "0"(r2)
constraint, but since r2 is uninitialized, it's not clear that having it as an
input constraint is even well-defined. Is this the best thing to do?

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #16 from Rich Felker  ---
> I didn't say this very well...  The only issue is using the same hard
> register for two different operands.  You don't need to do this for
> syscalls (and you do not *need* that *ever*, of course).

I hit the bug without using the same hard register for two operands. At least
I'm pretty sure it's the same bug because the behavior matches and it's present
in 6.3.0 but not 9.2.0.

> Can you post some code that fails?  If you think this is a GCC bug (in
> some older branch?) that we should fix, please open a new PR for it.

Here's the relevant code extracted out of musl:

#define SYSCALL_CLOBBERLIST \
    "$1", "$3", "$11", "$12", "$13", \
    "$14", "$15", "$24", "$25", "hi", "lo", "memory"

long syscall6(long n, long a, long b, long c, long d, long e, long f)
{
    register long r4 __asm__("$4") = a;
    register long r5 __asm__("$5") = b;
    register long r6 __asm__("$6") = c;
    register long r7 __asm__("$7") = d;
    register long r8 __asm__("$8") = e;
    register long r9 __asm__("$9") = f;
    register long r2 __asm__("$2");
    __asm__ __volatile__ (
        "subu $sp,$sp,32 ; sw $8,16($sp) ; sw $9,20($sp) ; "
        "addu $2,$0,%4 ; syscall ;"
        "addu $sp,$sp,32"
        : "=&r"(r2), "+r"(r7), "+r"(r8), "+r"(r9)
        : "ir"(n), "r"(r4), "r"(r5), "r"(r6)
        : SYSCALL_CLOBBERLIST, "$10");
    return r7 && r2>0 ? -r2 : r2;
}

Built with gcc 6.3.0, %4 ends up expanding to $2, violating the earlyclobber,
and %0 gets bound to $16 rather than $2 (which is why the violation is allowed,
it seems).

With "0"(r2) added to input constraints, the bug goes away.

I don't particularly think this bug is something that needs to be fixed in
older branches, especially if doing so is hard, but I do think it's something
we need a solid reliable workaround for.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #19 from Rich Felker  ---
> This looks like bad inline asm.  You seem to be using $2, $8, $9 and $sp 
> explicitly and not letting the compiler know you are using them.

$2, $8, and $9 are all explicitly outputs. All changes to $sp are reversed
before the asm ends and there are no memory operands which could be sp-based
and thereby invalidated by temp changes to it.

> I think you want to change those to %0, %2 and %3 and adding one for $sp?

All that does is make the code harder to read and more fragile against changes
to the order the constraints are written in.

> ...and "n" is an argument register, so why use "ir" for n's constraint? 
> Shouldn't that just be "r"?  Maybe that is confusing IRA/LRA/reload?

The code has been reduced to a standalone example that still reproduces the
bug, extracted from a static inline function that was inlined into a function
with exactly the same signature. The static inline has a constant n after
constant propagation in almost all places it gets inlined, so the "ir"
constraint makes sense there. However, removing the "i" does not make the
problem go away anyway.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #22 from Rich Felker  ---
What should I call the new bug? The description sounds the same as this one,
and it's fixed in gcc 9.x, just not earlier versions, so it seems to be the
same bug.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #24 from Rich Felker  ---
The reasons I was hesitant to force n to a particular register through an extra
register __asm__ temp var was that I was unsure how it would interact with the
"i" constraint (maybe prevent it from being used?) and that this is code that
needs to be inlined all over the place, and adding more specific-register
constraints usually hurts register allocation in all functions where it's used.

If the "0"(r2) input constraint seems unsafe to rely on with r2 being
uninitialized (is this a real concern I should have?) just writing 0 or n to r2
before the asm would only waste one instruction and shouldn't really hurt.
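
A sketch of that variant (shape only, untested):

/* Give $2 a defined value before the asm so the matching "0"(r2) input no
 * longer reads an uninitialized object; costs one extra move. */
register long r2 __asm__("$2") = n;
/* ...followed by the same asm statement as in comment #16, with "0"(r2)
 * added to the inputs as shown earlier. */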

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #26 from Rich Felker  ---
Indeed, I just confirmed that binding the n input to a particular register
prevents the "i" part of the "ir" alternative from working.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #27 from Rich Felker  ---
Also just realized:

> Rich, forcing "n" to be in "$r10" seems to do the trick?  Is that a reasonable
solution for you?

It doesn't even work, because the syscall clobbers basically all call-clobbered
registers. Current kernels are preserving at least $25 (t9) and $28 (gp) and
the syscall argument registers, so $25 may be usable, but it was deemed not
clear in 2012. I'm looking back through musl git history, and this is actually
why the "i" alternative was wanted -- in basically all uses, "i" is
satisfiable, and avoids needing to set up a stack frame and spill a call-saved
register to the stack in order to use it to hold the syscall number to reload
on restart.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #28 from Rich Felker  ---
And it looks like I actually hit this exact bug back in 2012 but misattributed
it:

https://git.musl-libc.org/cgit/musl/commit/?id=4221f154ff29ab0d6be1e7beaa5ea2d1731bc58e

I assumed things went haywire from using two separate "r" constraints, rather
than "r" and "0", to bind the same register, but it seems the real problem was
that the "=&r"(r2) was not binding at all, and the "0"(r2) served to fix that.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #30 from Rich Felker  ---
> You need to make $r10 not a clobber but an inout, of course.  And not

That's not a correct constraint, because it's clobbered by the kernel between
the first syscall instruction's execution and the second execution of the addu
instruction after the kernel returns to restart it. $10 absolutely needs to be
a clobber because the kernel clobbers it. The asm block can't use any registers
the kernel clobbers.

> allowing the "i" just costs one more register move, not so bad imo.
> So you do have a workaround now.  Of course we should see if this can
> actually be fixed instead ;-)

I don't follow. As long as the "i" gets chosen, the asm inlines nicely. If not,
it forces a gratuitous stack frame to spill a non-clobberlisted register to use
as the input.

The code has been working for the past 8 years with the "0"(r2) input
constraint added, and would clearly be valid if r2 were pre-initialized with
something.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #33 from Rich Felker  ---
> An asm clobber just means "may be an output", and no operand will be assigned
> a register mentioned in a clobber.  There is no magic.

This, plus the compiler cannot assume the value in any of the clobbered
registers is preserved across the asm statement.

> This is inlined just fine?

It produces *wrong code* so it doesn't matter if it inlines fine. $10 is
modified by the kernel in the event the syscall is restarted, so the wrong
value will be loaded on restart.

[Bug inline-asm/87733] local register variable not honored with earlyclobber

2020-03-15 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87733

--- Comment #35 from Rich Felker  ---
> Oh, your real code is different, and $10 doesn't work for that?  I see.

No, the real code is exactly that. What you're missing is that the kernel,
entered through syscall, has a jump back to the addu after it's clobbered all
the registers in the clobberlist if the syscall is interrupted and needs to be
restarted.

[Bug tree-optimization/14441] [tree-ssa] missed sib calling when types change

2020-04-16 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14441

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #11 from Rich Felker  ---
I've hit what seems to be this same issue on x86_64 with minimal test case:

long g(void);
int f(void)
{
return g();
}

It's actually really annoying because it causes all of the intended tail-call
handling of syscall returns in musl to be non-tail calls since __syscall_ret
returns long (needed for a few syscalls) but most thin syscall-wrapper
functions return int.

If the x86_64 version is not this same issue but something separate I can open
a new bug for it.
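
To illustrate the pattern being described, here is a hedged sketch (signatures approximate, the raw-syscall helper and syscall number are hypothetical; real musl wrappers differ in detail):

/* The public wrapper returns int while __syscall_ret returns long, so the
 * call that should ideally be a sibcall is emitted as a plain call. */
long __syscall_ret(unsigned long r);
long __syscall1(long n, long a);        /* hypothetical raw-syscall helper */

int close_wrapper(int fd)               /* stands in for a real thin wrapper */
{
    return __syscall_ret(__syscall1(57, fd));   /* 57: hypothetical number */
}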

[Bug c/94631] New: Wrong codegen for arithmetic on bitfields

2020-04-16 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

Bug ID: 94631
   Summary: Wrong codegen for arithmetic on bitfields
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Test case:

struct foo {
    unsigned long long low:12, hi:52;
};
unsigned long long bar(struct foo *p)
{
    return p->hi*4096;
}

Should generate only a mask off of the low bits, but gcc generates code to mask
off the low 12 bits and the high 12 bits (reducing the result to 52 bits).
Presumably GCC is interpreting the expression p->hi as having a phantom type
that's only 52 bits wide, rather than having type unsigned long long.

clang/LLVM compiles it correctly.

I don't believe there's any language in the standard supporting what GCC is
doing here.
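
To make the two behaviors concrete, here is an illustration in plain C (not compiler output) in terms of the raw 64-bit representation v of the struct:

/* p->hi is v >> 12; multiplying by 4096 should only clear the low 12 bits. */
unsigned long long expected(unsigned long long v)
{
    return (v >> 12) << 12;
}

/* What the generated code effectively computes, per the description above:
 * the product is additionally reduced modulo 2^52, losing the high 12 bits. */
unsigned long long observed(unsigned long long v)
{
    return ((v >> 12) << 12) & ((1ULL << 52) - 1);
}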

[Bug c/94631] Wrong codegen for arithmetic on bitfields

2020-04-16 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #2 from Rich Felker  ---
So basically the outcome of DR120 was allowing the GCC behavior? It still seems
like a bad thing, not required, and likely to produce exploitable bugs (due to
truncation of arithmetic) as well as very poor-performance code (due to
constant masking).

[Bug c/94631] Wrong codegen for arithmetic on bitfields

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #5 from Rich Felker  ---
No, GCC's treatment also seems to mess up bitfields smaller than int and fully
governed by the standard (no implementation-defined use of non-int types):

struct foo {
unsigned x:31;
};

struct foo bar = {0};

bar.x-1 should yield UINT_MAX but yields -1 (same representation but different
type) because it behaves as a promotion from a phantom type unsigned:31 to int
rather than as having type unsigned to begin with.

This can of course be observed by comparing it against 0. It's subtle and
dangerous because it may also trigger optimization around UB of signed overflow
when the correct behavior would be well-defined modular arithmetic.
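
A small demonstration of the difference described above (assuming a hosted environment for the output):

#include <stdio.h>

struct foo { unsigned x:31; };
struct foo bar = {0};

int main(void)
{
    /* Under the promotion GCC applies, bar.x - 1 has type int and value -1,
     * so the comparison is true; if the bit-field behaved as plain unsigned,
     * the subtraction would wrap to UINT_MAX and the comparison would be
     * false. */
    if (bar.x - 1 < 0)
        puts("bit-field promoted to int");
    else
        puts("bit-field behaved as unsigned");
    return 0;
}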

[Bug c/94631] Wrong codegen for arithmetic on bitfields

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #7 from Rich Felker  ---
Can you provide a citation for that?

[Bug c/94631] Wrong codegen for arithmetic on bitfields

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94631

--- Comment #8 from Rich Felker  ---
OK, I think it's in 6.3.1.1 Boolean, characters, and integers, ¶2, but somewhat
poorly worded:

"The following may be used in an expression wherever an int or unsigned int may
be used: 

- An object or expression with an integer type (other than int or unsigned int)
whose integer conversion rank is less than or equal to the rank of int and
unsigned int.
- A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the
width, for a bit-field), the value is converted to an int; otherwise, it is
converted to an unsigned int. These are called the integer promotions."

The first sentence together with the second bullet point suggests it should
behave as unsigned int, but the "as restricted by the width, for a bit-field"
in the paragraph after the bulleted list seems to confirm your interpretation.

[Bug target/94643] New: [x86_64] gratuitous sign extension of nonnegative value from 32 to 64 bits

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94643

Bug ID: 94643
   Summary: [x86_64] gratuitous sign extension of nonnegative
value from 32 to 64 bits
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Test case:

#include <stdint.h>
uint16_t a[];
uint64_t f(int i)
{
    return a[i]*16;
}

Produces:

movslq  %edi, %rdi
movzwl  a(%rdi,%rdi), %eax
sall$4, %eax
cltq
ret

The value is necessarily in the range [0,1M) (in particular, nonnegative), and
the operation on eax has already cleared the upper bits of rax, so cltq is
completely gratuitous. I've observed the same in nontrivial examples where
movslq gets used.

[Bug target/94646] New: [arm] invalid codegen for conversion from 64-bit int to double hardfloat

2020-04-17 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94646

Bug ID: 94646
   Summary: [arm] invalid codegen for conversion from 64-bit int
to double hardfloat
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

GCC emits a call to __aeabi_l2d to convert from long long to double. This is
invalid for hardfloat ABI because it does not honor rounding modes or raise
exception flags. That in turn causes the implementation of fma in musl libc to
produce wrong results for non-default rounding modes.

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2020-04-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

--- Comment #12 from Rich Felker  ---
There's some awful hand-written asm in libgcc/config/arm/ieee754-df.S replacing
the standard libgcc2.c versions; that's the problem. But in order to use the
latter it would need to be compiled with -mfloat-abi=softfp since the
__aeabi_l2d function (and all the __aeabi_* apparently) use the standard
soft-float EABI even on EABIHF targets.

I'm not sure why you want a library function to be called for this on hardfloat
targets anyway. Inlining the hi*0x1p32+lo is almost surely smaller than the
function call, counting spills and conversion of the result back from GP
registers to an FP register. It seems like GCC should be able to inline this
idiom at a high level for *all* targets that lack a floatdidf operation but
have floatsidf.
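
For reference, a minimal sketch of that inline idiom (an illustration, not GCC's actual lowering; assumes 32-bit int and IEEE double): hi*0x1p32 and the low half are each exact in double, so the final addition is the single rounding step and honors the current rounding mode.

double l2d(long long x)
{
    int hi = (int)(x >> 32);       /* signed high half, exact as double */
    unsigned lo = (unsigned)x;     /* low half, exact as double */
    return (double)hi * 0x1p32 + (double)lo;   /* one rounded addition */
}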

Of course a high level fix is going to be hell to backport, and this really
needs a backportable fix or workaround (maintained in mcm not upstream gcc)
from musl perspective. Maybe the easiest way to do that is just to hack the
right preprocessor conditions for a hardfloat implementation into
ieee754-df.S...

[Bug tree-optimization/95097] New: Missed optimization with bitfield value ranges

2020-05-12 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95097

Bug ID: 95097
   Summary: Missed optimization with bitfield value ranges
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

#include <stdint.h>
struct foo {
    uint32_t x:20;
};
int bar(struct foo f)
{
    if (f.x) {
        uint32_t y = (uint32_t)f.x*4096;
        if (y<200) return 1;
        else return 2;
    }
    return 3;
}

Here, truth of the condition f.x implies y>=4096, but GCC does not DCE the
y<200 test and return 1 codepath.

I actually had this come up in real-world code, where I was considering use of
an inline function with nontrivial handling of low-size cases when a "page
count" bitfield is zero, and I expected these nontrivial cases to be optimized
out based on having already tested that the page count is nonzero, but GCC was
unable to do it. LLVM/clang does it.

[Bug middle-end/95249] New: Stack protector runtime has to waste one byte on null terminator

2020-05-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95249

Bug ID: 95249
   Summary: Stack protector runtime has to waste one byte on null
terminator
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

At least glibc presently stores a null byte in the first byte of the stack
protector canary value, so that string-based read overflows can't leak the
canary value. On 32-bit targets, this wastes a significant portion of the
randomness, making it possible that massive-scale attacks (e.g. against
millions of mobile or IoT devices) will have a decent chance of some success
bypassing stack protector. musl presently does not zero the first byte, but I
received a suggestion that we should do so, and got to thinking about the
tradeoffs involved.

If GCC would skip one byte below the canary, the full range of values could be
used by the stack protector runtime without the risk of string-read-based
disclosure. Storing a single 0 byte on the stack should be inexpensive in
terms of both space and time.
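
For reference, a hedged sketch (not glibc's actual code, and the function name is hypothetical) of the first-byte zeroing being discussed, which is what costs 8 bits of randomness:

#include <stdint.h>
#include <string.h>

uintptr_t __stack_chk_guard;

/* Fill the canary from an entropy source, then zero its first byte so a
 * string-based read overflow stops at the canary instead of leaking it. */
void init_ssp(const void *entropy)
{
    memcpy(&__stack_chk_guard, entropy, sizeof __stack_chk_guard);
    ((unsigned char *)&__stack_chk_guard)[0] = 0;
}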

[Bug middle-end/95249] Stack protector runtime has to waste one byte on null terminator

2020-05-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95249

--- Comment #2 from Rich Felker  ---
Indeed, using an extra zero pad byte could bump the stack frame size by 4 or 8
or 16 bytes, or could leave it unchanged, depending on alignment prior to
adding the byte and the alignment requirements of the target.

[Bug middle-end/95558] New: Invalid IPA optimizations based on weak definition

2020-06-05 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

Bug ID: 95558
   Summary: Invalid IPA optimizations based on weak definition
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Created attachment 48689
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48689&action=edit
test case

Here is a case that came up in WIP code on musl libc, where I wanted to provide
a weak dummy definition for functionality that would optionally be replaced by
a strong definition elsewhere at ld time. I've been looking for some plausible
explanation aside from an IPA bug, like interaction with UB, but I can't find
any.

In the near-minimal test case here, the function reclaim() still has all of the
logic it should, but reclaim_gaps gets optimized down to a nop.

What seems to be happening is that the dummy weak definition does not leak into
its direct caller via IPA optimizations, but does leak to the caller's caller.

[Bug ipa/95558] Invalid IPA optimizations based on weak definition

2020-06-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #2 from Rich Felker  ---
Wow. It's interesting that we've never seen this lead to incorrect codegen
before, though. All weak dummies should be affected, but only in some cases
does the pure get used to optimize out the external call.

This suggests there's a major missed optimization around pure functions too, in
addition to the wrong application of pure (transfering it from the weak
definition to the external declaration) that's the bug.

[Bug ipa/95558] Invalid IPA optimizations based on weak definition

2020-06-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95558

--- Comment #3 from Rich Felker  ---
In addition to a fix, this is going to need a workaround as well. Do you have
ideas for a clean one? A dummy asm in the dummy function to kill pureness is
certainly a big hammer that would work, but it precludes LTO optimization if
the weak definition doesn't actually get replaced, so I don't like that.

One idea I think would work, but not sure: make an external __weak_dummy_tail
function that all the weak dummies tail call to. This should only take a few
bytes more than just returning, and precludes pureness analysis in the TU it's
in, while still allowing DCE at LTO time when the definition of
__weak_dummy_tail becomes available.

Is my reasoning correct here?
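
A hedged sketch of that idea (function names as described above, everything else hypothetical):

/* Defined in a separate TU (or only at link/LTO time), so within this TU the
 * compiler cannot prove the weak dummy is pure. */
void __weak_dummy_tail(void);

/* Weak dummy that a strong definition may replace at ld time; the call to an
 * unknown external function defeats local pureness analysis while still
 * allowing DCE once __weak_dummy_tail's definition is visible under LTO. */
__attribute__((__weak__)) void reclaim(void *start, void *end)
{
    (void)start; (void)end;
    __weak_dummy_tail();
}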

[Bug target/95921] New: [m68k] invalid codegen for __builtin_sqrt

2020-06-26 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

Bug ID: 95921
   Summary: [m68k] invalid codegen for __builtin_sqrt
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

On ISA levels below 68040, __builtin_sqrt expands to code that performs an
extended-precision sqrt operation rather than a double-precision one. Not only
does this give the wrong result; it enables further cascadingly-wrong
optimization a la #93806 and related bugs, because the compiler thinks the value
in the output register is a double, but it's not.

I think the right fix is making the rtl in m68k.md only allow long double
operands unless ISA level is at least 68040, in which case the
correctly-rounding instruction can be used. Then the standard function will be
used instead of a builtin definition, and it can patch up the result
accordingly.

[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt

2020-06-26 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

--- Comment #1 from Rich Felker  ---
I wonder if the fact that GCC thinks the output of the insn is already double
suggests other similar bugs in the m68k backend, though... If extended
precision were working correctly, I'd think it would at least expect the result
to have extended precision and be trying to drop the excess precision
separately. But it's not; it's just returning. Here's my test case:

double my_sqrt(double x)
{
return __builtin_sqrt(x);
}

with -O2 -std=c11 -fno-math-errno -fomit-frame-pointer

The last 2 options are non-critical (GCC still uses the inline insn even with
-fmath-errno and branches only for the exceptional case) but clean up the
output so it's more clear what's going on.

[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt

2020-06-27 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

--- Comment #3 from Rich Felker  ---
Yes, I'm aware m68k has FLT_EVAL_METHOD=2. That's not license for *functions* to
return excess precision. The language specification is very clear about where
excess precision is and isn't kept, and here it must not be. All results are
deterministic even with excess precision. Moreover if there's excess precision
where gcc's middle end didn't expect it, it will turn into cascadingly wrong
optimization, possibly even making pure integer results wrong.

[Bug target/95921] [m68k] invalid codegen for __builtin_sqrt

2020-07-01 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95921

--- Comment #4 from Rich Felker  ---
The related issue I meant to link to is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93681 which is for x87, but the
equivalent happens on m68k due to FLT_EVAL_METHOD being 2 here as well.

[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed

2020-09-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #3 from Rich Felker  ---
This answer does not seem satisfactory. Whether it will be optimized is not the
question; the question is just whether it's semantically defined. That should either be
universally true on GCC versions that offer the builtin (via a libgcc function
if nothing else is available) or target-specific (which is known at
preprocessing time).

[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible

2020-09-23 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

--- Comment #2 from Rich Felker  ---
Rather than #if defined(SYS_futex_time64), I think it should be made:

#if defined(SYS_futex_time64) && SYS_futex_time64 != SYS_futex

This is in consideration of support for riscv32 and future archs without legacy
syscalls. It's my intent in musl to accept the riscv32 port with SYS_futex
defined to be equal to SYS_futex_time64; otherwise all software making use of
SYS_futex gratuitously breaks.

[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible

2020-09-23 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

--- Comment #4 from Rich Felker  ---
Actually I didn't see it, I just saw Florian added to CC and it reminded me of
the issue, which reminded me I needed to check this for riscv32 issues with the
riscv32 port pending merge. :-)

[Bug target/12306] GOT pointer (r12) reloaded unnecessarily

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12306

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #8 from Rich Felker  ---
I think this should be closed as not a bug. There is no contract that, on
function entry, the r12 register contain the callee's GOT pointer. Rather it
contains the caller's GOT pointer, and the two will only be equal if both
reside in the same DSO.

(Note that PowerPC64 ELFv2 ABI goes to great lengths to optimize this case with
"local entry point" and fancy ABI contract for how the GOT pointer save/load
can be elided. I'm not sure the benefits are well-documented though.)

[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #6 from Rich Felker  ---
> Only if the int is out of float's range.

float's range is [-INF,INF] (endpoints included). There is no such thing as
"out of float's range".

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #41 from Rich Felker  ---
> Josef Wolf mentioned that he ran into this on the gcc-help mailing list here: 
> https://gcc.gnu.org/ml/gcc-help/2019-10/msg00079.html

I don't think that's an instance of this issue. It's normal/expected that
__builtin_foo compiles to a call to foo in the absence of factors that lead to
it being optimized to something simpler. The idiom of using __builtin_foo to
get the compiler to emit an optimized implementation of foo for you, to serve
as the public definition of foo, is simply not valid. That's kinda a shame
because it would be nice to be able to do it for lots of math library
functions, but of course in order for this to be able to work gcc would have to
promise it can generate code for the operation for all targets, which is
unlikely to be reasonable.

[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

--- Comment #8 from Rich Felker  ---
> Floating point types are not guaranteed to support infinity by the C standard

Annex F (IEEE 754 alignment) does guarantee it, and GCC aims to implement this.
This issue report is specific to target sh*-*-* which uses either softfloat
with IEEE types and semantics or SH4 hardfloat which has IEEE types and
semantics. So arguments about generality to non-Annex-F C environments are not
relevant to the topic here.

[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

--- Comment #10 from Rich Felker  ---
GCC can choose the behavior for any undefined behavior it wants, and GCC
absolutely can make transformations based on behaviors it guarantees or that
Annex F guarantees on targets for which it implements the requirements of Annex
F. On this particular target, and on every target of any modern relevance,
(float)16777217 has well-defined behavior. On ones with floating point
environment (most/all hardfloat), it has side effects (inexact), so can't be
elided without the flags to make gcc ignore those side effects.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-05 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #3 from Rich Felker  ---
The affected code is in musl and I'd like to get this resolved. Are there
different constraints we should be using instead here, or is this a bug that
will be fixed on the GCC side?

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #8 from Rich Felker  ---
> Then LLVM has more to fix.  Constraints never look at types.  A register
> constraint (like "wa") simply says what registers are valid.

This is blatantly false. For x86:

int foo(int x)
{
__asm__("" : "+f"(x));
return x;
}

yields "error: inconsistent operand constraints in an 'asm'".

> For many w* using it in inline asm is plain wrong; for the rest of the
> register constraints it is useless, plain "wa" should be used; and there
> are some special ones that are so far GCC implementation detail that you
> probably wouldn't even consider using them.

The asm register constraints are a public interface of "GNU C" for the
particular target architecture. Randomly removing them is a breaking change in
the language. There is no documented or even reliable way to detect which ones
work correctly for a particular compiler version, so change or removal of
semantics is particularly problematic.

> The maintenance cost for all the constraints we keep around because some
> important projects used them is considerable, fwiw.

One line in a table to preserve stability of the language is not what I call
"maintenance cost".

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #9 from Rich Felker  ---
And ok, to be more productive rather than just angry about the regression, if
you really think the "ws" constraint should be removed, what is the proper
preprocessor/configure-time check to determine the right constraint and asm
form to use without special-casing specific compiler names and versions? Short
of an answer to that, the only solution I can see to this on our side is just
disabling the asm if a configure check determines that the current code doesn't
compile, and that would be rather bleh.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #13 from Rich Felker  ---
> That does not look at types.  It complains that "x" lives in memory,
> while the constraint requires a register (a floating point register).

That does not sound accurate. An expression (in this case an lvalue since it's
an output too, but for input-only operands, a non-lvalue as well) does not "live in memory";
that is not the problem here. Unless the address has leaked outside of the
current scope of transformations, objects may not live in memory at all, only
in registers, or a mix of registers and temporary spills, etc. The asm
constraints guide the compiler's choices of where to put it (and may *force* it
to be moved back/forth, e.g. if you use an "m" constraint for a variable that
could otherwise be kept in a register, or a register constraint for one that
otherwise would only be accessed via memory), not the other way around.

The problem here is that GCC has no way to bind an integer expression to a
floating point register. It *is* a matter of type. There are probably
subtleties to this that I don't understand, but it's not about "living in
memory".

> No, they are not.  The constraints are an implementation detail.  And
> they *have* to be, or we could never again improve anything.

If they are in the documentation, they're not implementation details. They're
interfaces necessary to be able to use inline asm, which is a documented and
important feature of the "GNU C" language.

In particular, "ws" is documented for the purpose we're using it for:

https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Machine-Constraints.html

> Unfortunately we currently document most of them in the user manual as
> well.  It's on my list of things to change, for GCC 10.  Most targets
> still have this problem, fwiw.

If you intend to remove documented functionality on an even larger scale, that
is a breaking change, and one which I will (loudly) oppose. If there are
legitimate reasons for such changes to be made internally, a layer should be
put in place so that code using the constraints continues to work without
imposing on the backend implementation.

> What I am talking about is that people rely on implementation details
> no matter what we do, and then prevent us from changing them.

That may be true, but it's not related to this bug report and I have not seen
evidence of it happening. I'll gladly fix it if we're doing that anywhere.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #14 from Rich Felker  ---
> So, if "ws" has been documented in the user documentation, perhaps just
> (define_register_constraint "ws" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
>   "Compatibility alias to wa")
> could be added?  If it has not been documented, it is fine to remove it.

It is clearly documented here:

https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Machine-Constraints.html

Whoever removed it in gcc 10 was aware of this because they explicitly deleted
it from the documentation:

https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html

This should not be a permitted change, at least not without major discussion to
reach consensus.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #16 from Rich Felker  ---
> Using "ws" in inline asm never made sense.  It was always the same as
> "wa", for all cases where either could be used in inline asm at all.

It made sense inasmuch as it was documented and was the most clearly documented
as matching the intended usage case, and still makes sense in that the other
widely-used compiler did not properly (according to your interpretation)
implement "wa", so that "ws" was the only constraint that worked with both.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #18 from Rich Felker  ---
> So use "wa" instead of "ws" in the two files you use it, and can we get
> on with our lives?

Translation: Introduce a regression on all existing versions of clang because
GCC broke a documented public interface. How about no?

> The places where in the past we put old internal constraints (and output
> modifiers) back are places where for example glibc used it in installed
> headers.  That takes a decade or more to fix.

These are not old internal constraints. They're publicly documented ones.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #20 from Rich Felker  ---
> After both musl and LLVM are fixed, if you then *still* feel you
> need "ws", then we can talk of course.  Deal?

No, it's not a deal. Your proposal is *breaking all currently-working versions*
of clang because GCC wants to remove a documented public interface. I don't
make users of one tool suffer because the maintainers of another tool broke
things. That would not be responsible maintenance on my part.

If GCC is committed to breaking this, I'll make a configure check to fallback
to the C implementation if "ws" does not work, and ship patches in
musl-cross-make to fix the GCC regression so that users who get the patch won't
be affected.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #22 from Rich Felker  ---
And to be clear, pretty much all gcc versions from 3.4.6 to present, and all
clang/LLVM versions since they fixed some critical bugs (like -ffreestanding
not working, which was a show-stopper), are supported compilers for targets
they support. We do not drop support for some existing supported compilers
because the latest version made an incompatible change.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #24 from Rich Felker  ---
> Sure, and I'll do that, *if there are users*, *after they fix their stuff*.

Nothing is broken on our side here. We are using the documented functionality
from gcc 9 going all the way back to whatever version first added this stuff.

> I will not add back all constraints I removed.  I *cannot* add back many of
> those constraints, not with similar semantics anyway.
>
> Oh, and there were 24 I removed for 10, if I count correctly.  All were
> internal.  That they were documented was a bug;

How many others were actually-internal vs having this ridiculous excuse of "it
was a bug that we documented it so we can retroactively blame programmers for
using it rather than taking responsibility for the contract we documented"? 
Are any of the "*cannot* add back" ones things that were documented? If not,
then you can add back all the ones that were documented with no harm done to
anything. If there really are technical reasons that some of the ones removed
are difficult to recreate, please say so. I would still strongly disagree with
the choice to make such a regression, but at least it would have some
reasonable motivation rather than the only motivation I've seen so far, which
seems to be your desire to break things on a whim.

> that one was actually used
> by any program was unexpected (and it took literally half a year before this
> was found out, or reported here at least).

At least "ws" and "ww" are used, for fmax, fmaxf, fmin, and fminf. The reason
it was not found for "literally half a year" is because the regression is not
present in any release. Users generally do not use unstable GCC; my
understanding is that it's not recommended to do so.

> The point is that we will never get to a good state if we cannot fix
> up any historical mistakes.

That's an extreme exaggeration. There is nothing holding back a "good state"
about having two aliases for "wa" to preserve documented historical behavior.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #27 from Rich Felker  ---
> Have I already mentioned that if any program "in the wild" will use "ws" with
> GCC 10, then of course we can add an alias (to "wa") for it?  No program 
> should
> use "ws" in inline assembler, ever, but if some programs cannot fix that, we
> can fix it for them, sure.

I would very much appreciate it if the "ws" (and "ww") aliases could be added.
I hope you can appreciate how clang users would respond when linked to this BZ
ticket after musl broke for them, if we just changed it to use "wa". Even if
(rather, "even though" - I believe you that they're wrong) clang is wrong to
reject "wa" here, it would come across to them as completely unreasonable that
we broke the constraints that previously worked fine on all compilers.

I'm not sure if you would rather us have to do some sort of configure-time
check here. Maybe one can be devised that doesn't risk wrong semantics when we
can only measure whether the compiler accepts it, not whether it generates the
wrong code, but I don't know how (and what's to guarantee that someday someone
won't, seeing the combinations "ws" and "ww" as unused, invent a new meaning
for one of them?), and even if it is possible, I would find such a configure
check to be really ugly long-term maintenance cost in relation to a simple
alias to preserve the long-documented behavior.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #29 from Rich Felker  ---
For reference, here are the affected functions in musl (fmax, fmaxf, fmin,
fminf):

https://git.musl-libc.org/cgit/musl/tree/src/math/powerpc64?id=v1.1.24
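
The affected functions are all roughly of this shape (paraphrased from memory; see the link above for the exact source):

double fmax(double x, double y)
{
    __asm__ ("xsmaxdp %x0, %x1, %x2" : "=ws"(x) : "ws"(x), "ws"(y));
    return x;
}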

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-08 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #34 from Rich Felker  ---
> Does musl not support BE?  There is nothing about this that is LE-only
> as far as I can see.

For powerpc64, musl supports both BE and LE, and uses "elfv2" ABI for both
(since it was not present as a target for musl before the new ABI existed). Per
the IBM docs, LE/elfv2 (which they confusingly equate) "require" power8+, but
there are not actually any constraints in the ABI that impose such a
requirement (e.g. argument-passing in registers that previous ISA levels didn't
have), and we don't impose it. I believe there are people using musl on
pre-power8 powerpc64 systems, at least in BE mode and possibly also in LE mode.

[Bug c/92571] New: gcc erroneously rejects , operator in array dimensions as syntax error

2019-11-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92571

Bug ID: 92571
   Summary: gcc erroneously rejects , operator in array dimensions
as syntax error
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

void foo()
{
int a[1,1];
}

produces:

error: expected ']' before ',' token

despite the declaration being a valid (variable length, since the comma
operator cannot participate in an integer constant expression) array
declaration.

I found this while testing for whether -Wvla would catch such "gratuitously
variable-length" arrays due to comma operator. Obviously this should be caught
by both that and whatever warning is appropriate for "you probably meant
multi-dimensional array". But it is valid C and should not be rejected as a
syntax error.

[Bug c/92571] gcc erroneously rejects , operator in array dimensions as syntax error

2019-11-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92571

--- Comment #1 from Rich Felker  ---
Note that I put it in a function just because VLA is invalid at file scope, and
I wanted to be clear that this bug is independent of the invalidity of VLA at
file scope.

[Bug c/92571] gcc erroneously rejects , operator in array dimensions as syntax error

2019-11-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92571

Rich Felker  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from Rich Felker  ---
Sorry for the noise. Per 6.7.6 Declarators, the expression in [] is
assignment-expression, defined in 6.5.16 Assignment operators, which does not
include the comma operator.

I'm not sure whether there's still an element to this report that the error
message could be more useful, but it seems it's not a bug but a quirk in the
language spec.

[Bug c/61579] -Wwrite-strings does not behave as a warning option

2019-12-14 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61579

--- Comment #6 from Rich Felker  ---
Ping.

[Bug libstdc++/93325] New: libstdc++ wrongly uses direct clock_gettime syscall on non-glibc, breaks time64

2020-01-19 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93325

Bug ID: 93325
   Summary: libstdc++ wrongly uses direct clock_gettime syscall on
non-glibc, breaks time64
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

The configure logic for libstdc++ is choosing to make direct clock_gettime
syscalls (via syscall()) rather than using the clock_gettime function except on
glibc 2.17 or later (when it was moved from librt to libc). This is
incompatible with time64 (because struct timespec mismatches the form the old
clock_gettime syscall uses) and also undesirable because it can't take
advantage of vdso.

The hard-coded glibc version dependency is a configure anti-pattern and should
be removed; the right way to test this would be just probing for the
clock_gettime function without adding any libs (like -lrt).
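
A minimal sketch of such a probe (the usual configure-style link test; the autoconf plumbing around it is omitted): if this links with no extra libraries, the clock_gettime function should be called directly instead of issuing a raw syscall.

#include <time.h>

int main(void)
{
    struct timespec ts;
    return clock_gettime(CLOCK_MONOTONIC, &ts);
}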

[Bug libstdc++/93421] New: futex.cc use of futex syscall is not time64-compatible

2020-01-24 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

Bug ID: 93421
   Summary: futex.cc use of futex syscall is not time64-compatible
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Created attachment 47704
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47704&action=edit
simple fix, not necessarily right for upstream

This code directly passes a userspace timespec struct to the SYS_futex syscall,
which does not work if the userspace type is 64-bit but the syscall expects
legacy 32-bit timespec.

I'm attaching the patch we're using in musl-cross-make to fix this. It does not
attempt to use the SYS_futex_time64 syscall, since that would require fallback
logic with cost tradeoffs for which to try first, and since the timeout is
relative and therefore doesn't even need to be 64-bit. Instead it just uses the
existence of SYS_futex_time64 to infer that the plain SYS_futex uses a pair of
longs, and converts the relative timestamp into that. This assumes that any
system where the libc timespec type has been changed for time64 will also have
had its headers updated to define SYS_futex_time64.

Error handling for extreme out-of-bound values should probably be added.
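
For illustration, a hedged sketch of the conversion described (the attached patch is authoritative; the names here are illustrative only):

#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Relative-timeout wait. If SYS_futex_time64 is defined (and distinct from
 * SYS_futex), infer that plain SYS_futex takes a pair of longs and convert
 * the libc timespec to that form; otherwise pass it through unchanged.
 * Out-of-range timeouts are not handled here, as noted above. */
static long futex_wait_relative(int *addr, int val, const struct timespec *rel)
{
#if defined(SYS_futex_time64) && SYS_futex_time64 != SYS_futex
    struct { long sec, nsec; } ts = { rel->tv_sec, rel->tv_nsec };
    return syscall(SYS_futex, addr, FUTEX_WAIT, val, &ts);
#else
    return syscall(SYS_futex, addr, FUTEX_WAIT, val, rel);
#endif
}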

[Bug middle-end/93509] New: Stack protector should offer trap-only handling

2020-01-30 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93509

Bug ID: 93509
   Summary: Stack protector should offer trap-only handling
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Presently stack protector functionality depends on making a call to
__stack_chk_fail (possibly via __stack_chk_fail_local to avoid PLT-call-ABI
constraint in the caller). This is less secure than it could be, since it
depends on the ability to make function calls (and possibly operate on global
data and make syscalls in the callee) in a process whose state is compromised.
For example the GOT entries used by PLT could be clobbered or %gs:0x10 (i386
syscall vector) could be clobbered by the same stack-based overflow that caused
the stack protector event in the first place.

In https://gcc.gnu.org/ml/gcc/2020-01/msg00483.html where the topic is being
discussed for other reasons (contract between gcc and libc for where these
symbols are provided), I proposed that GCC should offer an option to emit a
trapping instruction directly, instead of making a function call, analogous to
-fsanitize-undefined-trap-on-error for UBSan. This would work well on all
targets where __builtin_trap is defined, but would regress (requiring PLT call)
on targets where it uses the default abort() definition (are there any relevant
ones?). Segher Boessenkool then requested I file this here on the GCC tracker.
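
Conceptually, the proposal amounts to the following shape for a protected function (a sketch of generated code written as C, not an existing GCC option):

extern unsigned long __stack_chk_guard;

void protected_function(void)
{
    unsigned long canary = __stack_chk_guard;   /* prologue: copy guard */
    char buf[64];
    /* ... code that might overflow buf ... */
    (void)buf;
    if (canary != __stack_chk_guard)            /* epilogue check */
        __builtin_trap();                       /* trap in place, e.g. ud2;
                                                   no call through PLT/GOT */
}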

Note: I'm filing this for middle-end because that was my best guess of where
GCC handles it, but it's possible all this logic is repeated in each target or
takes place somewhere else entirely; if so please reassign to appropriate
component.

[Bug target/65249] unable to find a register to spill in class 'R0_REGS' when compiling protobuf on sh4

2020-01-30 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65249

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #27 from Rich Felker  ---
We've hit what seems like almost the exact same issue on gcc 8.3.0 with this
minimized testcase:

void fg(int *);
int get_response(int a)
{
  int b;
  if (a) fg(&b);
  return 0;
}

compiled with -O -c -fstack-protector-strong for sh2eb-linux-muslfdpic. With
gcc 9.2.0 it compiles successfully. I looked for a record of such a fix having
been made, but couldn't find one. Was it a known issue that was fixed silently,
or might it be a lurking bug that's just no longer being hit?

[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call

2020-02-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #5 from Rich Felker  ---
My understanding is that C2x is fixing this underspecification and will require
the library functions to drop excess precision as if they used a return
statement. So this really should be fixed in glibc if it's still an issue; if
they accept fixing that I don't think GCC needs any action on this. I just
fixed it in musl.

[Bug c++/93620] New: Floating point is broken in C++ on targets with excess precision

2020-02-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93620

Bug ID: 93620
   Summary: Floating point is broken in C++ on targets with excess
precision
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx
  Target Milestone: ---

Attempting to use -fexcess-precision=standard with g++ produces:

cc1plus: sorry, unimplemented: '-fexcess-precision=standard' for C++

In light of eldritch horrors like pr 85957 this means floating point is
essentially catastrophically broken on i386 and m68k.

This came to my attention while analyzing
https://github.com/OSGeo/PROJ/issues/1906. Most of the problems are g++
incorrectly handling excess precision, and they're having to put awful hacks
with volatile objects in place to work around it.

[Bug middle-end/323] optimized code gives strange floating point results

2020-02-06 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

--- Comment #210 from Rich Felker  ---
If new reports are going to be marked as duplicates of this, then can it please
be moved from SUSPENDED status to REOPENED? The situation is far worse than
what seems to have been realized last this was worked on, as evidenced by pr
85957. These issues just came up again breaking real-world software in
https://github.com/OSGeo/PROJ/issues/1906

[Bug middle-end/323] optimized code gives strange floating point results

2020-02-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

--- Comment #214 from Rich Felker  ---
I'm not particular in terms of the path it takes as long as this gets back to a
status where it's on the radar for fixing.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #12 from Rich Felker  ---
Note that -fexcess-precision=standard is not available in C++ mode to fix this.

However, -ffloat-store should also ensure consistency for the optimizer
(necessary to prevent this bug, and other variants of it, from happening), at
the expense of extreme performance and code size costs and of making the
floating point results even more semantically incorrect (double-rounding all
over the place, mismatching FLT_EVAL_METHOD==2); unlike
-fexcess-precision=standard, it is available in C++ mode. Despite all these
nasty effects, it may be a suitable workaround,
and at least it avoids letting the optimizer prove 0==1, thereby effectively
treating any affected code as if it contained UB.

Note that in code written to be excess-precision-aware, making use of float_t
and double_t for intermediate operands and only using float and double for
in-memory storage, -ffloat-store should yield behavior equivalent to
-fexcess-precision=standard.
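
A minimal sketch of that excess-precision-aware style (hypothetical code,
assuming an x87 target where FLT_EVAL_METHOD is 2):

#include <math.h>   /* float_t, double_t */

double dot3(const double *a, const double *b)
{
    /* Intermediates are double_t (long double on x87), so spilling them,
       or -ffloat-store writing them to memory, preserves their value. */
    double_t acc = 0;
    for (int i = 0; i < 3; i++)
        acc += a[i] * b[i];
    return acc;   /* only the final result is rounded to double */
}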

[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call

2020-02-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318

--- Comment #7 from Rich Felker  ---
I'll inquire about it. Note that F.6 already requires this for C functions; the
loophole is just that the implementation itself does not inherently have to
consist of C functions.

If it's determined that C won't require the library functions not bound to IEEE
operations to return values representable in their nominal type, then GCC needs
to be aware of whether the target libc can be expected to do so, and if not, it
needs to, as a special case, assume there might be excess precision in the
return value, so that (double)retval==retval can't be assumed to be true in the
optimizer.

Note that such an option would be nice to have anyway, for arbitrary functions,
since it's necessary for being able to call code that was compiled with
-fexcess-precision=fast from code that can't accept the
non-conforming/optimizer-unsafe behavior and safely use the return value. It
should probably be an attribute, with a flag to set the global default. For
example, __attribute__((__returns_excess_precision__)).
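
A sketch of how that might look (hypothetical; no such attribute exists, and
the name and function below are only illustrative):

__attribute__((__returns_excess_precision__))
double legacy_fast_pow(double, double);   /* hypothetical external function */

double caller(double a, double b)
{
    /* With the attribute, the compiler could no longer assume
       (double)legacy_fast_pow(a,b) == legacy_fast_pow(a,b), and would have
       to insert an explicit rounding when the result is stored. */
    double r = legacy_fast_pow(a, b);
    return r;
}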

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-08 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #14 from Rich Felker  ---
> No problems: FLT_EVAL_METHOD==2 means "evaluate all operations and constants 
> to the range and precision of the long double type", which is what really 
> occurs. The consequence is indeed double rounding when storing in memory, but 
> this can happen at *any* time even without -ffloat-store (due to spilling), 
> because you are never sure that registers are still available; see some 
> reports in bug 323.

It sounds like you misunderstand the standard's requirements on, and GCC's
implementation of, FLT_EVAL_METHOD==2/excess-precision. The availability of
registers does not in any way affect the result, because when expressions are
evaluated with excess precision, any spills must take place in the format of
float_t or double_t (long double) and are thereby transparent to the
application. The buggy behavior prior to -fexcess-precision=standard (and now
produced with -fexcess-precision=fast which is default in "gnu" modes) spills
in the nominal type, producing nondeterministic results that depend on the
compiler's transformations and that lead to situations like this bug (where the
optimizer has been lied to that two expressions are equal, but they're not).

> Double rounding can be a problem with some codes, but this just means that 
> the code is not compatible with FLT_EVAL_METHOD==2. For some floating-point 
> algorithms, double rounding is not a problem at all, while keeping a result 
> in extended precision will make them fail.

With standards-conforming behavior, the rounding of an operation and of storage
to an object of float/double type are discrete roundings and you can observe
and handle the intermediate value between them. With -ffloat-store, every
operation inherently has a double-rounding attached to it. This behavior is
non-conforming but at least deterministic, and is what I was referring to in my
previous comment. But I think this is largely a distraction from the issue at
hand; I was only pointing out that -ffloat-store is a workaround, but one with
its own (often severe) problems.
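
To make the distinction concrete (hypothetical sketch, x87/FLT_EVAL_METHOD==2
assumed):

#include <math.h>   /* double_t */

double two_roundings(double a, double b)
{
    double_t wide = a * b;   /* first rounding, to long double; observable here */
    double narrow = wide;    /* second, discrete rounding, to double */
    return narrow;
}

/* With -ffloat-store, even a plain "double t = a * b;" behaves as if both
   roundings were fused onto the multiplication: deterministic, but not what
   FLT_EVAL_METHOD==2 specifies. */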

[Bug c/82318] -fexcess-precision=standard has no effect on a libm function call

2020-02-08 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82318

--- Comment #9 from Rich Felker  ---
Indeed, I don't think the ABI says anything about this; a bug against the psABI
should probably be opened.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-09 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #16 from Rich Felker  ---
> And GCC does not do spills in this format, as see in bug 323.

In my experience it seems to (assuming -fexcess-precision=standard), though I
have not done extensive testing. I'll check and follow up.

> This is conforming as there is no requirement to keep intermediate results in 
> excess precision and range.

Such behavior absolutely is non-conforming. The standard reads (5.2.4.2.2 ¶9):

"Except for assignment and cast (which remove all extra range and precision),
the values yielded by operators with floating operands and values subject to
the usual arithmetic conversions and of floating constants are evaluated to a
format whose range and precision may be greater than required by the type"

Note "are evaluated", not "may be evaluated depending on what spills the
compiler chooses to perform".
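
A hypothetical illustration of that wording (not from this report):

int sees_excess(double a, double b)
{
    double d = a + b;    /* assignment: extra range and precision removed */
    /* The a + b on the right is "evaluated to" long double format on x87,
       unconditionally, so this can legitimately be 0. */
    return d == a + b;
}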

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-09 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #17 from Rich Felker  ---
And indeed you're right that GCC does it wrong. This can be seen from a minimal
example:

double g(),h();
double f()
{
return g()+h();
}

where gcc emits fstpl/fldl around the second call rather than fstpt/fldt.

So this is all even more broken than I thought. It looks like the only way to
get deterministic behavior from GCC right now is to get the wrong deterministic
behavior via -ffloat-store.

Note that libfirm/cparser gets the right result, emitting fstpt/fldt.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-09 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #18 from Rich Felker  ---
It was just pointed out to me that this might be an invalid test since GCC
assumes (correctly or not) that the return value of a function does not have
excess precision. I'll see if I can make a better test.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-09 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #19 from Rich Felker  ---
Test case provided by Szabolcs Nagy showing that GCC does seem to spill right
if it can't assume there's no excess precision to begin with:

double h();
double ff(double x, double y)
{
return x+y+h();
}

In theory this doesn't force a spill, but GCC seems to choose to do one, I
guess to avoid having to preserve two incoming values (although they're already
in stack slots that would be naturally preserved).

Here, GCC 9.2 with -fexcess-precision=standard -O3 emits fstpt/fldt.

[Bug tree-optimization/93682] Wrong optimization: on x87 -fexcess-precision=standard is incompatible with -mpc64

2020-02-11 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93682

--- Comment #2 from Rich Felker  ---
I think the underlying issue here is that -mpc64 (along with -mpc32) is just
hopelessly broken and should be documented as such. It could probably be
made to work, but there are all sorts of issues like float.h being wrong, math
library code breaking, etc.

On a more fundamental level (but seemingly unrelated to the mechanism of
breakage here), the underlying x87 precision control modes are also hopelessly
broken. They're not actually single/double precision modes, but single/double
mantissa with ld80 exponent. So I don't think it's possible to make the
optimizer aware of them without making it aware of two new floating point
formats that it doesn't presently know about. If you just pretended they were
single/double, the same sort of issue would arise again as soon as someone uses
small or large values that should be denormal/underflow/overflow but which
retain their full-precision values by virtue of the excess exponent precision.
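
A hypothetical illustration of that mantissa-only behavior (not from this
report):

#include <float.h>

double pc53_is_not_double(void)
{
    /* In genuine double arithmetic this product is subnormal and loses
       mantissa bits. On x87 with PC=53 the mantissa is rounded like a
       double but the exponent range is still the 80-bit format's, so the
       in-register result stays normal with full precision; it only
       denormalizes (with possible double rounding) when spilled or stored
       to an actual 64-bit double. */
    volatile double a = DBL_MIN, b = 0x1p-30;
    return a * b;
}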

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2020-02-11 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #25 from Rich Felker  ---
I think standards-conforming excess precision should be forced on, and added to
C++; there are just too many dangerous ways things can break as it is now. If
you really think this is a platform of dwindling relevance (though I question
that; due to the way patent lifetimes work, the first viable open-hardware x86
clones will almost surely lack sse, no?) then we should not have dangerous
hacks for the sake of marginal performance gains, with too few people spending
the time to deal with their fallout.

I'd be fine with an option to change the behavior of constants, and have it set
by default for -std=gnu* as long as the unsafe behavior is removed from
-std=gnu*.

[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense

2020-02-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

--- Comment #10 from Rich Felker  ---
I don't think it's at all clear that -fno-signed-zeros is supposed to mean the
programmer is promising that their code has behavior independent of the sign of
zeros, and that any construct which would be influenced by the sign of a zero
has undefined behavior. I've always read it as a license to optimize in ways
that disregard the sign of a zero or change the sign of a zero, but with
internal consistency of the program preserved.

If -fno-signed-zeros is really supposed to be an option that vastly expands the
scope of what's undefined behavior, rather than just removing part of Annex F
and allowing the unspecified quality of floating point results that C otherwise
allows in the absence of Annex F, it really needs a much much bigger warning in
its documentation!

[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense

2020-02-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

--- Comment #12 from Rich Felker  ---
To me the meaning of internal consistency is very clear: that the semantics of
the C language specification are honored and that the only valid
transformations are those that follow the "as-if rule". Since C without Annex F
allows arbitrarily awful floating point results, your example in comment 11 is
fine. Each instance of 1/a can evaluate to a different value. They could even
evaluate to random values. However, if you had written:

  int b = 1/a == 1/0.;
  int c = b;
  return b == c;

then the function must necessarily return 1, because the single instance of
1/a==1/0. in the abstract machine has a single value, either 0 or 1, and in the
abstract machine that value is stored to b, then copied to c, and b and c
necessarily have the same value. While I don't think it's likely that GCC would
mess up this specific example, it seems that it currently _can_ make
transformations such that a more elaborate version of the same idea would be
broken.

[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense

2020-02-20 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

--- Comment #14 from Rich Felker  ---
Indeed, without Annex F, division by zero is UB, so it's fine to do anything if
the program performs division by zero. So we need examples without division by
zero.
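
For instance, a hypothetical variant of the earlier example that observes the
sign of a zero without dividing:

#include <math.h>

int consistent(double a)
{
    int b = signbit(a);   /* sign of a zero observed without any division */
    int c = b;
    /* Whatever value the abstract machine gave b, copying it to c cannot
       change it, so this must return 1 even under -fno-signed-zeros. */
    return b == c;
}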

[Bug middle-end/93806] Wrong optimization: instability of floating-point results with -funsafe-math-optimizations leads to nonsense

2020-02-25 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93806

--- Comment #32 from Rich Felker  ---
> A slightly modified version of the example, showing the issue with GCC 5 to 7 
> (as the noipa attribute directive has been added in GCC 8):

Note that __attribute__((__weak__)) necessarily applies noipa and works in
basically all GCC versions, so you can use it where you want this kind of
example for older GCC.
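
A small sketch of that substitution (hypothetical function names):

/* GCC 8+: */
__attribute__((noipa))
double opaque_new(double x) { return x; }

/* Older GCC: a weak definition may be overridden at link time, which
   already forces the compiler to forgo IPA/inlining across the call. */
__attribute__((__weak__))
double opaque_old(double x) { return x; }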

[Bug target/55431] Invalid auxv search in ppc linux-unwind code.

2013-02-11 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55431

--- Comment #6 from Rich Felker  2013-02-12 07:08:14 UTC ---
That sounds highly doubtful. The sigcontext is (necessarily) on the stack, so
the only way accessing past the end of sigcontext could fault is if the access
were so far beyond the end as to go completely off the stack. The only way this
might be plausible is under sigaltstack.

In any case, why would this code be reading beyond the end? Does the kernel use
different incompatible sigcontext structures based on which vector registers
exist on the cpu?


[Bug target/55431] Invalid auxv search in ppc linux-unwind code.

2013-02-12 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55431

--- Comment #8 from Rich Felker  2013-02-12 15:27:58 UTC ---
Is there nothing internal in the sigcontext structure that distinguishes the
version?

Making the reference to __libc_stack_end weak won't help. If the symbol is
undefined, the code in libgcc would crash or malfunction; if it's defined but
does not point exactly to the argc/argv start (which, since it's not defined in
the ABI, seems to be something that could happen in the future even with
glibc), the code will also badly malfunction.

If you want to keep using __libc_stack_end, I think it should be conditional at
runtime on old/broken kernel and libc versions, and auxv should be ignored
otherwise.


[Bug target/54232] New: For x86 PIC code, ebx should be spillable

2012-08-11 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54232

 Bug #: 54232
   Summary: For x86 PIC code, ebx should be spillable
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: bug...@aerifal.cx


When generating x86 position-independent code, GCC permanently reserves EBX as
the GOT register. Even in functions that make no use of global data, EBX cannot
be used as a general-purpose register. This both slows down code that's under
register pressure and forces inline asm that needs an argument in EBX (e.g.
syscalls) to use ugly temp register shuffling to make gcc happy.
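
As a sketch of the kind of shuffling meant here (hypothetical code, loosely
modeled on what i386 syscall wrappers end up doing under -fPIC):

static long syscall1_pic(long n, long a1)
{
    long ret;
    /* %ebx is reserved for the GOT, so the argument that belongs in %ebx
       is passed in %ecx and swapped in and out around the trap. */
    __asm__ __volatile__ (
        "xchg %%ebx,%1\n\t"
        "int $0x80\n\t"
        "xchg %%ebx,%1"
        : "=a"(ret), "+c"(a1)
        : "a"(n)
        : "memory");
    return ret;
}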

My proposal, and I understand this may be difficult but I still think it's
worth stating, is that the GOT register EBX should be considered spillable like
any other register. In particular, the following consequences should result:

- If a function is not using the GOT (not accessing global or file-local static
symbols or making non-hidden function calls), all GP registers can be used just
like in non-PIC code. A pure function with no GOT usage would then have no PIC
overhead at all.

- If a function is only using a "GOT register" for PC-relative data access, it
should not go to the trouble of actually adjusting the PC obtained to point to
the GOT. Instead it should generate addressing relative to the PC address that
gets loaded into the register.

- In a function that's not making calls through the PLT (i.e. a leaf function
or a function that only calls hidden/protected functions), the "GOT register"
need not be EBX. Any register could be used, and in fact in some trivial
functions, using a call-clobbered register would avoid having to save/restore
EBX on the stack.

- In any function where EBX or any other register is being used to store the
GOT address, it should be spillable (either pushed to stack, or simply
discarded and reloaded with the standard load sequence when it's needed again
later) just like a register caching any other data, so that under register
pressure or inline asm constraints, the register becomes temporarily available
for another use.

It seems like all of these very positive consequences would fall out of just
treating GOT and GOT-relative addressing as address expressions based on the
GOT address, which could be cached in registers just like any other expression,
instead of hard-coding the GOT register as a special reserved register. The
only remaining special-case/hard-coding would be treating the need for EBX to
contain the GOT address when making calls through the PLT as an extra
constraint of the function call ABI.


[Bug target/54232] For x86 PIC code, ebx should be spillable

2012-08-11 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54232

--- Comment #1 from Rich Felker  2012-08-12 04:57:07 
UTC ---
By the way, the code that inspired this report is crypt_blowfish.c and the
corresponding asm by Solar Designer. We've been experimenting with performance
characteristics while integrating it into musl libc, and I found that the C
code is just as fast as the hand-optimized asm on the machine I was testing it
on when using static libraries without -fPIC, but takes over 30% more runtime
when built with -fPIC due to running out of registers.


[Bug target/54232] For x86 PIC code, ebx should be spillable

2012-08-13 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54232

--- Comment #3 from Rich Felker  2012-08-13 13:59:17 
UTC ---
> I think the GOT is introduced too late to do any fancy ananlysis
> on whether we need it or not.

This may be true, but if so, it's a highly suboptimal design that's hurting
performance badly. 30% on the cryptographic code I looked at, and from working
on FFmpeg in the past, I remember quite a few cases where PIC was hurting
performance by significant measurable amounts like that too. If there's any way
the changes I describe could be targeted even just in the long term, I think it
would make a big difference for a lot of software.

> I also think that for outgoing function calls the ABI
> relies on a properly setup GOT, even for those that bind
> locally and thus do not go through the PLT.

The extern function call ABI on x86 does not allow the caller to depend on EBX
containing the GOT address. This is because the callee has no way of knowing
whether it was called by the same DSO it resides in. If not, the GOT address
will be invalid for it.

For static functions whose addresses never leak out of the translation unit
they're defined in, the calling convention is up to GCC. Ideally it would
assume the GOT register is already loaded in such functions (as long as all the
callees use the GOT), but in reality it rarely does. This is a separate code
generation QoI issue that should perhaps be addressed as its own bug.


[Bug debug/54395] New: DWARF tables should go in non-mapped section unless exceptions are enabled

2012-08-28 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54395

 Bug #: 54395
   Summary: DWARF tables should go in non-mapped section unless
exceptions are enabled
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: bug...@aerifal.cx


On systems where GCC uses DWARF for debugging information, the DWARF tables are
stored in the .eh_frame section, which the linker maps into the LOAD segment
for the program, and which is not safely strippable with the strip command
(because it messes up section numbering). This is of course necessary if
exceptions are enabled (for languages that require them, or for -fexceptions in
"GNU C" code), but it's harmful when they're not wanted. It would be nice if
GCC had a way to store a "purely for debugging" version of the tables in a
separate section that could safely be stripped, that would not get loaded in a
LOAD segment, and that would not artificially inflate the size(1) of the object
files (which is frustrating when trying to measure relative improvements in
optimizing the size of object files).

At present, -fno-asynchronous-unwind-tables -fno-unwind-tables can eliminate
the problem, but it also conflicts with debugging; it seems impossible to
generate object files that are debugging-enabled but that don't push (part of)
the debugging data into the mapped-at-runtime part of the program.


[Bug debug/54395] DWARF tables should go in non-mapped section unless exceptions are enabled

2012-08-28 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54395

--- Comment #2 from Rich Felker  2012-08-28 23:38:49 
UTC ---
I can see the argument that some users would want/need that, and perhaps even
that you want backtrace() to be available in the default configuration, but I
still think there should be a configuration where debugging works without
adding unstrippable tables in sections that will be mapped at runtime.


[Bug debug/54395] DWARF tables should go in non-mapped section unless exceptions are enabled

2012-08-28 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54395

--- Comment #4 from Rich Felker  2012-08-28 23:52:24 
UTC ---
Would you care to elaborate on how it would break anything? They're already
easily removable with -fno-asynchronous-unwind-tables -fno-unwind-tables. The
problem is just that it's impossible to remove them and still have working
debugging (unless you want to revert to using stabs or something and adding
back a frame pointer...). My request is that it be possible to move them to a
strippable/non-mapped section to use them purely as debugging information and
not treat them as "part of the program".


[Bug debug/54395] GCC should be able to put DWARF tables in a non-mapped/strippable section for debug-only use

2012-08-29 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54395

--- Comment #6 from Rich Felker  2012-08-29 12:43:23 
UTC ---
I seem to remember gcc -g -fno-asynchronous-unwind-tables -fno-unwind-tables
producing a warning that these options are incompatible and that debugging will
not work, but at the moment it seems to be doing the right thing. Was I
imagining things or are there some gcc versions where the combination is
problematic?

I'd like to investigate the situation/behavior a bit longer before closing this
bug, but it seems like you may have provided a solution. If this solution does
work, however, I still think the documentation is lacking; it's not clear that
these options would not remove the tables in a way that interferes with
debugging.


[Bug target/55012] New: Protected visibility wrongly uses GOT-relative addresses

2012-10-21 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55012

 Bug #: 55012
   Summary: Protected visibility wrongly uses GOT-relative
addresses
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: bug...@aerifal.cx


Consider the shared library code:

int a __attribute__((visibility("protected")));
int f() { return a; }

For this (on i386 at least), gcc generates a GOT-relative (@GOTOFF) reference
to "a" rather than a GOT lookup. This will then reference the wrong location if
"a" is moved into the main program's data via a copy relocation, which will
happen if the main program makes any references to "a".

The issue is a subtlety in the semantics of protected visibility. As I
understand it and as it's documented, it's supposed to convey the semantic that
the definition will not be overridden in the sense of the abstract machine.
Copy relocations are not a case of overriding the definition in the abstract
machine, but an implementation detail used to support data objects in shared
libraries when the main program is non-PIC. With the current behavior, GCC is
requiring library authors using visibility to be aware of this implementation
detail (which only applies on some targets) and avoid using visibility on these
specific targets. That, in my mind, is unreasonable and buggy behavior.

Note that where this came up is when trying to use #pragma to set visibility
globally in a shared library; doing so broke global objects accessed from the
main application, but otherwise behaved as expected.
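
For reference, a hypothetical main program that triggers the breakage (not part
of the original report):

extern int a;
int f(void);

int main(void)
{
    /* Linked as a non-PIC executable, this reference to "a" makes the
       linker allocate a copy of "a" in the executable and emit a copy
       relocation; the library's @GOTOFF access keeps reading the
       library's own, now-unused copy. */
    a = 1;
    return f();   /* returns 0 under the codegen described above */
}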


[Bug target/55012] Protected visibility wrongly uses GOT-relative addresses

2012-10-21 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55012

--- Comment #1 from Rich Felker  2012-10-21 22:06:39 UTC ---
I'm not sure whether the fix should be in gcc/varasm.c,
default_binds_local_p_1(), or in the config/i386/predicates.md,
local_symbolic_operand predicate.

In the former, all non-default-visibility symbols are considered "local". In
the latter, this "local" flag is used to determine that a got-relative offset
would be allowed.

If varasm.c is modified, it should be to make protected symbols considered
non-local. I don't know if this would hurt code generation on other archs that
don't use copy relocations, however.

If predicates.md is to be modified, I don't see a good way, since the
information on visibility seems to be lost at this point. Hidden symbols must
continue to be considered local, but protected ones should not.


[Bug target/31798] lib1funcs.asm:1000: undefined reference to `raise'

2013-07-09 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31798

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #2 from Rich Felker  ---
This does seem to be real, so please reopen it. The problem is that the final
command line to the linker looks like:

... $(objs) -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed
-lgcc_s --no-as-needed $(endfiles)

Assuming the main program itself does not do any division or call raise, the
first -lgcc does not pull in __div0, and the -lc does not pull in raise.
However, if any function from libc which does get pulled in performs division,
then a reference to __div0 is generated, and the second -lgcc pulls in __div0,
which contains a reference to raise. This reference is never resolved.

It seems the intent is that link_gcc_c_sequence uses --start-group and
--end-group to avoid this problem when -static is used. However, this does not
cover the case where no libc.so exists at all, and libc.a is all that's
available. I wonder why the --start-group logic is only used for static linking
and not unconditionally, since it should be a no-op for shared libraries
anyway.

FYI, I have received new reports of this bug from musl users: one wanted
libc.so to be used but installed it in the wrong location, causing libc.a to
get used instead; the rest were building purely static-linked systems with no
shared libraries at all.


[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2013-07-27 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #19 from Rich Felker  ---
We are not presently experiencing this issue in musl libc, probably because the
current C memcpy code is sufficiently overcomplicated to avoid getting detected
by the optimizer as memcpy. However, I'm trying to switch to a new simpler
implementation that's much faster when compiled with GCC 4.7.1 (on ARM), but
hit this bug when testing on another system using GCC 4.6.1 (ARM). On the
latter, even -fno-tree-loop-distribute-patterns does not make any difference.
Unless there's a reliable workaround for this bug or at least a known blacklist
of bad GCC versions where this bug can't be worked around, I'm afraid we're
going to have to resort to generating the asm for each supported arch using a
known-good GCC and including that asm in the distribution.

This is EXTREMELY frustrating.
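
For context, a minimal sketch (not musl's actual code) of the kind of loop the
pattern recognizer turns back into a memcpy call:

#include <stddef.h>

void *memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];   /* recognized as memcpy and replaced by a call to
                          memcpy itself, i.e. infinite recursion */
    return dest;
}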


[Bug middle-end/58245] New: -fstack-protector[-all] does not protect functions that call noreturn functions

2013-08-26 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245

Bug ID: 58245
   Summary: -fstack-protector[-all] does not protect functions
that call noreturn functions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bugdal at aerifal dot cx

This issue is almost identical to bug #23221, but affects functions whose
executions end with a call to a noreturn function rather than a tail call. The
simplest example is:

#include <stdlib.h>
int main()
{
exit(0);
}

When compiled with -fstack-protector-all, the function prologue will read and
store the canary, but no check will be made before passing execution to exit.

This is actually a major practical problem for some users of musl libc, because
code like the above appears in many configure scripts, and musl libc uses weak
symbols so as to avoid initializing stack-protector (and thereby avoid
initializing the TLS register) if there is no reference to __stack_chk_fail.
Due to this issue, the above code generates thread-pointer-relative (e.g.
%fs-based on x86_64) accesses to read the canary, but no reference to
__stack_chk_fail, and then crashes when run, leading to spurious configure
failures. For the time being, I have informed users who wish to use
-fstack-protector-all that they can add -fno-builtin-exit -D__noreturn__= to
their CFLAGS, but this is an ugly workaround.

It should be noted that this issue happens even at -O0. I think using noreturn
for dead code removal at -O0 is highly undesirable; for instance, it would
preclude proper debugging of issues caused by a function erroneously being
marked noreturn and actually returning. However that matter probably deserves
its own bug report...


[Bug middle-end/58245] -fstack-protector[-all] does not protect functions that call noreturn functions

2013-08-26 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245

--- Comment #1 from Rich Felker  ---
One more thing: I would be happy with either of two solutions, either:

(1) Checking the canary before calling a noreturn function, just like
performing a check before a tail-call, or

(2) Eliminating the dead-code-removal of the function epilogue at -O0, and for
non-zero -O levels, adding an optimization to remove the canary loading from
the prologue if no epilogue to check the canary is to be generated.


[Bug middle-end/58245] -fstack-protector[-all] does not protect functions that call noreturn functions

2013-08-26 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245

--- Comment #3 from Rich Felker  ---
We already do that; the patch is in the musl-cross repo here:

https://bitbucket.org/GregorR/musl-cross or
https://github.com/GregorR/musl-cross

However, we want the stack-protector behavior for GCC with musl to be the same
as with glibc, using the TLS canary and __stack_chk_fail function in libc
rather than a separate libssp. In all real-world, nontrivial code, everything
works fine. The only failures are with trivial programs like the above that
just call exit and are compiled with -fstack-protector-all.

In any case, the failure of configure scripts with musl is just one symptom of
the problem: useless loads of the canary without a corresponding check of the
canary. From a security standpoint, I feel like checking the canary before
calling a function that won't return would be the best possible behavior, so
that every function gets a check. However, if doing this isn't deemed
worthwhile, I think the canary load, which is dead code without a subsequent
check, should be optimized out.


[Bug target/58446] Support for musl libc

2013-09-17 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58446

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #9 from Rich Felker  ---
I don't know what the maintenance policy is for non-latest releases, but it
would be wonderful if we could get these into the 4.7 series before it's
closed, too. When bootstrapping new toolchains/systems with a different libc
than the host system's, it's much easier to start with a GCC that doesn't need
C++, and it would be a big help if our users could just start with GCC 4.7.x
and have it work out of the box, rather than needing to apply third-party
patches.

(Speaking from the standpoint of musl maintainer.)


[Bug driver/50470] New: gcc does not respect -nostdlib with regard to search paths

2011-09-20 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50470

 Bug #: 50470
   Summary: gcc does not respect -nostdlib with regard to search
paths
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: bug...@aerifal.cx


Even with -nostdlib, gcc leaves the default library paths in the search path,
including /usr/lib (in the form of /usr/lib/gcc/targetstring/version/../../..).
This makes -nostdlib basically useless for its only foreseeable purpose,
building programs against a completely alternate library ecosystem(*). The only
workaround I've found is installing a wrapper script with -wrapper to remove
the unwanted paths.

(*) Leaving default paths in the search path after the custom ones is not
acceptable because configure scripts will find and attempt to use libraries in
the default paths if the corresponding library does not exist in the custom
path.


[Bug driver/50470] gcc does not respect -nostdlib with regard to search paths

2011-09-20 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50470

--- Comment #2 from Rich Felker  2011-09-21 01:34:29 
UTC ---
The sysroot features may be nice but they're not a substitute for being able to
eliminate the default library search path. For example, when using sysroot,
-L/new/path will prepend the sysroot to /new/path.


[Bug target/53134] Request for option to disable excess precision on i387

2012-04-28 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53134

--- Comment #8 from Rich Felker  2012-04-28 23:14:57 
UTC ---
I agree, sadly, that WONTFIX is probably the most appropriate action. At least,
like Andrew said, we're getting to the point where assuming it's okay to build
with -msse2 and -mfpmath=sse is reasonable.


[Bug target/52593] Builtin sqrt on x86 is not correctly rounded

2012-04-28 Thread bugdal at aerifal dot cx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52593

--- Comment #7 from Rich Felker  2012-04-28 23:21:51 
UTC ---
This bug seems to have been fixed with the addition of the
-fexcess-precision=standard feature, which is now set by default with -std=c99
or c11, and which disables the builtin sqrt based on 387 fsqrt. So apparently
it had already been fixed at the time I reported this, but I was unaware of the
right options to enable the fix and did not even think to try just using
-std=c99.

Note that for buggy libm (including glibc's), the fact that gcc has fixed the
issue will not fix the incorrect results, since the code in libm makes exactly
the same mistake gcc was making. But at least it's possible to fix it there.

