[Bug libstdc++/95609] span could have better layout

2020-10-28 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95609

--- Comment #6 from Niall Douglas  ---
Cool, thanks. I believe that all three major STLs now implement struct iovec
compatibility with span. That's a nice win.

[Bug target/80878] -mcx16 (enable 128 bit CAS) on x86_64 seems not to work on 7.1.0

2021-05-06 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

Niall Douglas  changed:

   What|Removed |Added

 CC||s_gccbugzilla at nedprod dot 
com

--- Comment #30 from Niall Douglas  ---
I got bit by this GCC regression today at work. Consider
https://godbolt.org/z/M9fd7nhdh where std::atomic<__int128> is compare
exchanged with -march=sandybridge:

- On GCC 6.4 and earlier, this emits lock cmpxchg16b, as you would expect.

- From GCC 7 up to trunk (12?), this emits __atomic_compare_exchange_16.

- On clang, this emits lock cmpxchg16b, as you would expect.

This is clearly a regression. GCCs before 7 did the right thing. GCCs from 7
onwards do not. clangs with libstdc++ do do the right thing.

This isn't just an x64 thing, either. Consider https://godbolt.org/z/x6d5GE4o6
where GCC on ARM64 emits __atomic_compare_exchange_16, whereas clang on ARM64
emits ldaxp/stlxp, as you would expect.

Please mark this bug as a regression affecting all versions of GCC from 7 to
trunk, and affecting all 128 bit atomic capable architectures not just x64.

[Bug target/80878] -mcx16 (enable 128 bit CAS) on x86_64 seems not to work on 7.1.0

2021-05-07 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

--- Comment #33 from Niall Douglas  ---
(In reply to Andrew Pinski from comment #31)
> 
> Again the problem is stuff like:
> static const _Atomic __int128_t t = 2000;
> 
> __int128_t g(void)
> {
>   return t;
> }
> 
> DOES NOT WORK if you use CAS (or ldaxp/stlxp).

I think we are talking about different things here. You are talking about
calling convention. I'm talking about
std::atomic<__int128>::compare_exchange_weak() i.e. that the specific member
function compare_exchange_weak() is not generating cmpxchg16b if compiled with
GCC, but does with clang.

Re: your original point, I cannot say anything about _Atomic. However, for
std::atomic<__int128>, as __int128 is not an integral type it seems reasonable
to me that its specialisation tell the compiler to not store it in read only
memory. Mark it with attribute section, give it a non-trivial destructor, or
whatever it needs.

Perhaps I ought to open a separate issue here, as my specific problem is that
std::atomic<__int128>::compare_exchange_weak() is not using cmpxchg16b? Mr.
Wakely your thoughts?

[Bug target/94649] 16-byte aligned atomic_compare_exchange doesn not generate cmpxcg16b on x86_64

2021-05-07 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649

Niall Douglas  changed:

   What|Removed |Added

 CC||s_gccbugzilla at nedprod dot 
com

--- Comment #4 from Niall Douglas  ---
Relocating my issue from PR 80878 to here:

I got bit by this GCC regression today at work. Consider
https://godbolt.org/z/M9fd7nhdh where
std::atomic<__int128>::compare_exchange_weak() is called with option
-march=sandybridge passed to the command line:

- On GCC 6.4 and earlier, this emits lock cmpxchg16b, as you would expect.

- From GCC 7 up to trunk (12?), this emits __atomic_compare_exchange_16.

- On clang, this emits lock cmpxchg16b, as you would expect.

This is clearly a regression. GCCs before 7 did the right thing. GCCs from 7
onwards do not. clangs with libstdc++ do do the right thing.

Please mark this bug as a regression affecting all versions of GCC from 7 to
trunk.

--- cut ---

NOTE that unlike the original PR above where the struct is a UDT, I am talking
here about std::atomic<__int128>::compare_exchange_weak(). It seems weird that
__int128 is treated as a UDT when the CPU is perfectly capable of hardware CAS.

Common feedback from this and other PRs:

1. Changing this would break ABI

Firstly, I told GCC -march=sandybridge, and we know that libatomic will choose
cmpxchg16b to implement __atomic_compare_exchange_16 because cpuid for
sandybridge will say cmpxchg16b is supported. So, it's the same implementation
for __atomic_compare_exchange_16, nothing breaks here.


2. static const std::atomic<__int128>::load() will segfault

std::atomic<__int128> could examine the macro environment
(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 et al) and if only 128 bit compare and
swap is available, but 128 bit atomics are not, then std::atomic<__int128>
could be conditionally marked with attribute section to prevent it being stored
into the read only code section.

That said, I don't actually consider static const std::atomic<__int128>::load()
segfaulting important enough to special case, in my opinion.


3. This was changed in GCC 7 because _Atomic is broken

_Atomic is indeed broken, but I am talking about std::atomic the C++ library
type here. As Mr. Wakely said in another PR:

> std::atomic just calls the relevant __atomic built-in for all operations.
> What the built-in does is not up to libstdc++.

... to this I would say both yes and no. __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 
is not defined if the architecture relies on software emulation (libatomic) to
implement 128 bit CAS. So std::atomic::compare_exchange_X()
*could* examine macros for architecture and presence of
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 and inline some assembler for certain
architectures as a QoI measure, which is not ABI breaking because if
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 is 1, then libatomic will be choosing that
same assembler in any case. Note that I refer to the CAS operation only, for
load and store it's trivial to write CAS based emulations, but you could just
leave those continue to call libatomic.

Ultimately I probably agree that because _Atomic is broken, the compiler is not
the right thing to change here. But libstdc++'s std::atomic implementation is
another matter.

[Bug target/80878] -mcx16 (enable 128 bit CAS) on x86_64 seems not to work on 7.1.0

2021-05-07 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

--- Comment #35 from Niall Douglas  ---
(In reply to Jonathan Wakely from comment #34)

> > Perhaps I ought to open a separate issue here, as my specific problem is
> > that std::atomic<__int128>::compare_exchange_weak() is not using cmpxchg16b?
> 
> Isn't that covered by PR 94649?

That issue is definitely closer to mine, but still not the same. Still, I'll
relocate this report from here to there. Thanks for pointing me at it.

[Bug c++/108659] New: Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

Bug ID: 108659
   Summary: Suboptimal 128 bit atomics codegen on AArch64 and x64
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: s_gccbugzilla at nedprod dot com
  Target Milestone: ---

Related:
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

I got bitten by this again, latest GCC still does not emit single instruction
128 bit atomics, even when the -march is easily new enough. Here is a godbolt
comparing latest MSVC, latest GCC and latest clang for the skylake-avx512
architecture, which unquestionably supports cmpxchg16b. Only clang emits the
single instruction atomic:

https://godbolt.org/z/EnbeeW4az

I'm gathering from the issue comments and from the comments at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 that you're going to wait
for AMD to guarantee atomicity of SSE instructions before changing the codegen
here, which makes sense. However I also wanted to raise potentially suboptimal
128 bit atomic codegen by GCC for AArch64 as compared to clang:

https://godbolt.org/z/oKv4o81nv

GCC emits `dmb` to force a global memory fence, whereas clang does not.

I think clang is in the right here, the seq_cst atomic semantics are not
supposed to globally memory fence.

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #3 from Niall Douglas  ---
> AMD has guaranteed it, but there is still VIA and Zhaoxin and while we have 
> some statement from the latter, I'm not sure it is enough and we don't have 
> anything from VIA.  See PR104688 for details.

I'm wondering if a compiler opt out flag like -no-msseatomic16 to turn off use
of SSE for 128 bit atomics wouldn't be an idea? Given the small market share of
those CPU vendors, seems a shame to hold up implementation.

(Also, if you do turn it on by default and advertise that widely, I suspect
those vendors will hurry up with their documentation)

> FWIW, the GCC codegen for aarch64 is at https://godbolt.org/z/qvx9484nY (arm 
> and aarch64 are different targets). It emits a call to libatomic, which for 
> GCC 13 will use a lockless implementation when possible at runtime, see 
> g:d1288d850944f69a795e4ff444a427eba3fec11b

Thanks for the catch, my mistake. It would seem the codegen is similarly
inferior to the codegen from clang for both aarch64 and x64.

You may be interested in reading https://reviews.llvm.org/D110069. It wanted to
have LLVM generate a 128 bit AArch64 CAS for atomics. LLVM merged that change,
it'll be in the next release.

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #7 from Niall Douglas  ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Niall Douglas from comment #3) 
> > You may be interested in reading https://reviews.llvm.org/D110069. It wanted
> > to have LLVM generate a 128 bit AArch64 CAS for atomics. LLVM merged that
> > change, it'll be in the next release.
> 
> Using CAS for atomic load is not valid thing to do ...
> Because atomic load from constant rodata needs to work.
> LLVM breaks this case as they don't care about it. GCC does though.

I've heard that argument before, and I've always wondered why _Atomic128 types
couldn't have an attribute which applies attribute section to their static
const variable incarnations to force them into r/w memory. That would also
solve the LLVM issue. Said attribute is not unuseful in general actually, it
would help avoid having to mess with mprotect to apply copy on write perms on
regions in .rodata when you need to modify static const variable values.

I don't think that the standard *guarantees* that static const variables go
into read only memory, and besides, before C23 128 bit integers weren't
supported anyway so one could argue as a proprietary extension (__int128) you
get proprietary special casing.

Niall

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #10 from Niall Douglas  ---
(In reply to Jakub Jelinek from comment #9)
> (In reply to Wilco from comment #8)
> > Yes that sounds like a reasonable approach.
> 
> I don't think so.  Not all variables on which __atomic_* intrinsics are used
> are actually _Atomic, the vars can be embedded in const aggregates etc.

I'd have the attribute propagate to enclosing types, like over-alignment.

[Bug c++/101133] [coroutines] co_await doesn't accept a valid awaitable object if await_resume()'s return type is not a built-in type.

2022-05-02 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101133

Niall Douglas  changed:

   What|Removed |Added

 CC||s_gccbugzilla at nedprod dot 
com

--- Comment #4 from Niall Douglas  ---
Looks like this fix didn't get backported into the 11 branch, as the 11.3
release is also broken.

https://godbolt.org/z/ojo759Md1

[Bug libstdc++/95609] span could have better layout

2021-08-23 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95609

--- Comment #8 from Niall Douglas  ---
(In reply to Jonathan Wakely from comment #7)
> (In reply to Niall Douglas from comment #0)
> > I would assume that the ABI ship has sailed, as usual, but if libstdc++'s
> > span could instead have the layout:
> > 
> > {
> >   T *p;
> >   size_t l;
> > }
> > 
> > ... then a span would be layout-compatible with struct iovec,
> 
> No it won't, because iovec has a void* member, and span has a byte*
> member, which makes them not layout-compatible. Similarly for span char> or any other specialization of span.

It is indeed a shame that the compiler won't optimise this appropriately
without additional markup on the iovec structure:

https://godbolt.org/z/cTo5xPxjE

I can tell you that in LLFIO we simply reinterpret cast on those platforms
where we know they'll be bit identical because a bunch of static asserts
wouldn't have allowed compilation otherwise.

Niall

[Bug c++/111041] New: Malformed requires syntax should produce better diagnostics

2023-08-16 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111041

Bug ID: 111041
   Summary: Malformed requires syntax should produce better
diagnostics
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: s_gccbugzilla at nedprod dot com
  Target Milestone: ---

I know this will be a minor enhancement request, however I would like this to
produce more useful diagnostics:

template concept test1 = requires(T x) { x.foo() -> bool;};

Currently we get "error: expected unqualified-id before 'bool'".

template concept test1 = requires(T x) { x.foo() ->
std::same_as;};

Currently we get "error: wrong number of template arguments (1, should be 2)".


I would like something like "error: requires item needs curly brackets if
requiring return type yielded from an expression".

I know this will seem an inconsequential enhancement request, but I lost
several hours of productivity today because I forgot the curly brackets in some
deeply nested requires clauses, and the current GCC diagnostics did not help my
tired old eyes see I had forgotten the curly brackets. I kept staring at it,
couldn't see the issue, and it was curly brackets all along.

[Bug c++/112339] New: ICE with namespaced attribute on function

2023-11-01 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112339

Bug ID: 112339
   Summary: ICE with namespaced attribute on function
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: s_gccbugzilla at nedprod dot com
  Target Milestone: ---

This, if compiled with trunk GCC or any recent major version of GCC:

```
[[clang::no_sanitize("bounds")]] void foo() 
{
}
```

... with options `-Wno-attributes=clang::no_sanitize -fsanitize=undefined`
produces ICE:

```
: In function 'void foo()':
:3:1: internal compiler error: in tree_to_uhwi, at tree.cc:6469
3 | }
  | ^
0x266430e internal_error(char const*, ...)
???:0
0xb01c2c fancy_abort(char const*, int, char const*)
???:0
0xb97938 cp_genericize(tree_node*)
???:0
0xbe33ef finish_function(bool)
???:0
0xcebe49 c_parse_file()
???:0
0xe2ceb9 c_common_parse_file()
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
Compiler returned: 1
```

Godbolt demo: https://godbolt.org/z/hs3xqeqn1

[Bug c/112339] ICE with clang::no_sanitize and -fsanitize=

2023-11-08 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112339

--- Comment #3 from Niall Douglas  ---
Thanks for the patch. I've sent it on to the originator of the bug, if they
confirm it fixes their issue to me I'll let you know.