Change between g++-9.4.0 and g++-14.2.0

2025-02-24 Thread Sidney Marshall

In the code:
---
#include <iostream>
#include <utility>
#include <vector>

// commenting out this forward declaration causes the code to fail
// under g++-14.2.0
//
// the forward declaration is not needed for g++-9.4.0
template<typename A, typename B>
std::ostream& operator<<(std::ostream& os, const std::pair<A, B>& p);

template<typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& v) {
  os << "[";
  for(typename std::vector<T>::const_iterator i = v.begin();
      i != v.end();
      ++i) {
    if(i != v.begin() ) os << ", ";
    os << *i;
  }
  os << "]";
  return os;
}

template<typename A, typename B>
std::ostream& operator<<(std::ostream& os, const std::pair<A, B>& p) {
  os << "(" << p.first << "," << p.second << ")";
  return os;
}

using namespace std;

int main() {
  cout << pair<int, int>(3, 4) << endl;
  cout << vector<int>(3, 7) << endl;
  cout << vector<pair<int, int> >(3, pair<int, int>(3, 4)) << endl;
}

// output:
// (3,4)
// [7, 7, 7]
// [(3,4), (3,4), (3,4)]
---

The code compiles and runs under g++-9.4.0 with or without the
forward declaration, but g++-14.2.0 requires the forward declaration.

Did the standard change, or is g++ now being more strict in following
the standard?

Note that the actual code is more complicated, but this is enough to
show the difference.


--Sidney Marshall



Re: Change between g++-9.4.0 and g++-14.2.0

2025-02-24 Thread Andrew Pinski via Gcc
On Mon, Feb 24, 2025 at 9:26 PM Sidney Marshall wrote:
>
> In the code:
> ---
> #include <iostream>
> #include <utility>
> #include <vector>
>
> // commenting out this forward declaration causes the code to fail
> // under g++-14.2.0
> //
> // the forward declaration is not needed for g++-9.4.0
> template<typename A, typename B>
> std::ostream& operator<<(std::ostream& os, const std::pair<A, B>& p);
>
> template<typename T>
> std::ostream& operator<<(std::ostream& os, const std::vector<T>& v) {
>   os << "[";
>   for(typename std::vector<T>::const_iterator i = v.begin();
>       i != v.end();
>       ++i) {
>     if(i != v.begin() ) os << ", ";
>     os << *i;
>   }
>   os << "]";
>   return os;
> }
>
> template<typename A, typename B>
> std::ostream& operator<<(std::ostream& os, const std::pair<A, B>& p) {
>   os << "(" << p.first << "," << p.second << ")";
>   return os;
> }
>
> using namespace std;
>
> int main() {
>   cout << pair<int, int>(3, 4) << endl;
>   cout << vector<int>(3, 7) << endl;
>   cout << vector<pair<int, int> >(3, pair<int, int>(3, 4)) << endl;
> }
>
> // output:
> // (3,4)
> // [7, 7, 7]
> // [(3,4), (3,4), (3,4)]
> ---
>
> The code compiles and runs under g++-9.4.0 with or without the
> forward declaration, but g++-14.2.0 requires the forward declaration.
>
> Did the standard change, or is g++ now being more strict in following
> the standard?

GCC changed to be correct: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51577

Thanks,
Andrew

>
> Note that the actual code is more complicated, but this is enough to
> show the difference.
>
> --Sidney Marshall
>
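
For readers hitting the same error: the rule involved is two-phase name
lookup.  Inside a template, an unqualified dependent call such as
"os << *i" is resolved from the declarations visible where the template is
defined, plus argument-dependent lookup at the point of instantiation.  For
std::pair and std::ostream the only associated namespace is std, so a global
operator<< that is declared later is not a candidate.  A minimal sketch of
one way to keep such code portable, assuming you control the header, is to
declare every overload before the first template that uses it.  The helper
to_text is hypothetical and exists only to make the output easy to check.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Declare all overloads up front so that ordinary unqualified lookup at
// each template's point of definition can see every one of them.
template<typename A, typename B>
std::ostream& operator<<(std::ostream& os, const std::pair<A, B>& p);
template<typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& v);

template<typename A, typename B>
std::ostream& operator<<(std::ostream& os, const std::pair<A, B>& p) {
  return os << "(" << p.first << "," << p.second << ")";
}

template<typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& v) {
  os << "[";
  for (typename std::vector<T>::const_iterator i = v.begin();
       i != v.end(); ++i) {
    if (i != v.begin()) os << ", ";
    os << *i;  // finds the pair overload because it was declared above
  }
  return os << "]";
}

// Hypothetical helper (not from the thread): render a value to a string.
template<typename T>
std::string to_text(const T& value) {
  std::ostringstream out;
  out << value;
  return out.str();
}
```

With these declarations, the order of the definitions no longer matters, and
nesting in either direction (a vector of pairs or a pair of vectors) works.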


Re: Backend for a stack-oriented architecture

2025-02-24 Thread Florian Weimer
* Michael Matz:

> Hello,
>
> On Mon, 24 Feb 2025, Florian Weimer wrote:
>
>> .proc fib (_long) (_long)
>> # Argument/result register: %3
>> # return address register: %2
>> # local register: %1
>> # outgoing argument/return register: %0
>>   .framesize 24 # in bytes, three registers excluding the incoming argument
> ...
>>   ret 24
>
> Random observation: if the callee pops the stack you will have a harder 
> time dealing with stdarg functions.

The callee only pops the stack up to and including the return address.
(The operand behaves differently from the i386 instruction.)

Variadic functions will be handled quite differently on this target.
A memory-safe va_arg needs some sort of type descriptor.
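
As an illustration only (the thread does not describe the actual design, and
every name below is hypothetical), a checked va_arg could carry a small type
tag next to each variadic argument and refuse to read an argument as the
wrong type or past the end of the list.  A sketch in C++:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical type descriptor stored alongside each variadic argument.
enum class ArgKind { Long, Pointer };

struct TaggedArg {
  ArgKind kind;
  long as_long;
  const void* as_pointer;
};

TaggedArg make_long(long v) { return TaggedArg{ArgKind::Long, v, nullptr}; }
TaggedArg make_pointer(const void* p) {
  return TaggedArg{ArgKind::Pointer, 0, p};
}

// Plays the role of va_list/va_arg, but checks the tag and the bounds.
class CheckedArgs {
 public:
  explicit CheckedArgs(const std::vector<TaggedArg>& args)
      : args_(args), next_(0) {}

  long next_long() { return take(ArgKind::Long).as_long; }
  const void* next_pointer() { return take(ArgKind::Pointer).as_pointer; }

 private:
  const TaggedArg& take(ArgKind expected) {
    if (next_ >= args_.size())
      throw std::out_of_range("no more variadic arguments");
    const TaggedArg& arg = args_[next_];
    if (arg.kind != expected)
      throw std::invalid_argument("variadic argument has a different type");
    ++next_;
    return arg;
  }

  std::vector<TaggedArg> args_;
  std::size_t next_;
};

// Example callee: sums `count` variadic longs.
long sum_longs(std::size_t count, const std::vector<TaggedArg>& args) {
  CheckedArgs va(args);
  long total = 0;
  for (std::size_t i = 0; i < count; ++i) total += va.next_long();
  return total;
}
```

Passing a pointer where sum_longs expects a long raises an exception here
instead of reinterpreting the bits, which is the property a memory-safe
target would need to enforce by some means.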

>> I tried to create a GCC backend for this, by looking at the existing
>> mmix backend (for the register windows) and the bpf backend (due to
>> its verified nature) for inspiration.  I did not get very far because
>> it's my first GCC backend.  I wonder if that's just my lack of
>> experience, or if the target just isn't a good fit for GCC.  I found
>> Hans-Peter Nilsson's old description of the CRIS backend, and it
>> mentions that GCC is not a good starting point for accumulator
>> machines with just one register (and my target doesn't even have that,
>> not really).
>> 
>> If advisable, I can redefine the target to make it more GCC-friendly,
>> perhaps by introducing a register file separate from the stack.
>> (Although this would make the emulator and verifier more complex.)
>
> Perhaps some inspiration can be gleaned from other 0-address machines, 
> i.e. pure stack ones.  One I know of that has a GCC port is the (meanwhile 
> fairly old) zpu ( https://en.wikipedia.org/wiki/ZPU_(processor) and 
> https://github.com/zylin/zpugcc , look at toolchain/gcc/gcc/config/zpu 
> there).

This looks like a good point for comparison, thanks.

> But if you don't want to endlessly wrestle against GCC it's probably 
> easier to architect your insn set with some registers (they could still be 
> memory-mapped, and be in fact offsets from a special base register that's 
> not exposed to GCC - perhaps that gives you the features you seek for your 
> mostly memory-safe guarantees?).

I need to control aliasing, so if there's a pointer into the stack, it
must be at a known offset from the stack pointer, and if it is used in
an offset operand, the offset must be statically known, too.
Otherwise, things get very, very complicated.  Whether I do this in
the assembler or in GCC doesn't really matter.  The assembler already
knows about what (I think) GCC calls the argument pointer, and I
should use that from the GCC side.


aarch64 built-in SIMD types

2025-02-24 Thread Tom Kacvinsky via Gcc
Hi all,

I am trying to find where the aarch64 SIMD built-in types are defined in
GCC, for instance __Int8x8_t.  I see some code in gcc/config/aarch64 for
these, but then it goes deeper into internals of GCC that I don't quite
follow.

Any help pointing to where I should look would be appreciated.

Thanks,

Tom


Re: Backend for a stack-oriented architecture

2025-02-24 Thread Jeff Law via Gcc




On 2/24/25 4:32 AM, Florian Weimer wrote:

> As a hobby project, I'm working on a mostly memory-safe architecture
> that is targeted at direct software emulation.  The majority of its
> instructions have memory operands that are relative to the stack
> pointer.  Calls and returns adjust the stack pointer, so I suppose one
> could say that the architecture has register windows.  The reason for
> the stack-based approach is that an emulated register move would be a
> memory-to-memory move anyway.  I believe this approach is similar to
> the Lua VM and other VMs which are generally considered
> register-based instead of stack-based.  Writing it by hand, it feels
> more register-based than stack-based, too, although the direct support
> for multiple return values makes it possible to use some Forth-like
> idioms.
>
> Here's some example code (destination operand comes first):
>
> .proc fib (_long) (_long)
> # Argument/result register: %3
> # return address register: %2
> # local register: %1
> # outgoing argument/return register: %0
>    .framesize 24 # in bytes, three registers excluding the incoming argument
>    ldic %1, 2
>    jlels %3, %1, :0    # If the argument is less than 2, just return it.
>    addlc %0, %3, -1    # Prepare argument for first recursive call.
>    callp fib
>    mv %1, %0           # Save result of first call.
>    addlc %0, %3, -2    # Prepare argument for second recursive call.
>    callp fib
>    addlso %3, %1, %0   # Sum of results, with an overflow check.
> :0
>    ret 24
>
> The call instruction increments the stack pointer by 24 bytes to
> create the new frame.  This is how the argument becomes available as
> %3 in the callee.  (The real assembler has a minimal register
> allocator and computes the frame size automatically, so that the
> change of the argument registers is hidden from the programmer
> when new local registers are introduced.)
>
> I tried to create a GCC backend for this, by looking at the existing
> mmix backend (for the register windows) and the bpf backend (due to
> its verified nature) for inspiration.  I did not get very far because
> it's my first GCC backend.  I wonder if that's just my lack of
> experience, or if the target just isn't a good fit for GCC.  I found
> Hans-Peter Nilsson's old description of the CRIS backend, and it
> mentions that GCC is not a good starting point for accumulator
> machines with just one register (and my target doesn't even have that,
> not really).
>
> If advisable, I can redefine the target to make it more GCC-friendly,
> perhaps by introducing a register file separate from the stack.
> (Although this would make the emulator and verifier more complex.)

You might be able to pretend you have a flat register file up through
allocation & reloading, then convert that to a stack.  The code in
reg-stack.c might help since it does basically the same thing for the
x86 FP unit.


But it's not a great fit for GCC in general.

Jeff





GCC used to store pointers in FP registers on aarch64

2025-02-24 Thread Attila Szegedi via Gcc
Hi folks,

I'm looking for a bit of historical context for a fun GCC behavior we
stumbled across. For... reasons we build some of our binaries using an
older version of GCC (8.3.1, yes, we'll be upgrading soon, and no, this
message is not about helping with an ancient version :-) )

We noticed that this version of GCC compiling on aarch64 will happily use
FP registers to temporarily store/load pointers, so there'd be "fmov d9,
x1" to store a pointer, and then later when it's used as a parameter to a
function call we'll see "fmov x1, d9" etc. We noticed this while
investigating some crashes that seemed to always occur in functions called
with parameters loaded through this mechanism, on certain specific models
of aarch64 CPUs. On the face of it, this doesn't seem a _too_ terrible idea
– one'd think that a FP register should preserve the bit pattern so as long
as the only operations are stores and loads, what's the harm, right? Hey,
more free registers! Except, on some silicon, it's unfortunately strongly
correlated with crashes further down the callee chain.

Further proving the theory is that after we did some judicious application
of __attribute__((target("general-regs-only"))) to offending functions to
discourage the compiler from the practice, the crashes were gone.
Unfortunately, it sometimes required contorting the code to move any
implied uses of FP out of the way (heck, an inlined std::map constructor
requires FP operations 'cause of its load factor!)
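
For anyone wanting to try the same workaround, a minimal sketch of the
per-function attribute follows.  The macro NO_FP_REGS, the Node type and
sum_list are hypothetical names for illustration; the guard is an assumption
that restricts the attribute to GCC targeting AArch64 and expands to nothing
elsewhere, so the snippet builds on other platforms too.

```cpp
// Hypothetical macro: only GCC on AArch64 gets the attribute.
#if defined(__GNUC__) && !defined(__clang__) && defined(__aarch64__)
#define NO_FP_REGS __attribute__((target("general-regs-only")))
#else
#define NO_FP_REGS
#endif

struct Node {
  long value;
  const Node* next;
};

// With the attribute in effect, the compiler is not allowed to use FP/SIMD
// registers in this function, so it cannot keep `node` in a d-register via
// fmov.  The body must not use floating point itself.
NO_FP_REGS
long sum_list(const Node* node) {
  long total = 0;
  for (; node != nullptr; node = node->next) total += node->value;
  return total;
}
```

The command-line option -mgeneral-regs-only applies the same restriction to
a whole translation unit, which only works if nothing in that unit needs
floating point.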

I also noticed that a more modern version of GCC (e.g. 12.x) does not seem
to emit such code anymore (thus also eliminating the problem.) Curiously, I
couldn't wrangle a good enough Google search term to find anything about
what brought about the change – a discussion, a blog post, anything. I
wanted to know if the practice of stashing pointers in FP registers indeed
proved to be dangerous and was thus deliberately abandoned, or is it maybe
just a byproduct of some other change.

If someone knows more about this, I'd be very curious to hear about it.
It'd be great to know that this was an explicitly eliminated behavior so we
can rest assured that by using a newer version of GCC we will not get
bitten by it again.

Thanks,
  Attila.


Re: GCC used to store pointers in FP registers on aarch64

2025-02-24 Thread Kyrylo Tkachov via Gcc
Hi Attila,

> On 24 Feb 2025, at 10:46, Attila Szegedi via Gcc wrote:
> 
> Hi folks,
> 
> I'm looking for a bit of a historic context for a fun GCC behavior we
> stumbled across. For... reasons we build some of our binaries using an
> older version of GCC (8.3.1, yes, we'll be upgrading soon, and no, this
> message is not about helping with an ancient version :-) )
> 
> We noticed that this version of GCC compiling on aarch64 will happily use
> FP registers to temporarily store/load pointers, so there'd be "fmov d9,
> x1" to store a pointer, and then later when it's used as a parameter to a
> function call we'll see "fmov x1, d9" etc. We noticed this while
> investigating some crashes that seemed to always occur in functions called
> with parameters loaded through this mechanism, on certain specific models
> of aarch64 CPUs. On the face of it, this doesn't seem a _too_ terrible idea
> – one'd think that a FP register should preserve the bit pattern so as long
> as the only operations are stores and loads, what's the harm, right? Hey,
> more free registers! Except, on some silicon, it's unfortunately strongly
> correlated with crashes further down the callee chain.
> 
> Further proving the theory is that after we did some judicious application
> of __attribute__((target("general-regs-only"))) to offending functions to
> discourage the compiler from the practice, the crashes were gone.
> Unfortunately, it sometimes required contorting the code to move any
> implied uses of FP out of the way (heck, an inlined std::map constructor
> requires FP operations 'cause of its load factor!)
> 
> I also noticed that a more modern version of GCC (e.g. 12.x) does not seem
> to emit such code anymore (thus also eliminating the problem.) Curiously, I
> couldn't wrangle a good enough Google search term to find anything about
> what brought about the change – a discussion, a blog post, anything. I
> wanted to know if the practice of stashing pointers in FP registers indeed
> proved to be dangerous and was thus deliberately abandoned, or is it maybe
> just a byproduct of some other change.
> 
> If someone knows more about this, I'd be very curious to hear about it.
> It'd be great to know that this was an explicitly eliminated behavior so we
> can rest assured that by using a newer version of GCC we will not get
> bitten by it again.
> 

I’d say it was just a side-effect of various optimization decisions. GCC may
still decide to move things between the FP and GP regs instead of spilling
them to the stack; it’s really a matter of CPU-specific costs.
I haven’t heard of an issue like the one you describe before.
Generally, the base AArch64 ABI assumes the presence of FP+SIMD registers.
-mgeneral-regs-only and the general-regs-only attribute can be used if you know 
what you’re doing in a software stack that you control, but it’s probably just 
a workaround for what seems to be a hardware issue you’re facing.

Thanks,
Kyrill 

> Thanks,
>  Attila.



Re: GCC used to store pointers in FP registers on aarch64

2025-02-24 Thread Florian Weimer
* Attila Szegedi via Gcc:

> We noticed that this version of GCC compiling on aarch64 will happily use
> FP registers to temporarily store/load pointers, so there'd be "fmov d9,
> x1" to store a pointer, and then later when it's used as a parameter to a
> function call we'll see "fmov x1, d9" etc. We noticed this while
> investigating some crashes that seemed to always occur in functions called
> with parameters loaded through this mechanism, on certain specific models
> of aarch64 CPUs. On the face of it, this doesn't seem a _too_ terrible idea
> – one'd think that a FP register should preserve the bit pattern so as long
> as the only operations are stores and loads, what's the harm, right? Hey,
> more free registers! Except, on some silicon, it's unfortunately strongly
> correlated with crashes further down the callee chain.

Surely not preserving floating point bit patterns in registers would
be a silicon bug?  That seems … quite unlikely.  GCC 8 has seen
extensive use on AArch64, on a variety of implementations, and I don't
recall problems in this area.  I don't follow AArch64 *that* closely,
admittedly, but I expect it would have caused quite a ruckus.

Do you use some sort of conservative garbage collector that
incorrectly skips scanning of floating point registers?


Backend for a stack-oriented architecture

2025-02-24 Thread Florian Weimer
As a hobby project, I'm working on a mostly memory-safe architecture
that is targeted at direct software emulation.  The majority of its
instructions have memory operands that are relative to the stack
pointer.  Calls and returns adjust the stack pointer, so I suppose one
could say that the architecture has register windows.  The reason for
the stack-based approach is that an emulated register move would be a
memory-to-memory move anyway.  I believe this approach is similar to
the Lua VM and other VMs which are generally considered
register-based instead of stack-based.  Writing it by hand, it feels
more register-based than stack-based, too, although the direct support
for multiple return values makes it possible to use some Forth-like
idioms.

Here's some example code (destination operand comes first):

.proc fib (_long) (_long)
# Argument/result register: %3
# return address register: %2
# local register: %1
# outgoing argument/return register: %0
  .framesize 24 # in bytes, three registers excluding the incoming argument
  ldic %1, 2
  jlels %3, %1, :0    # If the argument is less than 2, just return it.
  addlc %0, %3, -1    # Prepare argument for first recursive call.
  callp fib
  mv %1, %0           # Save result of first call.
  addlc %0, %3, -2    # Prepare argument for second recursive call.
  callp fib
  addlso %3, %1, %0   # Sum of results, with an overflow check.
:0
  ret 24

The call instruction increments the stack pointer by 24 bytes to
create the new frame.  This is how the argument becomes available as
%3 in the callee.  (The real assembler has a minimal register
allocator and computes the frame size automatically, so that the
change of the argument registers is hidden from the programmer
when new local registers are introduced.)
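
To make the frame arithmetic concrete, here is a small C++ model of the fib
routine.  It is a sketch under assumptions that the message does not state
explicitly: each register is one 8-byte slot, register %n lives n slots
below the stack pointer, and a failed overflow check raises an exception.
With that layout, moving the stack pointer up by three slots (24 bytes)
makes the caller's %0 and the callee's %3 the same slot.  Machine, call_fib
and run_fib are hypothetical names.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Machine {
  std::vector<long> slots;  // emulated stack, one long per register slot
  std::size_t sp;           // index of the slot that holds %0

  explicit Machine(std::size_t size) : slots(size, 0L), sp(3) {}

  // Assumed mapping: %n is the slot n positions below the stack pointer.
  long& reg(std::size_t n) { return slots.at(sp - n); }
};

const std::size_t kFrameSlots = 3;  // ".framesize 24" with 8-byte slots

void fib(Machine& m);

// Models "callp fib" and the matching "ret 24".
void call_fib(Machine& m) {
  m.sp += kFrameSlots;  // caller's %0 becomes the callee's %3
  fib(m);
  m.sp -= kFrameSlots;  // callee's %3 is visible again as the caller's %0
}

void fib(Machine& m) {
  m.reg(1) = 2;                     // ldic %1, 2
  if (m.reg(3) < m.reg(1)) return;  // jlels %3, %1, :0
  m.reg(0) = m.reg(3) - 1;          // addlc %0, %3, -1
  call_fib(m);                      // callp fib
  m.reg(1) = m.reg(0);              // mv %1, %0
  m.reg(0) = m.reg(3) - 2;          // addlc %0, %3, -2
  call_fib(m);                      // callp fib
  const long a = m.reg(1);
  const long b = m.reg(0);
  if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
    throw std::overflow_error("addlso overflow");
  m.reg(3) = a + b;                 // addlso %3, %1, %0
}

// Hypothetical driver: acts as the caller that places the argument in %0.
long run_fib(long n) {
  Machine m(4096);
  m.reg(0) = n;
  call_fib(m);
  return m.reg(0);
}
```

In this model run_fib(10) returns 55, and the local %1 of each frame is
never touched by the callee because the callee's slots all lie above it.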

I tried to create a GCC backend for this, by looking at the existing
mmix backend (for the register windows) and the bpf backend (due to
its verified nature) for inspiration.  I did not get very far because
it's my first GCC backend.  I wonder if that's just my lack of
experience, or if the target just isn't a good fit for GCC.  I found
Hans-Peter Nilsson's old description of the CRIS backend, and it
mentions that GCC is not a good starting point for accumulator
machines with just one register (and my target doesn't even have that,
not really).

If advisable, I can redefine the target to make it more GCC-friendly,
perhaps by introducing a register file separate from the stack.
(Although this would make the emulator and verifier more complex.)


Re: GCC used to store pointers in FP registers on aarch64

2025-02-24 Thread Attila Szegedi via Gcc
On Mon, Feb 24, 2025 at 12:41 PM Florian Weimer wrote:

>
> Surely not preserving floating point bit patterns in registers would
> be a silicon bug?


Indeed it would be.


> That seems … quite unlikely.  GCC 8 has seen
> extensive use on AArch64, on a variety of implementations, and I don't
> recall problems in this area.  I don't follow AArch64 *that* closely,
> admittedly, but I expect it would have caused quite a ruckus.
>

Yeah. The lack of discussion also led me to believe that even if this is an
issue, it's definitely not a widely encountered one. (It's also possible
that it's a red herring, although, well, as I said, forcing general regs
only did fix it.)

>
> Do you use some sort of conservative garbage collector that
> incorrectly skips scanning of floating point registers?
>

Great question, and good of you to have thought of that! Fortunately, we do not.

Thank you for giving it some thought.

Attila.


Re: GCC used to store pointers in FP registers on aarch64

2025-02-24 Thread Florian Weimer
* Attila Szegedi:

>> That seems … quite unlikely.  GCC 8 has seen extensive use on
>> AArch64, on a variety of implementations, and I don't recall
>> problems in this area.  I don't follow AArch64 *that* closely,
>> admittedly, but I expect it would have caused quite a ruckus.
>>
>
> Yeah. The lack of discussion also led me to believe that even if this is an
> issue, it's definitely not a widely encountered one. (It's also possible
> that it's a red herring, although, well, as I said, forcing general regs
> only did fix it.)

Is it non-deterministic?  It might be a context switching issue in the
kernel/hypervisor/firmware.  I usually don't notice fixes for those
because they do not lead to questions whether it's necessary to
rebuild the whole distribution.  These bugs do happen from time to time:

  [PATCH v3 0/8] KVM: arm64: FPSIMD/SVE/SME fixes
  <https://lore.kernel.org/linux-arm-kernel/20250210195226.1215254-1-mark.rutl...@arm.com/>



Re: GCC used to store pointers in FP registers on aarch64

2025-02-24 Thread Attila Szegedi via Gcc
On Mon, Feb 24, 2025 at 1:21 PM Florian Weimer wrote:

> * Attila Szegedi:
>
> >> That seems … quite unlikely.  GCC 8 has seen extensive use on
> >> AArch64, on a variety of implementations, and I don't recall
> >> problems in this area.  I don't follow AArch64 *that* closely,
> >> admittedly, but I expect it would have caused quite a ruckus.
> >>
> >
> > Yeah. The lack of discussion also led me to believe that even if this is an
> > issue, it's definitely not a widely encountered one. (It's also possible
> > that it's a red herring, although, well, as I said, forcing general regs
> > only did fix it.)
>
> Is it non-deterministic?  It might be a context switching issue in the
> kernel/hypervisor/firmware.  I usually don't notice fixes for those
> because they do not lead to questions whether it's necessary to
> rebuild the whole distribution.  These bugs do happen from time to time:
>
>   [PATCH v3 0/8] KVM: arm64: FPSIMD/SVE/SME fixes
>   <https://lore.kernel.org/linux-arm-kernel/20250210195226.1215254-1-mark.rutl...@arm.com/>
>

Huh. That is interesting. Yes, it is non-deterministic. And it does occur
solely in containerized environments, so it's eminently possible it's a
hypervisor issue.

(Still, a bit amusing that nothing came up with regard to how I don't see
GCC 12 schedule pointers onto aarch64 FP registers anymore :-).  I
understood from Kyrylo's post that it's probably because no such explicit
decision was made.)

Attila.


Re: Backend for a stack-oriented architecture

2025-02-24 Thread Michael Matz via Gcc
Hello,

On Mon, 24 Feb 2025, Florian Weimer wrote:

> .proc fib (_long) (_long)
> # Argument/result register: %3
> # return address register: %2
> # local register: %1
> # outgoing argument/return register: %0
>   .framesize 24 # in bytes, three registers excluding the incoming argument
...
>   ret 24

Random observation: if the callee pops the stack you will have a harder 
time dealing with stdarg functions.

> I tried to create a GCC backend for this, by looking at the existing
> mmix backend (for the register windows) and the bpf backend (due to
> its verified nature) for inspiration.  I did not get very far because
> it's my first GCC backend.  I wonder if that's just my lack of
> experience, or if the target just isn't a good fit for GCC.  I found
> Hans-Peter Nilsson's old description of the CRIS backend, and it
> mentions that GCC is not a good starting point for accumulator
> machines with just one register (and my target doesn't even have that,
> not really).
> 
> If advisable, I can redefine the target to make it more GCC-friendly,
> perhaps by introducing a register file separate from the stack.
> (Although this would make the emulator and verifier more complex.)

Perhaps some inspiration can be gleaned from other 0-address machines, 
i.e. pure stack ones.  One I know of that has a GCC port is the (meanwhile 
fairly old) zpu ( https://en.wikipedia.org/wiki/ZPU_(processor) and 
https://github.com/zylin/zpugcc , look at toolchain/gcc/gcc/config/zpu 
there).

For GCC the trick will almost always be to lie to GCC, claim that there 
are a couple of hard registers and rewrite them fairly late into 
stack-pointer relative references (that's e.g. what the above port is 
doing).  GCC's facility for a register stack (reg-stack.cc) is used only 
for x87 regs, and hence is quite likely to be usable only there, not if 
the general regs are also stack based.
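
The shape of that trick can be shown outside GCC with a toy operand type.
This is not GCC code and uses none of its internals: Operand, lower and
lower_all are hypothetical names, and the mapping (8-byte registers,
register n at n slots below the stack pointer) is an assumption chosen to
match the fib example in this thread.

```cpp
#include <cstddef>
#include <vector>

// Toy operand: either a pretend hard register or an sp-relative slot.
struct Operand {
  enum Kind { PretendReg, StackSlot } kind;
  long value;  // register number, or byte offset from the stack pointer
};

const long kRegBytes = 8;  // assumption: 8-byte registers

// Late rewrite: a pretend register becomes a stack-pointer-relative slot;
// operands that are already stack slots pass through unchanged.
Operand lower(const Operand& op) {
  if (op.kind != Operand::PretendReg) return op;
  Operand slot = {Operand::StackSlot, -kRegBytes * op.value};
  return slot;
}

// Apply the rewrite to every operand of an instruction stream.
void lower_all(std::vector<Operand>& operands) {
  for (std::size_t i = 0; i < operands.size(); ++i)
    operands[i] = lower(operands[i]);
}
```

Everything before the rewrite (allocation, reloading, scheduling) sees an
ordinary register file; only the final output mentions the stack.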

But if you don't want to endlessly wrestle against GCC it's probably 
easier to architect your insn set with some registers (they could still be 
memory-mapped, and be in fact offsets from a special base register that's 
not exposed to GCC - perhaps that gives you the features you seek for your 
mostly memory-safe guarantees?).


Ciao,
Michael.


Re: GCC used to store pointers in FP registers on aarch64

2025-02-24 Thread Mark Rutland via Gcc
On Mon, Feb 24, 2025 at 10:46:42AM +0100, Attila Szegedi wrote:
> Hi folks,

Hi,

I've been pointed at this thread due to the reference to my Linux patch
series fixing some KVM FPSIMD/SVE/SME issues.

> I'm looking for a bit of a historic context for a fun GCC behavior we
> stumbled across. For... reasons we build some of our binaries using an
> older version of GCC (8.3.1, yes, we'll be upgrading soon, and no, this
> message is not about helping with an ancient version :-) )
> 
> We noticed that this version of GCC compiling on aarch64 will happily use
> FP registers to temporarily store/load pointers, so there'd be "fmov d9,
> x1" to store a pointer, and then later when it's used as a parameter to a
> function call we'll see "fmov x1, d9" etc. We noticed this while
> investigating some crashes that seemed to always occur in functions called
> with parameters loaded through this mechanism, on certain specific models
> of aarch64 CPUs.

Hmmm... IIUC d9 specifically should be preserved by callees per AAPCS64;
do you see this with specific registers? e.g. v8 to v15?

Are you able to share any more information about the configuration(s)
that you see this with, e.g.

* Which CPU(s)?

  If you're not able to say which CPU(s) specifically, knowing whether
  SVE and/or SME are present would be helpful.

* Which kernel version(s), assuming this is with Linux?

  If virtualization is involved, knowing the guest and host kernel
  versions would be helpful.

Thanks,
Mark.