Re: Proposed new pass to optimise mode register assignments

2024-09-07 Thread Richard Biener via Gcc



> On 06.09.2024 at 17:38, Andrew Carlotti wrote:
> 
> Hi,
> 
> I'm working on optimising assignments to the AArch64 Floating-point Mode
> Register (FPMR), as part of our FP8 enablement work.  Claudio has already
> implemented FPMR as a hard register, with the intention that FP8 intrinsic
> functions will compile to a combination of an fpmr register set, followed by
> an FP8 operation that takes fpmr as an input operand.
> 
> It would clearly be inefficient to retain an explicit FPMR assignment prior to
> each FP8 instruction (especially in the common case where every assignment
> uses the same FPMR value).  I think the best way to optimise this would be to
> implement a new pass that can optimise assignments to individual hard
> registers.
> 
> There are a number of existing passes that do similar optimisations, but which
> I believe are unsuitable for this scenario for various reasons.  For example:
> 
> - cse1 can already optimise FPMR assignments within an extended basic block,
>  but can't handle broader optimisations.
> - pre (in gcse.c) doesn't work with assigning constant values, which would
>  miss many potential usages.  It also has limits on how far code can be moved,
>  based around ideas of register pressure that don't apply to the context of a
>  single hard register that shouldn't be used by the register allocator for
>  anything else.  Additionally, it doesn't run at -Os.
> - hoist (also using gcse.c) only handles constant values, and only runs when
>  optimising for size.  It also has the rest of the issues that pre does.
> - mode_sw only handles a small finite set of modes.  The mode requirements are
>  determined solely by the instructions that require the specific mode, so mode
>  switches don't depend on the output of previous instructions.
> 
> 
> My intention would be for the new pass to reuse ideas, and hopefully some of
> the existing code, from the mode-switching and gcse passes.  In particular,
> gcse.c (or its dependencies) has code that could identify when values
> assigned to the FPMR are known to be the same (although we may not need the
> full CSE capabilities of gcse.c), and mode-switching.cc knows how to globally
> optimise mode assignments (and unlike gcse.c, doesn't use cautious heuristics
> to avoid excessively increasing register pressure).
> 
> Initially the new pass would only apply to the AArch64 FPMR register, but in
> future it could also be used for other hard registers with similar properties.
> 
> Does anyone have any comments on this approach, before I start writing any
> code?

Can you explain in more detail why the mode-switching pass infrastructure isn’t 
a good fit?  ISTR it already is customizable via target hooks.

Richard 

> Thanks,
> Andrew
> 
> 


Re: #pragma once behavior

2024-09-07 Thread Richard Biener via Gcc



> On 07.09.2024 at 07:27, Jeremy Rifkin wrote:
> 
> 
>> 
>> This is why I said what is the same file if you can't rely on inodes
>> working?
> 
> I don't have a good answer for such a case. Of course, no matter how one
> approaches #pragma once there will be cases that aren't handled.
> 
> The criterion to optimize for, imo, is which has the clearest failure
> mode. Contents happening to match could occur naturally without realizing,
> which is hard to triage. Mtimes colliding could easily happen without
> realizing, which is also hard to triage and reproduce. Path issues pop
> up as real build systems use links. Mtime can fail on multiple mounts,
> path certainly will. In my opinion, the failure modes for contents and
> mtime are very sub-ideal. Path isn't adequate, it seems clear supporting
> links is an important goal.
> 
> To level-set: I don't think it's reasonable to expect #pragma once to
> handle multiple distinct copies of the same file. Especially given that
> contents isn't an option.
> 
> The failure mode of inodes, however, is a lot clearer. It breaks with
> things like multiple mounts and filesystems that don't have inodes. The
> way I see it, advice to users becomes clear since it's much clearer
> exactly how and why #pragma once might break.
> 
>> Early 2000s vs now have a different landscape when it comes to file systems.
> 
> Given the landscape today, could it make sense to re-evaluate mtime + content?

I’d rather drop the feature - IIRC it’s already deprecated and I’d oppose
standardization, as doing so doesn’t make much sense given all the above issues
and there is a working solution - include guards.  What can pragma once do that
include guards can’t?  What’s the issue to solve?  Include guard collisions?
That’s a much better understood failure mode than inodes, hardlinks or mtime.

Richard

> Cheers
> Jeremy
> 
> On Sep 6 2024, at 10:29 pm, Andrew Pinski  wrote:
> 
 On Fri, Sep 6, 2024, 7:42 PM Jeremy Rifkin  wrote:
>>> 
 Hi Andrew,
 Thanks for the thoughts and quick reply.
 
> Not always. because inodes are not always stable on some file systems.
> And also does not work with multi-mounted devices too.
 
 Unusual filesystems and multiple mounts are indeed the failing. As I
 mentioned, there's no silver bullet; they each have pitfalls. I do,
 however, think this is a less surprising failure mode than GCC's which
 rears its head in surprising and inconsistent cases.
 
> I say if the file has the same content, then it is the same file and
> GCC uses that definition.
 
 GCC doesn't use this definition, really. It's relying primarily on the
 mtime check and only falling back to contents in case of collision.
 
 The point on same contents == same file is well received. When
 I wrote the first draft of my paper I wrote it proposing this, however,
 I have become convinced this isn't the right approach based on examples
 where you could intend to include two files with the same contents that
 actually mean different things (such as Example 1).
 
 GCC's approach is hybrid, half relying on something from the filesystem
 and half relying on the contents. As far as I can tell this can lead to
 a worst of both worlds.
 
> GCC definition is the only one which supports all issues described
> here dealing with inodes (sometimes being non-stable), canonical paths
> and both kinds of links and even re-mounted file systems.
 
 I'd initially been thinking of a content-based solution in order to
 avoid any filesystem reliance and support multiple mounts etc. The
 problem currently is even GCC's approach, which has the best chance of
 working on multiple mounts, doesn't work consistently due to potential
 differences in mtime resolution. 
 
> What does the other implementations say about changing their
> definition of what "the same file is"? Have you asked clang and MSVC
> folks?
 
 I've not yet asked. If I proceed with a proposal paper what I'll most
 likely be proposing is what Clang does, worded in terms of same device
 same location. I started here since GCC's approach is less similar to
 that than MSVC's. It's also easier to reach out to developers on
 open source projects.
>> 
>> Except the clang solution does not work for some file systems and is
>> broken when used on them. Maybe those file systems are not in use as
>> they once were and that is why clang didn't run into folks asking to
>> fix it.
>> 
>> Early 2000s vs now have a different landscape when it comes to file
>> systems. This is why I said what is the same file if you can't rely
>> on inodes working? 
>> 
>> Thanks,
>> Andrew
>> 
>> 
>> 
>> 
>>> 
 
 
 Thanks,
 Jeremy
 
 
 On Sep 6 2024, at 8:16 pm, Andrew Pinski  wrote:
 
> On Fri, Sep 6, 2024 at 5:49 PM Jeremy Rifkin  wrote:
>

Re: Proposed CHOST change for the 64bit time_t transition

2024-09-07 Thread Arsen Arsenović via Gcc
Bruno Haible  writes:

> Arsen Arsenović wrote:
>> An alternative that I pondered was to teach the linker about some notion
>> of "compatibility strings" that it would compare and reject if
>> different, plus teaching the compiler how to emit those, plus teaching
>> glibc to tell the compiler to emit those..  We could have key-value
>> pairs in some section.  For each key K, we could have the linker check
>> that, for each (shared or otherwise) object either does not contain K or
>> contains K with the same value as all the other ones, and produce an
>> error otherwise.  On the resulting object, the KV pairs would be the
>> union of all KV pairs of all constituent objects.
>
> This sounds much like the arm eabi attributes: If a .s file does not
> start with
> .eabi_attribute 20, 1
> .eabi_attribute 21, 1
> .eabi_attribute 23, 3
> .eabi_attribute 24, 1
> .eabi_attribute 25, 1
> .eabi_attribute 26, 2
> .eabi_attribute 30, 2
> .eabi_attribute 34, 0
> .eabi_attribute 18, 4
> the resulting .o file cannot be linked with other .o files on the system.
>
> It is a hassle even for packages that don't use time_t or off_t (such as
> GNU libffcall or libffi).
>
> Yes, it would be useful to have a way to have the linker warn if a binary
> that depends on 32-bit time_t and a binary that depends on 64-bit time_t
> get linked together. But PLEASE implement this in a way that is a no-op
> when time_t is not used by either of the two binaries.

I mentioned that a missing pair should not preclude linking.  This is
because what I imagined is somehow attaching this tag to the typedef for
time_t, or such, so if the typedef is not used, no such tag is emitted.

I'm not sure how feasible that is, I, of course, don't have a full
implementation.
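
To make the rule above concrete, here is a minimal sketch of the check in Python. It is not an existing linker feature; the function name `merge_compat_tags`, the object names and the "time_t" key are all hypothetical, and real tags would live in an object-file section rather than a dict.

```python
def merge_compat_tags(objects):
    """Merge per-object compatibility tags the way the proposal describes.

    `objects` is a list of (name, tags) pairs, where `tags` is a dict.
    A key that is absent from an object never blocks linking; a key that
    is present with two different values raises ValueError.
    The result is the union of all key-value pairs.
    """
    merged = {}
    origin = {}  # key -> name of the first object that defined it
    for name, tags in objects:
        for key, value in tags.items():
            if key in merged and merged[key] != value:
                raise ValueError(
                    f"{key}: {origin[key]} has {merged[key]!r}, "
                    f"{name} has {value!r}")
            merged.setdefault(key, value)
            origin.setdefault(key, name)
    return merged
```

With this rule, an object that never touches time_t carries no "time_t" tag and links with anything, which is the no-op behaviour asked for above.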
-- 
Arsen Arsenović




Re: Proposed CHOST change for the 64bit time_t transition

2024-09-07 Thread Michał Górny via Gcc
Hello,

On Thu, 2024-09-05 at 03:06 +0100, Wookey wrote:
> So it's interesting that in fact Gentoo _does_ want to do this, but it
> seems to me that this is now a done deal, and 'everyone' has already
> switched within the existing triplets, even Debian, which is the
> hardest place to do this because it involved 1100 library transitions,
> with another 3500-odd rebuilds.

I disagree with this claim, and I really feel like in this whole thread
people are underestimating the gravity of this situation.

In the case of Gentoo, being a source distribution implies that
the vast majority of users are building everything from source.  While
binary packages are supported to some degree, this is not something we
can rely on.  As such, we need to assume that the transition will
involve rebuilding everything from source.

This in turn implies rebuilding whole production systems, bottom up. 
This means that there will be significant periods of time during which
production systems will be running with ABI incompatibilities between
programs and their dependent libraries, and these periods will be much
longer than if binary packages were used.  In other words, production
systems will be quasi-broken for an extended period of time.

What's even more dangerous is that any particular rebuild can actually
fail -- and then the user may suddenly be left with a quasi-broken
system, pending urgent recovery.  I mean, just imagine that one of your
dependent libraries failed to rebuild, and you end up having a 32-bit
time_t program with some of its dependencies using 32-bit time_t,
and others 64-bit time_t already -- and you suddenly have to figure out
how to make the problematic package build again, or revert everything
back.

Now, I'm not an expert on ABI but I do believe that the potential for
breakage is pretty serious.  After all, we are talking about doubling
the type size.  So we're talking about function parameters being
shifted, struct offset mismatches, stack corruption, heap corruption...
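
As an illustration of the struct offset mismatch, here is a small sketch using Python's ctypes with fixed-width types standing in for the two time_t variants. The struct and its field names are hypothetical and not taken from any real library.

```python
import ctypes


class Record32(ctypes.Structure):
    # Hypothetical struct as seen by code built with 32-bit time_t.
    _fields_ = [("stamp", ctypes.c_int32), ("flags", ctypes.c_int32)]


class Record64(ctypes.Structure):
    # The same struct as seen by code built with 64-bit time_t.
    _fields_ = [("stamp", ctypes.c_int64), ("flags", ctypes.c_int32)]


def flags_offsets():
    """Return the byte offset of `flags` in the 32-bit and 64-bit variants."""
    return Record32.flags.offset, Record64.flags.offset
```

A program and a library that disagree on the variant would read `flags` from different offsets in the same memory, which is the kind of silent corruption described above.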

I think it does make sense to clearly indicate that this is a serious
change of ABI.  Furthermore, I personally would even go as far as to
change libdir, as to ensure that we can rebuild everything without
immediately breaking production systems in place.

On top of that, we need to account for the fact that users may have some
executables that they've compiled and linked to the system libraries
themselves, and these also may start suddenly misbehaving.  So perhaps
the right thing is to ensure that you can't even run an executable
unless all its linked libraries use the same time_t variant.

-- 
Best regards,
Michał Górny





Re: #pragma once behavior

2024-09-07 Thread Eric Gallager via Gcc
On Sat, Sep 7, 2024 at 5:34 AM Richard Biener via Gcc  wrote:
>
>
>
> > On 07.09.2024 at 07:27, Jeremy Rifkin wrote:
> >
> > 
> >>
> >> This is why I said what is the same file if you can't rely on inodes
> >> working?
> >
> > I don't have a good answer for such a case. Of course, no matter how one
> > approaches #pragma once there will be cases that aren't handled.
> >
> > The criterion to optimize for, imo, is which has the clearest failure
> > mode. Contents happening to match could occur naturally without realizing,
> > which is hard to triage. Mtimes colliding could easily happen without
> > realizing, which is also hard to triage and reproduce. Path issues pop
> > up as real build systems use links. Mtime can fail on multiple mounts,
> > path certainly will. In my opinion, the failure modes for contents and
> > mtime are very sub-ideal. Path isn't adequate, it seems clear supporting
> > links is an important goal.
> >
> > To level-set: I don't think it's reasonable to expect #pragma once to
> > handle multiple distinct copies of the same file. Especially given that
> > contents isn't an option.
> >
> > The failure mode of inodes, however, is a lot clearer. It breaks with
> > things like multiple mounts and filesystems that don't have inodes. The
> > way I see it, advice to users becomes clear since it's much clearer
> > exactly how and why #pragma once might break.
> >
> >> Early 2000s vs now have a different landscape when it comes to file 
> >> systems.
> >
> > Given the landscape today, could it make sense to re-evaluate mtime + 
> > content?
>
> I’d rather drop the feature - IIRC it’s already deprecated and I’d oppose
> standardization, as doing so doesn’t make much sense given all the above
> issues and there is a working solution - include guards.  What can pragma
> once do that include guards can’t?  What’s the issue to solve?  Include
> guard collisions?  That’s a much better understood failure mode than
> inodes, hardlinks or mtime.

IMO if GCC is going to focus harder on include guards as the preferred
method of dealing with these sorts of problems, then it would make
sense to prioritize adopting some of clang's diagnostics regarding
include guards, for example, -Wheader-guard, as requested in bug
96842: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96842



Re: #pragma once behavior

2024-09-07 Thread Jeremy Rifkin
Hi,

> IIRC it’s already deprecated

It appears it was undeprecated in 2003.

> What can pragma once do that include guards can’t?  What’s the issue
> to solve?  Include guard collisions?

There is nothing #pragma once does that include guards can't. People turn
to it because:
- It's convenient
  - You don't have to think of a name
  - It's fast and easy to type
  - You don't have to change the name when copy+pasting to a new file
- It's less clutter
- It's less error prone:
  - You can't forget to rename the guard
  - You can't make typos in the name
  - You can't inadvertently not cover the whole file, or mismatch an #endif
- And yes, collisions

Whether #pragma once is a good thing is one question; it absolutely has
inherent limitations. The reason I'm interested in this is that the status
quo is that it's widely used and, despite implementation divergence, most
users never run into issues with it. But, imo, it could at least be a little
more reliable. I don't think it's a reasonable expectation to get people to
stop using it.
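
For readers following the thread, the two notions of "same file" under discussion can be sketched roughly as below. This is only an illustration in Python, not how GCC or Clang implement the check; both function names are made up for the example.

```python
import os


def same_file_by_inode(path_a, path_b):
    """True if both paths resolve to the same (device, inode) pair."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def same_file_by_contents(path_a, path_b):
    """True if both files hold identical bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read() == fb.read()
```

A symlink or hard link to a header passes the first check, while a byte-identical copy passes only the second. The first check is also the one that breaks on multiple mounts or on filesystems without stable inodes.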

Cheers,
Jeremy


Re: Proposed new pass to optimise mode register assignments

2024-09-07 Thread Jeff Law via Gcc




On 9/7/24 1:09 AM, Richard Biener wrote:




> Can you explain in more detail why the mode-switching pass
> infrastructure isn’t a good fit?  ISTR it already is customizable via
> target hooks.

Agreed.  Mode switching seems to be the right pass to look at.

It probably is worth pointing out that mode switching is LCM based and 
as such never speculates.  Given the potential cost of a mode switch, 
failure to speculate may be a notable limitation (though the same would 
apply to the ideas Andrew floated above).
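
As a point of reference for what "never speculates" means, here is a toy sketch in Python. It is not the LCM algorithm used by mode-switching.cc; it is a plain forward availability analysis over a hypothetical CFG encoding (`blocks`, `preds` and the `("set", value)` / `("use",)` tuples are all invented for the example). It only deletes a mode set when every incoming path already provides the same value, and it never inserts or moves one.

```python
UNKNOWN = object()    # mode differs or is not known on some incoming path
UNVISITED = object()  # optimistic start value for the fixpoint iteration


def meet(states):
    """Combine predecessor exit states into one entry state."""
    result = UNVISITED
    for s in states:
        if s is UNVISITED:
            continue
        if result is UNVISITED:
            result = s
        elif result != s:
            return UNKNOWN
    return result


def entry_state(block, preds, entry, exit_state):
    # The mode is treated as unknown on function entry.
    if block == entry:
        return UNKNOWN
    return meet(exit_state[p] for p in preds.get(block, []))


def remove_redundant_sets(blocks, preds, entry):
    """Return a copy of `blocks` without provably redundant mode sets."""
    exit_state = {b: UNVISITED for b in blocks}
    changed = True
    while changed:  # iterate the dataflow equations to a fixpoint
        changed = False
        for b, insns in blocks.items():
            state = entry_state(b, preds, entry, exit_state)
            for insn in insns:
                if insn[0] == "set":
                    state = insn[1]
            if state != exit_state[b]:
                exit_state[b] = state
                changed = True

    result = {}
    for b, insns in blocks.items():
        state = entry_state(b, preds, entry, exit_state)
        kept = []
        for insn in insns:
            if insn[0] == "set":
                if insn[1] == state:
                    continue  # same value already in the register: drop it
                state = insn[1]
            kept.append(insn)
        result[b] = kept
    return result
```

In a diamond where one arm sets a different value, the set in the join block has to stay, because the sketch will not place a set on the other arm to make it redundant. Speculative placement of that kind is what the analysis above deliberately leaves out.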


This has recently come up in the RISC-V space due to needing VXRM 
assignments so that we can utilize the vaaddu add-with-averaging 
instructions.  Placement of VXRM mode switches looks optimal from an 
LCM standpoint, but speculation can measurably improve performance.  It 
was something like 2% on the BPI for x264.  The k1/m1 chip in the BPI is 
almost certainly flushing its pipelines on the VXRM assignment.


I've got a hack here that I'll submit upstream at some point.  Just not 
at the top of my list yet -- especially now that our uarch has been 
fixed to not flush its pipelines at VXRM assignments ;-)


jeff


gcc-14-20240907 is now available

2024-09-07 Thread GCC Administrator via Gcc
Snapshot gcc-14-20240907 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/14-20240907/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 14 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-14 revision 0c80216b7bb0938bff7db230cbefa5bc3e8b3034

You'll find:

 gcc-14-20240907.tar.xz   Complete GCC

  SHA256=d69cf147dcbcb20320382b1dd42bf9ab34251009d3ad0c6ae249eaaad0e9a605
  SHA1=ff3c34e9f20d4486c159e09e6fc6685306c68bb9

Diffs from 14-20240831 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-14
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Proposed new pass to optimise mode register assignments

2024-09-07 Thread Richard Biener via Gcc



> On 07.09.2024 at 17:56, Jeff Law wrote:
> 
> 
> 
> On 9/7/24 1:09 AM, Richard Biener wrote:
>> Can you explain in more detail why the mode-switching pass
>> infrastructure isn’t a good fit?  ISTR it already is customizable via
>> target hooks.
> Agreed.  Mode switching seems to be the right pass to look at.
> 
> It probably is worth pointing out that mode switching is LCM based and as 
> such never speculates.  Given the potential cost of a mode switch, failure to 
> speculate may be a notable limitation (though the same would apply to the 
> ideas Andrew floated above).
> 
> This has recently come up in the RISC-V space due to needing VXRM assignments 
> so that we can utilize the vaaddu add-with-averaging instructions.
> Placement of VXRM mode switches looks optimal from an LCM standpoint, but 
> speculation can measurably improve performance.  It was something like 2% on 
> the BPI for x264.  The k1/m1 chip in the BPI is almost certainly flushing its 
> pipelines on the VXRM assignment.
> 
> I've got a hack here that I'll submit upstream at some point.  Just not at 
> the top of my list yet -- especially now that our uarch has been fixed to not 
> flush its pipelines at VXRM assignments ;-)

I suppose LCM could be enhanced to handle partial antic and if the edges it 
speculates on are cold that might even be profitable on less great 
implementations?

> 
> jeff