Re: Proposed new pass to optimise mode register assignments

2024-09-08 Thread Andrew Carlotti via Gcc
On Sat, Sep 07, 2024 at 09:09:52AM +0200, Richard Biener wrote:
> 
> 
> > On 06.09.2024 at 17:38, Andrew Carlotti wrote:
> > 
> > Hi,
> > 
> > I'm working on optimising assignments to the AArch64 Floating-point Mode
> > Register (FPMR), as part of our FP8 enablement work.  Claudio has already
> > implemented FPMR as a hard register, with the intention that FP8 intrinsic
> > functions will compile to a combination of an fpmr register set, followed
> > by an FP8 operation that takes fpmr as an input operand.
> > 
> > It would clearly be inefficient to retain an explicit FPMR assignment prior
> > to each FP8 instruction (especially in the common case where every
> > assignment uses the same FPMR value).  I think the best way to optimise this
> > would be to implement a new pass that can optimise assignments to individual
> > hard registers.
> > 
> > There are a number of existing passes that do similar optimisations, but
> > which I believe are unsuitable for this scenario for various reasons.  For
> > example:
> > 
> > - cse1 can already optimise FPMR assignments within an extended basic block,
> >  but can't handle broader optimisations.
> > - pre (in gcse.c) doesn't work with assigning constant values, which would
> >  miss many potential usages.  It also has limits on how far code can be
> >  moved, based around ideas of register pressure that don't apply to the
> >  context of a single hard register that shouldn't be used by the register
> >  allocator for anything else.  Additionally, it doesn't run at -Os.
> > - hoist (also using gcse.c) only handles constant values, and only runs when
> >  optimising for size.  It also has the rest of the issues that pre does.
> > - mode_sw only handles a small finite set of modes.  The mode requirements
> >  are determined solely by the instructions that require the specific mode,
> >  so mode switches don't depend on the output of previous instructions.
> > 
> > 
> > My intention would be for the new pass to reuse ideas, and hopefully some of
> > the existing code, from the mode-switching and gcse passes.  In particular,
> > gcse.c (or its dependencies) has code that could identify when values
> > assigned to the FPMR are known to be the same (although we may not need the
> > full CSE capabilities of gcse.c), and mode-switching.cc knows how to
> > globally optimise mode assignments (and unlike gcse.c, doesn't use cautious
> > heuristics to avoid excessively increasing register pressure).
> > 
> > Initially the new pass would only apply to the AArch64 FPMR register, but in
> > future it could also be used for other hard registers with similar
> > properties.
> > 
> > Does anyone have any comments on this approach, before I start writing any
> > code?
> 
> Can you explain in more detail why the mode-switching pass infrastructure
> isn’t a good fit?  ISTR it already is customizable via target hooks.
> 
> Richard 
> 

I forgot to explain how FPMR is used.

The FPMR register contains a large number of fields that control the data
formats and saturation/scaling behaviour used in various fp8 conversion and
multiplication intrinsics.  At present, I think there are 2^26 valid defined
values that can be used in the FPMR.  Furthermore, these values are not always
compile-time constants - we expect that developers will often reuse the same
compiled code (e.g. a matrix multiplication library routine) with different
formats or scaling/saturation behaviour selected at runtime (e.g. by passing a
parameter to the library routine).

(The specification for the FPMR register can be found at [1].  Its usage in
fp8 intrinsics is described in the draft ACLE spec at [2].)

As I understand it, the existing mode-switching pass infrastructure is built
around a small number of modes, where the choice of mode is a compile-time
constant, and the total number of possible modes is fixed when building GCC.
Our usage of the FPMR register does not meet any of these criteria.  I don't
see how these limitations could be overcome with target hooks within the
constraints of the existing pass.



[1] https://developer.arm.com/documentation/ddi0601/2024-06/AArch64-Registers/FPMR--Floating-point-Mode-Register?lang=en

[2] https://github.com/ARM-software/acle/pull/323/files#diff-516526d4a18101dc85300bc2033d0f86dc46c505b7510a7694baabea851aedfaR5664
^ (Expand the large main/acle.md diff to see the relevant section)


Re: Proposed CHOST change for the 64bit time_t transition

2024-09-08 Thread Arsen Arsenović via Gcc
Bruno Haible  writes:

> Paul Eggert wrote:
>> I'd rather just switch, as Debian has.
>
> I'd go one step further, and not only
>   make the ABI transition without changing the canonical triplet,
> but also
>   make gcc and clang define -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
>   among their predefines.

At that point, we should bump SONAME of libc and simply remove 32-bit
time support.  This would probably be okay generally.  Just doing the
predefine doesn't really fix anything - all the problems of not being
able to detect whether t64 support exists still persist, with no
mechanism to prevent mixing.

But, in the case we don't do a bump, why not update the tuple?  This'd
allow easy communication of whether we have 32 bit time to all
components of the system, and, in lieu of a better detection mechanism,
it'd allow anyone at a glance to look at a host's tuple and see whether
it is compatible with something based on the tuple it was built on.

AFAIK the only reason not to do that is cosmetic - and it is ugly, but
this is for a dead platform, so I'm not too worried.

> Rationale:
>
>   * We want that a user of a distro with the new ABI can build
> packages in the usual way:
>   - ./configure; make; make install (when using Autoconf), or
>   - make; make install  (when there is just a Makefile).
> This *requires* that gcc and clang get patched, as indicated
> above.
> (Only changing Debian-specific files or variables won't do it.)
>
>   * Once this has been done, is there a need for a triplet change?
> Not in the toolchain, and not in the packages either.
> Needs that have been mentioned in [1][2]:
>
>   - Users would like to know in which ABI they / their distro lives.
> This can be done through a property in /etc/os-release.
>
>   - "risks incompatibility with other distributions" [2]
> What is the problem? Do we expect users to build binaries
> on 32-bit distro X and try to run them on 32-bit distro Y?
> Or do we expect binary package distributors (like Mozilla,
> videolan.org) to do so?
> It was my impression that this approach is doomed anyway,
> because so many shared libraries have different major version
> in distro X than in distro Y.

It is, many do it anyway, and what's worse, many users misguidedly
prefer it to better approaches (like the ones you mentioned).

> And that such binary package distributors use flatpak, AppImage, etc.
> precisely to get out of this dilemma.
>
>   - Building gcc and glibc might need some particular options.
> Such options can be documented without requiring a new triplet.
>
> References:
> [1] https://wiki.debian.org/ReleaseGoals/64bit-time
> [2] https://wiki.gentoo.org/wiki/Project:Toolchain/time64_migration

-- 
Arsen Arsenović




Re: Proposed new pass to optimise mode register assignments

2024-09-08 Thread Jeff Law via Gcc





On 9/8/24 12:52 AM, Richard Biener wrote:
> I suppose LCM could be enhanced to handle partial antic and if the
> edges it speculates on are cold that might even be profitable on less
> great implementations?

Yea, that's not a bad idea at all and I suspect it would be useful outside
this mode switching context.


Jeff


Re: Proposed new pass to optimise mode register assignments

2024-09-08 Thread Jeff Law via Gcc




On 9/7/24 7:06 PM, Andrew Carlotti wrote:
> I forgot to explain how FPMR is used.
> 
> The FPMR register contains a large number of fields that control the data
> formats and saturation/scaling behaviour used in various fp8 conversion and
> multiplication intrinsics.  At present, I think there are 2^26 valid defined
> values that can be used in the FPMR.  Furthermore, these values are not always
> compile-time constants - we expect that developers will often reuse the same
> compiled code (e.g. a matrix multiplication library routine) with different
> formats or scaling/saturation behaviour selected at runtime (e.g. by passing a
> parameter to the library routine).
> 
> (The specification for the FPMR register can be found at [1].  Its usage in
> fp8 intrinsics is described in the draft ACLE spec at [2].)
> 
> As I understand it, the existing mode-switching pass infrastructure is built
> around a small number of modes, where the choice of mode is a compile-time
> constant, and the total number of possible modes is fixed when building GCC.
> Our usage of the FPMR register does not meet any of these criteria.  I don't
> see how these limitations could be overcome with target hooks within the
> constraints of the existing pass.

In which case you might want to look at how RISC-V is handling vsetvl to
set the vector configuration.  It's a target specific pass, but comes
closer to what you're doing.  It's still LCM based, but deals with some
of the quirks of the RISC-V vector architecture.
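
As a rough illustration of the kind of global fact such a pass needs, here is
a toy forward dataflow sketch in Python.  It is not LCM and not the RISC-V
vsetvl pass; it only computes, per basic block, which symbolic value is known
to be in the mode register on entry.  The CFG encoding, the function name and
the value names "m" and "n" are all assumptions made for the example.

```python
UNKNOWN = object()  # register contents differ between paths, or are not known
TOP = object()      # optimistic "not computed yet" state


def meet(a, b):
    """Combine the facts arriving from two predecessors."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a if a == b else UNKNOWN


def value_on_entry(preds, out_value, entry):
    """preds: block -> list of predecessor blocks.
    out_value: block -> symbolic value the block leaves in the register,
               or None if the block does not write it."""
    state = {b: TOP for b in preds}
    state[entry] = UNKNOWN  # nothing is known on function entry

    def exit_value(b):
        return out_value[b] if out_value[b] is not None else state[b]

    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for b in preds:
            if b == entry:
                continue
            new = TOP
            for p in preds[b]:
                new = meet(new, exit_value(p))
            if new != state[b]:
                state[b] = new
                changed = True
    return state


# Diamond CFG: entry -> A -> {B, C} -> D, where only A writes the register.
preds = {"entry": [], "A": ["entry"], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
same = value_on_entry(
    preds, {"entry": None, "A": "m", "B": None, "C": None, "D": None}, "entry")
differ = value_on_entry(
    preds, {"entry": None, "A": "m", "B": None, "C": "n", "D": None}, "entry")
```

In the first case a write of "m" at the start of D would be redundant; in the
second it would not, because C leaves a different value behind.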


Jeff


gcc-15-20240908 is now available

2024-09-08 Thread GCC Administrator via Gcc
Snapshot gcc-15-20240908 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/15-20240908/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 15 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master 
revision 1e17a11171dae2bc5d60380bf48c12ef49f59c7f

You'll find:

 gcc-15-20240908.tar.xz   Complete GCC

  SHA256=15eb5261d5ca0f5c3f7125fae9dff58e5c89a1b6c1329abe15d0457baf866d66
  SHA1=3d4b1269970b4a90c4e46ff783ce58470c957f5c

Diffs from 15-20240901 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-15
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Proposed CHOST change for the 64bit time_t transition

2024-09-08 Thread Jacob Bachmeyer via Gcc

Arsen Arsenović wrote:
> Bruno Haible  writes:
>
>> Paul Eggert wrote:
>>> I'd rather just switch, as Debian has.
>>
>> I'd go one step further, and not only
>>   make the ABI transition without changing the canonical triplet,
>> but also
>>   make gcc and clang define -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
>>   among their predefines.
>
> At that point, we should bump SONAME of libc and simply remove 32-bit
> time support.  This would probably be okay generally.

This is probably the best solution to the problem at hand, especially
since the old ABI has a definite expiration date about 14 years from
now.  Bump the libc SONAME major and hope that we can get rid of the
last dependencies on the old SONAME before the deadline.  We will have
14 years to do it, if that arch is even still used then.



> [...]
> But, in the case we don't do a bump, why not update the tuple?  This'd
> allow easy communication of whether we have 32 bit time to all
> components of the system, and, in lieu of a better detection mechanism,
> it'd allow anyone at a glance to look at a host's tuple and see whether
> it is compatible with something based on the tuple it was built on.

This is closely related to the idea I floated a year ago of redefining
configuration tuples as lists of tags (with a canonical order)
progressively narrowing a broad architecture.  Start with CPU
architecture and work "narrower" from there.  In that system, adding
"-t32" and "-t64" to indicate time_t width would be the simple
solution.  At the time, the context was handling different libc choices
on Windows.
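
A minimal sketch of how a tool might read such a tag, assuming (purely for
illustration) that the tag is one more "-"-separated component and that a
tuple without a tag means 32-bit time_t.  The example tuple strings and the
function names are hypothetical, not an existing convention.

```python
def time_bits(config_tuple, default=32):
    """Return the time_t width announced by a "t32"/"t64" tag, if any."""
    for tag in config_tuple.split("-"):
        if tag in ("t32", "t64"):
            return int(tag[1:])
    return default  # assumption: an untagged tuple means the old ABI


def same_time_abi(host, built_for):
    """True if both tuples announce the same time_t width."""
    return time_bits(host) == time_bits(built_for)


old_abi = "i686-pc-linux-gnu"      # hypothetical untagged tuple
new_abi = "i686-pc-linux-gnu-t64"  # hypothetical tagged tuple
mixing_detected = not same_time_abi(old_abi, new_abi)
```

The check is a plain string comparison, which is the "at a glance" property
discussed above.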



-- Jacob


Re: #pragma once behavior

2024-09-08 Thread Michael Clark via Gcc

I like the sound of resolved path identity from search, including the
constructed full path and the index within include paths. If I was writing a
compiler from scratch, I'd probably do something like this:

  SHA2(include-path-search-offset-integer-of-found-header-to-support-include-next)
  + SHA2(container-relative-constructed-full-path-to-selected-header-during-include-search)
  + SHA2(selected-header-content)

and make it timeless. If you evaluate the search yourself, you have the
constructed full path identity while you are searching, and you don't need to
do tricks to get a full container-root-relative path because you already have
it during the search. It's a root-relative path constructed by the forward
search.

What are the problems? It can include the same file twice if it has two path
identities (hardlink or symlink), yet that seems the safer option to me,
because an identity relative to a constant search configuration sounds better:
the two paths could be defined to be different identities wrt #pragma once.
It's the semantically safer option. And the canonical example that fails (same
content, different path, same mtime) would succeed with this choice. Pick holes
in this. I might be overlooking something.
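
A rough Python sketch of that key, to make the proposal concrete. It assumes
SHA-256 for "SHA2" and plain concatenation of hex digests for "+"; the
function names and the example paths are made up, and this is not how any
existing preprocessor implements #pragma once.

```python
import hashlib


def once_key(search_index: int, relative_path: str, content: bytes) -> str:
    """Identity = digest(search index) + digest(constructed path) + digest(content)."""
    parts = (str(search_index).encode(), relative_path.encode(), content)
    return "".join(hashlib.sha256(p).hexdigest() for p in parts)


seen = set()


def should_include(search_index: int, relative_path: str, content: bytes) -> bool:
    """Return False if a header with this identity was already included."""
    key = once_key(search_index, relative_path, content)
    if key in seen:
        return False
    seen.add(key)
    return True


header = b"#pragma once\nint f(void);\n"
first = should_include(0, "include/a/h.h", header)
again = should_include(0, "include/a/h.h", header)       # same identity
other_path = should_include(1, "include/b/h.h", header)  # same content, other path
```

No mtime is involved, so the same-content, different-path case is treated as
two headers, and a symlinked copy reached through a different path is
included again.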