Re: Proposed new pass to optimise mode register assignments
On Sat, Sep 07, 2024 at 09:09:52AM +0200, Richard Biener wrote:
>
> > On 06.09.2024 at 17:38, Andrew Carlotti wrote:
> >
> > Hi,
> >
> > I'm working on optimising assignments to the AArch64 Floating-point Mode
> > Register (FPMR), as part of our FP8 enablement work. Claudio has already
> > implemented FPMR as a hard register, with the intention that FP8 intrinsic
> > functions will compile to a combination of an fpmr register set, followed
> > by an FP8 operation that takes fpmr as an input operand.
> >
> > It would clearly be inefficient to retain an explicit FPMR assignment prior
> > to each FP8 instruction (especially in the common case where every
> > assignment uses the same FPMR value). I think the best way to optimise this
> > would be to implement a new pass that can optimise assignments to
> > individual hard registers.
> >
> > There are a number of existing passes that do similar optimisations, but
> > which I believe are unsuitable for this scenario for various reasons. For
> > example:
> >
> > - cse1 can already optimise FPMR assignments within an extended basic
> >   block, but can't handle broader optimisations.
> > - pre (in gcse.c) doesn't work with assigning constant values, which would
> >   miss many potential usages. It also has limits on how far code can be
> >   moved, based around ideas of register pressure that don't apply to the
> >   context of a single hard register that shouldn't be used by the register
> >   allocator for anything else. Additionally, it doesn't run at -Os.
> > - hoist (also using gcse.c) only handles constant values, and only runs
> >   when optimising for size. It also has the rest of the issues that pre
> >   does.
> > - mode_sw only handles a small finite set of modes. The mode requirements
> >   are determined solely by the instructions that require the specific
> >   mode, so mode switches don't depend on the output of previous
> >   instructions.
> >
> > My intention would be for the new pass to reuse ideas, and hopefully some
> > of the existing code, from the mode-switching and gcse passes. In
> > particular, gcse.c (or its dependencies) has code that could identify when
> > values assigned to the FPMR are known to be the same (although we may not
> > need the full CSE capabilities of gcse.c), and mode-switching.cc knows how
> > to globally optimise mode assignments (and unlike gcse.c, doesn't use
> > cautious heuristics to avoid excessively increasing register pressure).
> >
> > Initially the new pass would only apply to the AArch64 FPMR register, but
> > in future it could also be used for other hard registers with similar
> > properties.
> >
> > Does anyone have any comments on this approach, before I start writing any
> > code?
>
> Can you explain in more detail why the mode-switching pass infrastructure
> isn't a good fit? ISTR it already is customizable via target hooks.
>
> Richard
>

I forgot to explain how FPMR is used. The FPMR register contains a large
number of fields that control the data formats and saturation/scaling
behaviour used in various fp8 conversion and multiplication intrinsics. At
present, I think there are 2^26 valid defined values that can be used in the
FPMR.

Furthermore, these values are not always compile-time constants - we expect
that developers will often reuse the same compiled code (e.g. a matrix
multiplication library routine) with different formats or scaling/saturation
behaviour selected at runtime (e.g. by passing a parameter to the library
routine).

(The specification for the FPMR register can be found at [1]. Its usage in
fp8 intrinsics is described in the draft ACLE spec at [2].)

As I understand it, the existing mode-switching pass infrastructure is built
around a small number of modes, where the choice of mode is a compile time
constant, and the total number of possible modes is fixed when building GCC.
Our usage of the FPMR register does not meet any of these criteria. I don't
see how these limitations could be overcome with target hooks within the
constraints of the existing pass.

[1] https://developer.arm.com/documentation/ddi0601/2024-06/AArch64-Registers/FPMR--Floating-point-Mode-Register?lang=en
[2] https://github.com/ARM-software/acle/pull/323/files#diff-516526d4a18101dc85300bc2033d0f86dc46c505b7510a7694baabea851aedfaR5664
    (Expand the large main/acle.md diff to see the relevant section.)
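To illustrate why an arbitrary-value lattice fits better than a fixed mode set, here is a minimal sketch of the "known FPMR value" analysis the proposal describes. It assumes a toy CFG in which each instruction either writes a value id to FPMR or only reads it. The value ids stand in for whatever value numbering the real pass would borrow from gcse, so a runtime-selected FPMR value is treated exactly like a constant. `struct block`, `remove_redundant_fpmr_sets` and the lattice constants are hypothetical names, not GCC APIs. The sketch only deletes writes that are redundant on every incoming path; it does not move assignments (that is where LCM would come in), and it ignores calls or anything else that might clobber FPMR.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8
#define MAX_INSNS 8
#define INSN_USE (-1)     /* FP8 operation that only reads FPMR */
#define VAL_UNKNOWN (-1)  /* FPMR contents not known at this point */
#define VAL_TOP (-2)      /* block not yet reached; identity for meet() */

/* Hypothetical toy IR: insn[i] >= 0 means "write value id insn[i] to FPMR";
   INSN_USE is an FP8 operation that reads FPMR. */
struct block {
    int ninsns;
    int insn[MAX_INSNS];
    bool deleted[MAX_INSNS];  /* set for writes found to be redundant */
    int npred;
    int pred[MAX_BLOCKS];
};

/* Meet of two incoming FPMR values: they either agree or become unknown. */
static int meet(int a, int b) {
    if (a == VAL_TOP)
        return b;
    if (b == VAL_TOP)
        return a;
    return a == b ? a : VAL_UNKNOWN;
}

/* FPMR value on exit from bb, given the value on entry. */
static int transfer(const struct block *bb, int in) {
    int v = in;
    for (int i = 0; i < bb->ninsns; i++)
        if (bb->insn[i] >= 0)
            v = bb->insn[i];
    return v;
}

/* Forward dataflow over the CFG, then delete every write of a value that
   FPMR is already known to hold. Block 0 is the entry block.
   Returns the number of deleted writes. */
static int remove_redundant_fpmr_sets(struct block *blocks, int n) {
    int in[MAX_BLOCKS], out[MAX_BLOCKS];
    assert(n > 0 && n <= MAX_BLOCKS);
    for (int b = 0; b < n; b++)
        in[b] = out[b] = VAL_TOP;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 0; b < n; b++) {
            /* Nothing is known about FPMR on function entry. */
            int v = (b == 0) ? VAL_UNKNOWN : VAL_TOP;
            for (int p = 0; p < blocks[b].npred; p++)
                v = meet(v, out[blocks[b].pred[p]]);
            in[b] = v;
            int o = transfer(&blocks[b], v);
            if (o != out[b]) {
                out[b] = o;
                changed = true;
            }
        }
    }

    int removed = 0;
    for (int b = 0; b < n; b++) {
        int cur = in[b];
        for (int i = 0; i < blocks[b].ninsns; i++) {
            int v = blocks[b].insn[i];
            if (v == INSN_USE)
                continue;
            if (v == cur) {
                blocks[b].deleted[i] = true;
                removed++;
            }
            cur = v;
        }
    }
    return removed;
}
```

For a diamond where both arms write the same value id, the meet at the join agrees on that value, so a repeated write in the join block is deleted; if the arms wrote different ids, the join would see VAL_UNKNOWN and keep its write.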
Re: Proposed CHOST change for the 64bit time_t transition
Bruno Haible writes:

> Paul Eggert wrote:
>> I'd rather just switch, as Debian has.
>
> I'd go one step further, and not only
> make the ABI transition without changing the canonical triplet,
> but also
> make gcc and clang define -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64
> among their predefines.

At that point, we should bump SONAME of libc and simply remove 32-bit time
support. This would probably be okay generally.

Just doing the predefine doesn't really fix anything - all the problems of
not being able to detect whether t64 support exists still persist, with no
mechanism to prevent mixing.

But, in the case we don't do a bump, why not update the tuple? This'd allow
easy communication of whether we have 32-bit time to all components of the
system, and, in lieu of a better detection mechanism, it'd allow anyone to
look at a host's tuple at a glance and see whether it is compatible with
something based on the tuple it was built on.

AFAIK the only reason not to do that is cosmetic - and it is ugly, but this
is for a dead platform, so I'm not too worried.

> Rationale:
>
> * We want that a user of a distro with the new ABI can build
>   packages in the usual way:
>     - ./configure; make; make install (when using Autoconf), or
>     - make; make install (when there is just a Makefile).
>   This *requires* that gcc and clang get patched, as indicated
>   above.
>   (Only changing Debian-specific files or variables won't do it.)
>
> * Once this has been done, is there a need for a triplet change?
>   Not in the toolchain, and not in the packages either.
>   Needs that have been mentioned in [1][2]:
>
>     - Users would like to know in which ABI they / their distro lives.
>       This can be done through a property in /etc/os-release.
>
>     - "risks incompatibility with other distributions" [2]
>       What is the problem? Do we expect users to build binaries
>       on 32-bit distro X and try to run them on 32-bit distro Y?
>       Or do we expect binary package distributors (like Mozilla,
>       videolan.org) to do so?
>       It was my impression that this approach is doomed anyway,
>       because so many shared libraries have different major version
>       in distro X than in distro Y.

It is, but many do it anyway, and what's worse, many users misguidedly
prefer it to better approaches (like the ones you mentioned).

>       And that such binary package distributors use flatpak, AppImage, etc.
>       precisely to get out of this dilemma.
>
>     - Building gcc and glibc might need some particular options.
>       Such options can be documented without requiring a new triplet.
>
> References:
> [1] https://wiki.debian.org/ReleaseGoals/64bit-time
> [2] https://wiki.gentoo.org/wiki/Project:Toolchain/time64_migration

--
Arsen Arsenović
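For context on the detection problem Arsen raises, a small C sketch that reports the width of time_t as seen by one translation unit, and the last second a signed 32-bit time_t can represent (the "expiration date" mentioned later in the thread). With glibc on a 32-bit target, the width depends on whether -D_TIME_BITS=64 is in effect, and glibc accepts that macro only together with -D_FILE_OFFSET_BITS=64. The function names are illustrative only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Width of time_t, in bits, as seen by this translation unit. */
static int time_t_bits(void) {
    return (int)(sizeof(time_t) * CHAR_BIT);
}

/* Format the last second representable by a signed 32-bit time_t
   (2^31 - 1 seconds after the epoch) as UTC into buf.
   Returns 0 on success, -1 on failure. */
static int y2038_limit(char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    memset(buf, 0, len);  /* leave an empty string behind on failure */
    time_t t = (time_t)INT32_MAX;
    const struct tm *tm = gmtime(&t);
    if (tm == NULL)
        return -1;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", tm) > 0 ? 0 : -1;
}
```

Calling `y2038_limit` yields "2038-01-19 03:14:07". Note that `time_t_bits` only describes the current compilation; it says nothing about how already-built shared libraries were compiled, which is exactly the mixing problem a predefine alone does not solve.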
Re: Proposed new pass to optimise mode register assignments
On 9/8/24 12:52 AM, Richard Biener wrote:
> I suppose LCM could be enhanced to handle partial antic and if the edges it
> speculates on are cold that might even be profitable on less great
> implementations?

Yea, that's not a bad idea at all and suspect it would be useful outside
this mode switching context.

Jeff
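To make the full vs. partial distinction concrete: LCM builds on anticipability, where a value is anticipated at a point if every path from there reaches a use before a kill; partial anticipability relaxes "every path" to "some path". A minimal single-value sketch of both backward dataflow problems, assuming a toy CFG. `struct cfg` and `compute_antic` are hypothetical and unrelated to the interfaces in GCC's lcm.cc.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Toy CFG for a single value (hypothetical, not GCC's lcm.cc). */
struct cfg {
    int nblocks;
    int nsucc[MAX_BLOCKS];
    int succ[MAX_BLOCKS][MAX_BLOCKS];
    bool antloc[MAX_BLOCKS];  /* block needs the value before any kill */
    bool transp[MAX_BLOCKS];  /* block does not kill the value */
};

/* Backward dataflow:
     ANTIC_IN(b)  = ANTLOC(b) || (TRANSP(b) && ANTIC_OUT(b))
     ANTIC_OUT(b) = AND over successors (full) or OR over successors (partial)
   Blocks with no successors have ANTIC_OUT = false. */
static void compute_antic(const struct cfg *g, bool partial,
                          bool antic_in[MAX_BLOCKS]) {
    assert(g->nblocks > 0 && g->nblocks <= MAX_BLOCKS);
    /* Full antic is a "must" problem: start at true and shrink.
       Partial antic is a "may" problem: start at false and grow. */
    for (int b = 0; b < g->nblocks; b++)
        antic_in[b] = !partial;

    bool changed = true;
    while (changed) {
        changed = false;
        /* Reverse order tends to converge faster for a backward problem;
           any order reaches the same fixed point. */
        for (int b = g->nblocks - 1; b >= 0; b--) {
            bool out = false;
            if (g->nsucc[b] > 0) {
                out = !partial;
                for (int i = 0; i < g->nsucc[b]; i++) {
                    bool s = antic_in[g->succ[b][i]];
                    out = partial ? (out || s) : (out && s);
                }
            }
            bool in = g->antloc[b] || (g->transp[b] && out);
            if (in != antic_in[b]) {
                antic_in[b] = in;
                changed = true;
            }
        }
    }
}
```

In a diamond where only one arm uses the value, the entry block is partially but not fully anticipated. Placing the assignment there speculates along the other arm, which is the kind of placement Richard suggests could pay off when that edge is cold.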
Re: Proposed new pass to optimise mode register assignments
On 9/7/24 7:06 PM, Andrew Carlotti wrote:
> I forgot to explain how FPMR is used. The FPMR register contains a large
> number of fields that control the data formats and saturation/scaling
> behaviour used in various fp8 conversion and multiplication intrinsics. At
> present, I think there are 2^26 valid defined values that can be used in
> the FPMR.
>
> Furthermore, these values are not always compile-time constants - we
> expect that developers will often reuse the same compiled code (e.g. a
> matrix multiplication library routine) with different formats or
> scaling/saturation behaviour selected at runtime (e.g. by passing a
> parameter to the library routine).
>
> (The specification for the FPMR register can be found at [1]. Its usage in
> fp8 intrinsics is described in the draft ACLE spec at [2].)
>
> As I understand it, the existing mode-switching pass infrastructure is
> built around a small number of modes, where the choice of mode is a
> compile time constant, and the total number of possible modes is fixed
> when building GCC. Our usage of the FPMR register does not meet any of
> these criteria. I don't see how these limitations could be overcome with
> target hooks within the constraints of the existing pass.

In which case you might want to look at how RISC-V is handling vsetvl to set
the vector configuration. It's a target specific pass, but comes closer to
what you're doing. It's still LCM based, but deals with some of the quirks
of the RISC-V vector architecture.

Jeff
gcc-15-20240908 is now available
Snapshot gcc-15-20240908 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/15-20240908/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 15 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master revision 1e17a11171dae2bc5d60380bf48c12ef49f59c7f

You'll find:

 gcc-15-20240908.tar.xz               Complete GCC

  SHA256=15eb5261d5ca0f5c3f7125fae9dff58e5c89a1b6c1329abe15d0457baf866d66
  SHA1=3d4b1269970b4a90c4e46ff783ce58470c957f5c

Diffs from 15-20240901 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-15
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Proposed CHOST change for the 64bit time_t transition
Arsen Arsenović wrote:
> Bruno Haible writes:
>> Paul Eggert wrote:
>>> I'd rather just switch, as Debian has.
>> I'd go one step further, and not only make the ABI transition without
>> changing the canonical triplet, but also make gcc and clang define
>> -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 among their predefines.
> At that point, we should bump SONAME of libc and simply remove 32-bit time
> support. This would probably be okay generally.

This is probably the best solution to the problem at hand, especially since
the old ABI has a definite expiration date about 14 years from now. Bump the
libc SONAME major and hope that we can get rid of the last dependencies on
the old SONAME before the deadline. We will have 14 years to do it, if that
arch is even still used then.

> [...]
> But, in the case we don't do a bump, why not update the tuple? This'd allow
> easy communication of whether we have 32-bit time to all components of the
> system, and, in lieu of a better detection mechanism, it'd allow anyone to
> look at a host's tuple at a glance and see whether it is compatible with
> something based on the tuple it was built on.

This is closely related to the idea I floated a year ago of redefining
configuration tuples as lists of tags (with a canonical order) progressively
narrowing a broad architecture. Start with CPU architecture and work
"narrower" from there. In that system, adding "-t32" and "-t64" to indicate
time_t width would be the simple solution. In that earlier context, the
motivation was handling different libc choices on Windows.

-- Jacob
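A sketch of how tools might query such a tuple, assuming the proposed "-t32"/"-t64" tags were appended as ordinary '-'-separated components. No current GNU configuration tuple carries these tags; the example tuples and function names are hypothetical. In this sketch an untagged tuple counts as "unknown" rather than implicitly 32-bit, and two tuples are rejected only when both state a width and the widths differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if tag is a whole '-'-separated component of tuple. */
static bool tuple_has_tag(const char *tuple, const char *tag) {
    size_t taglen = strlen(tag);
    const char *p = tuple;
    for (;;) {
        const char *end = strchr(p, '-');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == taglen && strncmp(p, tag, len) == 0)
            return true;
        if (end == NULL)
            return false;
        p = end + 1;
    }
}

/* 64 or 32 if the tuple carries a (hypothetical) time_t tag, 0 if it
   says nothing about time_t width. */
static int tuple_time_bits(const char *tuple) {
    bool t64 = tuple_has_tag(tuple, "t64");
    bool t32 = tuple_has_tag(tuple, "t32");
    assert(!(t64 && t32));  /* a tuple must not claim both widths */
    return t64 ? 64 : (t32 ? 32 : 0);
}

/* Incompatible only when both tuples state a time_t width and the widths
   differ; an untagged tuple is treated as unknown. */
static bool time_abi_compatible(const char *host, const char *build) {
    int h = tuple_time_bits(host);
    int b = tuple_time_bits(build);
    return h == 0 || b == 0 || h == b;
}
```

Matching whole components rather than substrings matters here: a component such as "t640" must not be mistaken for "t64". A stricter policy could instead treat untagged tuples as 32-bit, which would better serve the goal of preventing mixing.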
Re: #pragma once behavior
I like the sound of resolved path identity from search, including the
constructed full path and the index within include paths. If I were writing
a compiler from scratch, I'd probably do something like this:

SHA2(include-path-search-offset-integer-of-found-header-to-support-include-next)
+ SHA2(container-relative-constructed-full-path-to-selected-header-during-include-search)
+ SHA2(selected-header-content)

and make it timeless (no mtime in the key).

If you evaluate the search yourself, you have the constructed full path
identity while you are searching, and you don't need to do tricks to get a
full container-root-relative path, because you already have it during the
search: the path is constructed forward, relative to the search root.

What are the problems? It can include the same file twice if it has two path
identities (hardlink or symlink), yet that seems the safer option to me,
because an identity relative to a constant search configuration sounds
better: such paths could be defined to be different identities with respect
to #pragma once. It's the semantically safer option. And the canonical
example that fails (same content, different path, same mtime) would succeed
with this choice.

Pick holes in this. I might be overlooking something.
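A minimal sketch of that key in C, assuming the preprocessor knows, at the moment it resolves an #include, which search-path index matched and what path it constructed. FNV-1a stands in for SHA-2 here (it is not collision resistant), the '+' is read as combining three fields into one key rather than arithmetic addition, every header passed in is assumed to contain #pragma once, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL
#define MAX_ONCE 64

/* 64-bit FNV-1a: a non-cryptographic stand-in for SHA-2 in this sketch. */
static uint64_t fnv1a(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = FNV_OFFSET;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Identity of a header as resolved by the include search; no mtime. */
struct once_key {
    int search_index;       /* index of the include directory that matched */
    uint64_t path_hash;     /* hash of the path constructed during the search */
    uint64_t content_hash;  /* hash of the header's bytes */
};

struct once_table {
    struct once_key keys[MAX_ONCE];
    int n;
};

static struct once_key make_once_key(int search_index, const char *path,
                                     const char *content, size_t content_len) {
    struct once_key k;
    k.search_index = search_index;
    k.path_hash = fnv1a(path, strlen(path));
    k.content_hash = fnv1a(content, content_len);
    return k;
}

/* Returns true if the header should be entered, false if an earlier
   #pragma once with the same identity suppresses it. */
static bool once_should_enter(struct once_table *t, struct once_key k) {
    for (int i = 0; i < t->n; i++) {
        const struct once_key *e = &t->keys[i];
        /* Compare fields, not raw bytes, so struct padding never matters. */
        if (e->search_index == k.search_index &&
            e->path_hash == k.path_hash &&
            e->content_hash == k.content_hash)
            return false;
    }
    assert(t->n < MAX_ONCE);
    t->keys[t->n++] = k;
    return true;
}
```

Two files with identical content (and identical mtimes) found under different include directories get different keys, so both are entered; the trade-off, as noted above, is that a symlinked or hard-linked copy reached through a different path is entered twice.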