Re: On the subject of module consumer diagnostics.
On Tue, Sep 03, 2024 at 16:53:43 +0100, Iain Sandoe wrote:
> I think that might be a misunderstanding on the part of the author;
> AFAIU both GCC and MSVC _do_ require access to the sources at BMI
> consume-time to give decent diagnostics. I think that there might be
> confusion because the compilation would succeed on those toolchains
> without the sources - but with poorer diagnostic quality?

Does this have (additional) implications for caching tools and modules?
They cache diagnostic output, but if these other paths showing up or
disappearing affects the output, the cache key should incorporate that
as well.

Should there be a way for such tools to get this information somehow?
Ideally the paths would only matter if reported diagnostics *would* look
at the files, not just "there's a BMI that mentions a source path X"
kind of inspection.

--Ben
Re: #pragma once behavior
On Fri, Sep 06, 2024 at 00:03:23 -0500, Jeremy Rifkin wrote:
> Hello,
>
> I'm looking at #pragma once behavior among the major C/C++ compilers as
> part of a proposal paper for standardizing #pragma once. (This is
> apparently a very controversial topic)
>
> To put my question up-front: Would GCC ever be open to altering its
> #pragma once behavior to bring it more in-line with behavior from other
> compilers and possibly more in-line with what users expect?
>
> To elaborate more:
>
> Design decisions for #pragma once essentially boil down to a file-based
> definitions vs a content-based definition of "same file".
>
> A file-based definition is easier to reason about and more in-line with
> what users expect, however, distinct copies of headers can't be handled
> and multiple mount points are problematic.
>
> A content-based definition works for distinct copies, multiple mount
> points, and is completely sufficient 99% of the time, however, it could
> potentially break in hard-to-debug ways in a few notable cases (more
> information later).
>
> Currently the three major C/C++ compilers treat #pragma once very differently:
> - GCC uses file mtime + file contents
> - Clang uses inodes
> - MSVC uses file path
>
> None of the major compilers have documented their #pragma once semantics.
>
> In practice all three of these approaches work pretty well most of the
> time (which is why people feel comfortable using #pragma once). However,
> they can each break in their own ways.
>
> As mentioned earlier, clang and MSVC's file-based definitions of "same
> file" break for multiple mount points and multiple copies of the same
> header. MSVC's approach breaks for symbolic links and hard links.
>
> GCC's hybrid approach can break in surprising ways. I have three
> examples to share:
>
> Example 1:
>
> Consider a scenario such as:
>
> usr/
>   include/
>     library_a/
>       library_main.hpp
>       foo.hpp
>     library_b/
>       library_main.hpp
>       foo.hpp
> src/
>   main.cpp
>
> main.cpp:
> #include "library_a/library_main.hpp"
> #include "library_b/library_main.hpp"
>
> And both library_main.hpp's have:
> #pragma once
> #include "foo.hpp"

Could a "uses the relative search path" fact be used to mix into the
file's identity? This way the `once` key would see "this content looked
for things in directory `library_a`" and would see that
`library_b/library_main.hpp`, despite the same content (and mtime) is
actually a different context and actually perform the inclusions?

Of course, this fails if `#include "../common/foo.hpp"` is used in each
location as that *would* then want to elide the second inclusion. I
don't know how this problem is avoided without actually reading the
contents again. But the "I read this file" can remember what relative
paths were searched (since the contents are the same at least).

> Example 2:
>
> namespace v1 {
> #include "library_v1.hpp"
> }
> namespace v2 {
> #include "library_v2.hpp"
> }
>
> Where both library headers include their own copy of a shared header
> using #pragma once.

Again, the context of the inclusion matters, so "is wrapped in a scope"
can modify the "onceness" (`extern "C"` is probably the more common
instance).

> Example 3:
>
> usr/
>   include/
>     library/
>       library.hpp
>       vendored-dependency.hpp
> src/
>   main.cpp
>   vendored-dependency.hpp
>
> main.cpp:
> #include "vendored-dependency.hpp"
> #include
>
> library.hpp:
> #pragma once
> #include "vendored-dependency.hpp"

This is basically the same as Example 1 as far as context goes.

Note that context cannot include `#define` state because `#once` is
defined to be the first thing in the file and a file that is intended to
be included multiple times (e.g., Boost.PP shenanigans) in different
states cannot, in good faith, use `#once`.

Hrm…though if we are doing `otherdir/samecontent`, the different
preprocessor state *might* change that "what relative files did we look
for?" state…

Nothing is easy :( .

--Ben
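To make the "mix the relative search path into the file's identity" idea above concrete, here is a minimal sketch. It is not how GCC (or any compiler) implements `#pragma once`; the names `OnceKey`, `OnceTable` and `should_include` are hypothetical, and the file contents are stored directly where a real implementation would presumably keep a hash.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <tuple>

// Hypothetical identity of a `#pragma once` header: contents and mtime,
// plus the directory that its relative #include lookups start from.
struct OnceKey {
    std::string content;      // stand-in for the contents (or a hash of them)
    std::int64_t mtime;       // modification time, in arbitrary units
    std::string include_dir;  // directory containing the header

    bool operator<(const OnceKey& other) const {
        return std::tie(content, mtime, include_dir) <
               std::tie(other.content, other.mtime, other.include_dir);
    }
};

// Remembers which keys have been seen during one translation unit.
class OnceTable {
public:
    // Returns true if the header should be processed, false if an
    // identical key was already recorded and the inclusion can be elided.
    bool should_include(const OnceKey& key) { return seen_.insert(key).second; }

private:
    std::set<OnceKey> seen_;
};
```

With this key, the two `library_main.hpp` copies from Example 1 are treated as different files even when contents and mtime match, because their directories differ. As noted above, the same rule gives the wrong answer when both copies use `#include "../common/foo.hpp"`, so the directory alone is probably too coarse a notion of context.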
Re: On the subject of module consumer diagnostics.
On Fri, 2024-09-06 at 08:44 -0400, Ben Boeckel via Gcc wrote:
> On Tue, Sep 03, 2024 at 16:53:43 +0100, Iain Sandoe wrote:
> > I think that might be a misunderstanding on the part of the author;
> > AFAIU both GCC and MSVC _do_ require access to the sources at BMI
> > consume-time to give decent diagnostics. I think that there might
> > be confusion because the compilation would succeed on those
> > toolchains without the sources - but with poorer diagnostic quality?
>
> Does this have (additional) implications for caching tools and
> modules? They cache diagnostic output, but if these other paths
> showing up or disappearing affects the output, the cache key should
> incorporate that as well.

What kinds of caching tools are you thinking of?

I'm curious about caching of diagnostics, and how the diagnostics are
represented in the cache.

FWIW, SARIF has a way of storing the source associated with a
diagnostic (and/or hashes of the source), and GCC's SARIF output uses
this to capture the source of any file referred to by path by a
diagnostic in the SARIF output (but we don't yet capture hashes of
source).

Dave

> Should there be a way for such tools to get this information
> somehow? Ideally the paths would only matter if reported diagnostics
> *would* look at the files, not just "there's a BMI that mentions a
> source path X" kind of inspection.
>
> --Ben
Re: On the subject of module consumer diagnostics.
On Fri, Sep 06, 2024 at 09:30:26 -0400, David Malcolm wrote:
> On Fri, 2024-09-06 at 08:44 -0400, Ben Boeckel via Gcc wrote:
> > Does this have (additional) implications for caching tools and
> > modules? They cache diagnostic output, but if these other paths
> > showing up or disappearing affects the output, the cache key should
> > incorporate that as well.
>
> What kinds of caching tools are you thinking of?

`ccache`, `sccache`, etc. These tools try to detect if the compilation
would be the same and place the object in its output location and
report the cached output on stdout/stderr as performed in the original
compile so that it acts "just like the compiler"'s execution.

> I'm curious about caching of diagnostics, and how the diagnostics are
> represented in the cache.

I know `sccache` just stores it as a text blob; `ccache` is probably the
same, but I haven't been in its code myself to know.

--Ben
Re: On the subject of module consumer diagnostics.
On 9/6/24 9:41 AM, Ben Boeckel wrote:
> On Fri, Sep 06, 2024 at 09:30:26 -0400, David Malcolm wrote:
>> On Fri, 2024-09-06 at 08:44 -0400, Ben Boeckel via Gcc wrote:
>>> Does this have (additional) implications for caching tools and
>>> modules? They cache diagnostic output, but if these other paths
>>> showing up or disappearing affects the output, the cache key should
>>> incorporate that as well.
>>
>> What kinds of caching tools are you thinking of?
>
> `ccache`, `sccache`, etc. These tools try to detect if the compilation
> would be the same and place the object in its output location and
> report the cached output on stdout/stderr as performed in the original
> compile so that it acts "just like the compiler"'s execution.
>
>> I'm curious about caching of diagnostics, and how the diagnostics are
>> represented in the cache.
>
> I know `sccache` just stores it as a text blob; `ccache` is probably the
> same, but I haven't been in its code myself to know.

Certainly these tools are complicated when the preprocessor output isn't
enough to reproduce the compilation. It might be nice to have some
combined preprocessed form of the primary translation unit and any
interface units it depends on...

Jason
Proposed new pass to optimise mode register assignments
Hi,

I'm working on optimising assignments to the AArch64 Floating-point Mode
Register (FPMR), as part of our FP8 enablement work. Claudio has already
implemented FPMR as a hard register, with the intention that FP8
intrinsic functions will compile to a combination of an fpmr register
set, followed by an FP8 operation that takes fpmr as an input operand.

It would clearly be inefficient to retain an explicit FPMR assignment
prior to each FP8 instruction (especially in the common case where every
assignment uses the same FPMR value). I think the best way to optimise
this would be to implement a new pass that can optimise assignments to
individual hard registers.

There are a number of existing passes that do similar optimisations, but
which I believe are unsuitable for this scenario for various reasons.
For example:

- cse1 can already optimise FPMR assignments within an extended basic
  block, but can't handle broader optimisations.
- pre (in gcse.c) doesn't work with assigning constant values, which
  would miss many potential usages. It also has limits on how far code
  can be moved, based around ideas of register pressure that don't apply
  to the context of a single hard register that shouldn't be used by the
  register allocator for anything else. Additionally, it doesn't run at
  -Os.
- hoist (also using gcse.c) only handles constant values, and only runs
  when optimising for size. It also has the rest of the issues that pre
  does.
- mode_sw only handles a small finite set of modes. The mode
  requirements are determined solely by the instructions that require
  the specific mode, so mode switches don't depend on the output of
  previous instructions.

My intention would be for the new pass to reuse ideas, and hopefully
some of the existing code, from the mode-switching and gcse passes. In
particular, gcse.c (or its dependencies) has code that could identify
when values assigned to the FPMR are known to be the same (although we
may not need the full CSE capabilities of gcse.c), and mode-switching.cc
knows how to globally optimise mode assignments (and unlike gcse.c,
doesn't use cautious heuristics to avoid excessively increasing register
pressure).

Initially the new pass would only apply to the AArch64 FPMR register,
but in future it could also be used for other hard registers with
similar properties.

Does anyone have any comments on this approach, before I start writing
any code?

Thanks,
Andrew
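For readers unfamiliar with the problem, the simplest case (repeated assignments of the same value within straight-line code) can be sketched as below. This is only an illustration of the local redundancy that the message says cse1 already handles, not the proposed global pass and not GCC code; `Kind`, `Insn` and `remove_redundant_sets` are hypothetical names, and FPMR values are modelled as plain integers.

```cpp
#include <optional>
#include <vector>

// Toy instruction kinds: an FPMR assignment, an FP8 operation that reads
// FPMR, and anything that may change FPMR to an unknown value.
enum class Kind { SetFpmr, Fp8Op, Clobber };

struct Insn {
    Kind kind;
    int value;  // the value assigned by SetFpmr; unused for other kinds
};

// Drop FPMR assignments that store the value already known to be there.
std::vector<Insn> remove_redundant_sets(const std::vector<Insn>& block) {
    std::optional<int> known;  // current FPMR value, if known
    std::vector<Insn> out;
    for (const Insn& insn : block) {
        if (insn.kind == Kind::SetFpmr) {
            if (known && *known == insn.value) {
                continue;  // redundant: FPMR already holds this value
            }
            known = insn.value;
        } else if (insn.kind == Kind::Clobber) {
            known.reset();  // FPMR contents are no longer known
        }
        out.push_back(insn);
    }
    return out;
}
```

A global version would have to propagate the "known value" across basic-block boundaries, which is where the mode-switching and gcse machinery mentioned above would come in.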
Re: Proposed CHOST change for the 64bit time_t transition
Paul Eggert writes:

> One possible improvement would be to append "t32" if you want 32-bit time_t,
> instead of appending "t64" for 64-bit time_t. That way, people wouldn't be
> stuck with appending that confusing "t64" for the foreseeable future, and only
> specialists concerned with 32-bit time_t would need to know about the issue.

But that'd change semantics in non-obvious ways.

The intention behind this suggestion is to have a mechanism to
communicate to packages and the toolchain alike that "yes, this system
is Y2038-proof". There is currently no mechanism to do that. There
isn't even a mechanism to guess based on your dependencies whether you
should also enable LFS and T64 (and there can't be a general one - you'd
need to detect what libraries are doing what if they have time_t or
other system integers on ABI boundaries, which is not generally
possible). Not that the latter would suffice - even if we changed all
packages we can to use such a mechanism, there would be plenty of
packages that don't (think of all the hand-rolled makefiles..).

An alternative that I pondered was to teach the linker about some notion
of "compatibility strings" that it would compare and reject if
different, plus teaching the compiler how to emit those, plus teaching
glibc to tell the compiler to emit those.. We could have key-value
pairs in some section. For each key K, we could have the linker check
that, for each (shared or otherwise) object either does not contain K or
contains K with the same value as all the other ones, and produce an
error otherwise. On the resulting object, the KV pairs would be the
union of all KV pairs of all constituent objects.

... but this is for i?86, a CPU family I haven't used in ~15 years (and
I suspect many also have not..), and there are other things eating my
time. And it'd still require a world rebuild.

> Personally, I hope backward-compatibility concerns don't require this sort of
> thing. I'd rather just switch, as Debian has.

The "status quo" of some packages enabling it of their own volition, and
some not, leads to various subtle breakages (example:
https://bugs.gentoo.org/828001). I think switching like that would not
be much different.

I do not know what approach Debian took, but if it is one of altering
the toolchain, then this is a sure way to introduce subtle divergences
between distros (this is why I've suggested we CC the GCC and binutils
MLs); if it is one of teaching debhelper (is that the right tool? not
sure) about it, then this will break user-compiled packages (so,
./configure && make && make install, or moral equivalent). If it is to
alter libc, then, can we do libc.so.7? ;)

The only actually solid approach I see today is to /somehow/ communicate
to the system to not use 32-bit time, ever (and consequently, to enable
LFS). I think that the "least effort" path to do that is through the
tuple. There's precedent for this also, AFAIK, in the 32-bit ARM world
(gnueabi/gnueabihf, whatever that means).

config.guess would need to be altered a little bit. My preference is
for [[ $os = *-*-gnu*t64* ]] informing glibc to completely ignore
_FILE_OFFSET_BITS, _LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, and
_TIME_BITS and just presume 64 for all of those system integers. This
means that config.guess could undef those (in case a toolchain sets
those) and include some libc file, then check for sizeof (time_t), or
just have glibc define something if on a gnut64 target.

> I felt the same way about the 64-bit off_t back in the 1990s. It was obvious
> to me even at the time that we would have been significantly better off making
> off_t 64-bit, while keeping 32-bit off_t in the ABI for backward
> compatibility; this is what NetBSD did with time_t in 2012. Although I realize
> others felt differently, I never fully understood their concerns.

That is history now, I fear; I also wish that time_t had been made
64-bit long ago ;)

> And here I am, three decades later, still having to make changes[1] to
> Autoconf's AC_SYS_LARGEFILE macro to continue to support that 30-year-old
> off_t mistake, and now with 64-bit time_t interacting with 64-bit off_t in
> non-orthogonal ways.

Indeed, and the "best" part is that, whatever you do in autoconf, unless
a program exists in isolation only interfacing with libc, it will break
some consumer (or will be broken by some dependency) because there's no
mechanism to signal the time_t size across ABI boundaries.

--
Arsen Arsenović
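The "compatibility strings" check described earlier in this message (each key either absent or identical across objects, with the output carrying the union) can be sketched as follows. No linker implements this as far as the thread says; `CompatNotes`, `merge_compat_notes` and the key names in the usage note are hypothetical.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-object key-value compatibility notes.
using CompatNotes = std::map<std::string, std::string>;

// Merge the notes of all input objects. Returns the union of all pairs,
// or std::nullopt if two objects carry the same key with different
// values (the case in which the linker would report an error).
std::optional<CompatNotes> merge_compat_notes(
    const std::vector<CompatNotes>& objects) {
    CompatNotes merged;
    for (const CompatNotes& notes : objects) {
        for (const auto& [key, value] : notes) {
            auto [it, inserted] = merged.emplace(key, value);
            if (!inserted && it->second != value) {
                return std::nullopt;  // conflicting values for this key
            }
        }
    }
    return merged;
}
```

For example, an object tagged `time_bits=64` merges cleanly with an untagged object, but not with one tagged `time_bits=32`. Objects that never emit the key are unaffected, so the check stays a no-op for code that does not depend on the time_t size.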
gcc-13-20240906 is now available
Snapshot gcc-13-20240906 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20240906/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-13 revision 8ad345a8387a5449eee7223b9b3ab8d68a0e6c2e

You'll find:

 gcc-13-20240906.tar.xz               Complete GCC

  SHA256=f64f8c3e3117250ff6f88926ca4b300b5f09cc5d3c700300e12a58032a02cbef
  SHA1=5f9ea8677de1ac0cb1b6cf638112e69abfa485f9

Diffs from 13-20240830 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Proposed CHOST change for the 64bit time_t transition
Paul Eggert wrote:
> I'd rather just switch, as Debian has.

I'd go one step further, and not only make the ABI transition without
changing the canonical triplet, but also make gcc and clang define
-D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 among their predefines.

Rationale:

  * We want that a user of a distro with the new ABI can build packages
    in the usual way:
      - ./configure; make; make install (when using Autoconf), or
      - make; make install (when there is just a Makefile).
    This *requires* that gcc and clang get patched, as indicated above.
    (Only changing Debian-specific files or variables won't do it.)

  * Once this has been done, is there a need for a triplet change?
    Not in the toolchain, and not in the packages either.
    Needs that have been mentioned in [1][2]:
      - Users would like to know in which ABI they / their distro lives.
        This can be done through a property in /etc/os-release.
      - "risks incompatibility with other distributions" [2]
        What is the problem? Do we expect users to build binaries on
        32-bit distro X and try to run them on 32-bit distro Y? Or do we
        expect binary package distributors (like Mozilla, videolan.org)
        to do so? It was my impression that this approach is doomed
        anyway, because so many shared libraries have different major
        version in distro X than in distro Y. And that such binary
        package distributors use flatpak, AppImage, etc. precisely to
        get out of this dilemma.
      - Building gcc and glibc might need some particular options.
        Such options can be documented without requiring a new triplet.

References:
[1] https://wiki.debian.org/ReleaseGoals/64bit-time
[2] https://wiki.gentoo.org/wiki/Project:Toolchain/time64_migration
Re: Proposed CHOST change for the 64bit time_t transition
Arsen Arsenović wrote:
> An alternative that I pondered was to teach the linker about some notion
> of "compatibility strings" that it would compare and reject if
> different, plus teaching the compiler how to emit those, plus teaching
> glibc to tell the compiler to emit those.. We could have key-value
> pairs in some section. For each key K, we could have the linker check
> that, for each (shared or otherwise) object either does not contain K or
> contains K with the same value as all the other ones, and produce an
> error otherwise. On the resulting object, the KV pairs would be the
> union of all KV pairs of all constituent objects.

This sounds much like the arm eabi attributes: If a .s file does not
start with

        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 0
        .eabi_attribute 18, 4

the resulting .o file cannot be linked with other .o files on the
system. It is a hassle even for packages that don't use time_t or off_t
(such as GNU libffcall or libffi).

Yes, it would be useful to have a way to have the linker warn if a
binary that depends on 32-bit time_t and a binary that depends on 64-bit
time_t get linked together. But PLEASE implement this in a way that is a
no-op when time_t is not used by either of the two binaries.

Bruno
Re: #pragma once behavior
Thanks Andrew, I appreciate the context and links. It looks like the
prior implementation failed to handle links due to being based on file
path, given cpp_simplify_pathname. Do you have thoughts on the use of
device ID + inode as a way to also accommodate symbolic links and hard
links without the fickleness of mtime?

Cheers,
Jeremy

On Sep 6 2024, at 12:25 am, Andrew Pinski wrote:

> On Thu, Sep 5, 2024 at 10:04 PM Jeremy Rifkin wrote:
>>
>> Hello,
>>
>> I'm looking at #pragma once behavior among the major C/C++ compilers as
>> part of a proposal paper for standardizing #pragma once. (This is
>> apparently a very controversial topic)
>>
>> To put my question up-front: Would GCC ever be open to altering its
>> #pragma once behavior to bring it more in-line with behavior from other
>> compilers and possibly more in-line with what users expect?
>>
>> To elaborate more:
>>
>> Design decisions for #pragma once essentially boil down to a file-based
>> definitions vs a content-based definition of "same file".
>>
>> A file-based definition is easier to reason about and more in-line with
>> what users expect, however, distinct copies of headers can't be handled
>> and multiple mount points are problematic.
>>
>> A content-based definition works for distinct copies, multiple mount
>> points, and is completely sufficient 99% of the time, however, it could
>> potentially break in hard-to-debug ways in a few notable cases (more
>> information later).
>>
>> Currently the three major C/C++ compilers treat #pragma once very
>> differently:
>> - GCC uses file mtime + file contents
>> - Clang uses inodes
>> - MSVC uses file path
>
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52566#c2 .
> Note this was changed specifically in GCC 3.4 to fix the issue around
> symlinks and hard links.
> See https://gcc.gnu.org/pipermail/gcc-patches/2003-July/111203.html
> for more information on the fixes.
> > In fact `#pragma once` was deprecated before GCC 3.4 because it would > do incorrectly what clang and MSVC are doing and that was considered > wrong. > So GCC behavior has been this way before clang was even written. > > Thanks, > Andrew > >> >> None of the major compilers have documented their #pragma once semantics. >> >> In practice all three of these approaches work pretty well most of the >> time (which is why people feel comfortable using #pragma once). However, >> they can each break in their own ways. >> >> As mentioned earlier, clang and MSVC's file-based definitions of "same >> file" break for multiple mount points and multiple copies of the same >> header. MSVC's approach breaks for symbolic links and hard links. >> >> GCC's hybrid approach can break in surprising ways. I have three >> examples to share: >> >> Example 1: >> >> Consider a scenario such as: >> >> usr/ >> include/ >> library_a/ >> library_main.hpp >> foo.hpp >> library_b/ >> library_main.hpp >> foo.hpp >> src/ >> main.cpp >> >> main.cpp: >> #include "library_a/library_main.hpp" >> #include "library_b/library_main.hpp" >> >> And both library_main.hpp's have: >> #pragma once >> #include "foo.hpp" >> >> Example 2: >> >> namespace v1 { >> #include "library_v1.hpp" >> } >> namespace v2 { >> #include "library_v2.hpp" >> } >> >> Where both library headers include their own copy of a shared header >> using #pragma once. >> >> Example 3: >> >> usr/ >> include/ >> library/ >> library.hpp >> vendored-dependency.hpp >> src/ >> main.cpp >> vendored-dependency.hpp >> >> main.cpp: >> #include "vendored-dependency.hpp" >> #include >> >> library.hpp: >> #pragma once >> #include "vendored-dependency.hpp" >> >> Assuming the same contents byte-for-byte of vendored-dependency.hpp, and >> it uses #pragma once. >> >> Each of these examples are plausible scenarios where two files with the >> same contents could be #included. 
In each example, on GCC, the code can >> work or break based on mtime: >> - Example 1: Breaks if mtimes for library_main.hpp happen to be the same >> - Example 2: Breaks if mtimes for the shared dependency copies happen to >> be the same >> - Example 3: Only works if mtimes are the same >> >> File mtimes can happen to match sometimes, e.g. in a fresh git clone. >> However, this is a rather fickle criteria to rely on and could easily >> diverge in the middle of development. Notably, Example 2 was shared with >> me as an example where #pragma once worked great in development and >> broke in CI. >> >> Additionally, while GCC's approach might be able to handle multiple >> mounts better than other approaches, it can still break under multiple >> mounts if mtime resolution differs. >> >> Obviously there is no silver bullet for making #pragma once work >> perfectly all the time, however, I think it's easier to provide clear >> guarantees for #pragma once behavior when the definition of "same file" >> is based on file identity on d
Re: #pragma once behavior
Thanks Martin,

There's some context on N2896 in the meeting minutes:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2941.pdf

I think the key thing about N2896 is that it left unqualified #once
implementation-defined, which is no better than the current state of
affairs. I'm trying to approach this with a focused paper on just
providing a clear set of mechanics for #pragma once.

I do recognize the uphill battle and legitimate concern about "blessing"
an unreliable feature. My thesis is that #pragma once is unreliable
today because every implementation approaches it differently and none of
them document what they do. While no silver bullet is possible, my hope
is that the current state of affairs could at least be made much better
by a clear set of semantics. This would enable a clearer understanding
of where #pragma once may fall over.

I'd love to hear your thoughts.

Cheers,
Jeremy

On Sep 6 2024, at 12:58 am, Martin Uecker wrote:

>
> There was a recent related proposal for C23.
>
> https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2896.htm
>
> See also the email by Linus Torvalds referenced in
> this paper.
>
> Note that this proposal was not adopted for ISO C23.
> I can't find when it was discussed, but IIRC the general
> criticism was that the regular form is not reliable and
> difficult to standardize any specific rules and that the
> form with ID does not add much value over traditional
> include guards.
>
> Martin
>
>
>
> On Friday, 06.09.2024 at 00:03 -0500, Jeremy Rifkin wrote:
>> Hello,
>>
>> I'm looking at #pragma once behavior among the major C/C++ compilers as
>> part of a proposal paper for standardizing #pragma once. (This is
>> apparently a very controversial topic)
>>
>> To put my question up-front: Would GCC ever be open to altering its
>> #pragma once behavior to bring it more in-line with behavior from other
>> compilers and possibly more in-line with what users expect?
>> >> To elaborate more: >> >> Design decisions for #pragma once essentially boil down to a file-based >> definitions vs a content-based definition of "same file". >> >> A file-based definition is easier to reason about and more in-line with >> what users expect, however, distinct copies of headers can't be handled >> and multiple mount points are problematic. >> >> A content-based definition works for distinct copies, multiple mount >> points, and is completely sufficient 99% of the time, however, it could >> potentially break in hard-to-debug ways in a few notable cases (more >> information later). >> >> Currently the three major C/C++ compilers treat #pragma once very >> differently: >> - GCC uses file mtime + file contents >> - Clang uses inodes >> - MSVC uses file path >> >> None of the major compilers have documented their #pragma once semantics. >> >> In practice all three of these approaches work pretty well most of the >> time (which is why people feel comfortable using #pragma once). However, >> they can each break in their own ways. >> >> As mentioned earlier, clang and MSVC's file-based definitions of "same >> file" break for multiple mount points and multiple copies of the same >> header. MSVC's approach breaks for symbolic links and hard links. >> >> GCC's hybrid approach can break in surprising ways. I have three >> examples to share: >> >> Example 1: >> >> Consider a scenario such as: >> >> usr/ >> include/ >> library_a/ >> library_main.hpp >> foo.hpp >> library_b/ >> library_main.hpp >> foo.hpp >> src/ >> main.cpp >> >> main.cpp: >> #include "library_a/library_main.hpp" >> #include "library_b/library_main.hpp" >> >> And both library_main.hpp's have: >> #pragma once >> #include "foo.hpp" >> >> Example 2: >> >> namespace v1 { >> #include "library_v1.hpp" >> } >> namespace v2 { >> #include "library_v2.hpp" >> } >> >> Where both library headers include their own copy of a shared header >> using #pragma once. 
>> >> Example 3: >> >> usr/ >> include/ >> library/ >> library.hpp >> vendored-dependency.hpp >> src/ >> main.cpp >> vendored-dependency.hpp >> >> main.cpp: >> #include "vendored-dependency.hpp" >> #include >> >> library.hpp: >> #pragma once >> #include "vendored-dependency.hpp" >> >> Assuming the same contents byte-for-byte of vendored-dependency.hpp, and >> it uses #pragma once. >> >> Each of these examples are plausible scenarios where two files with the >> same contents could be #included. In each example, on GCC, the code can >> work or break based on mtime: >> - Example 1: Breaks if mtimes for library_main.hpp happen to be the same >> - Example 2: Breaks if mtimes for the shared dependency copies happen to >> be the same >> - Example 3: Only works if mtimes are the same >> >> File mtimes can happen to match sometimes, e.g. in a fresh git clone. >> However, this is a rather fickle criteria to rely on and could easily >> diverge in the middle of development. Notably, Example 2 was s
Re: #pragma once behavior
On Fri, Sep 6, 2024 at 5:49 PM Jeremy Rifkin wrote: > > Thanks Andrew, I appreciate the context and links. It looks like the > prior implementation failed to handle links due to being based on file > path, given cpp_simplify_pathname. Do you have thoughts on the use if > device ID + inode as a way to also accommodate symbolic links and hard > links without the fickleness of mtime? Not always. because inodes are not always stable on some file systems. And also does not work with multi-mounted devices too. The whole definition of what is the same file is really up for debate here. I say if the file has the same content, then it is the same file and GCC uses that definition. While clang says it is based on if it is the same inode which is not always true because of file systems which don't use an inode number. While MSVC says it is based on the path but what is the canonical path to a file, is a hard link to the same file the same file or not; what about symbolic links? How about overlays and mounted directories are they the same then? GCC definition is the only one which supports all issues described here dealing with inodes (sometimes being non-stable), canonical paths and both kinds of links and even re-mounted file systems. What does the other implementations say about changing their definition of what "the same file is"? Have you asked clang and MSVC folks? Anyways GCC has an optimization already for #ifdef/#define/#endif (and that is documented here: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html) so does it make sense to really standardize `#pramga once` here or just push other implementations to add a similar optimization instead? Thanks, Andrew Pinski > > Cheers, > Jeremy > > On Sep 6 2024, at 12:25 am, Andrew Pinski wrote: > > > On Thu, Sep 5, 2024 at 10:04 PM Jeremy Rifkin wrote: > >> > >> Hello, > >> > >> I'm looking at #pragma once behavior among the major C/C++ compilers as > >> part of a proposal paper for standardizing #pragma once. 
(This is > >> apparently a very controversial topic) > >> > >> To put my question up-front: Would GCC ever be open to altering its > >> #pragma once behavior to bring it more in-line with behavior from other > >> compilers and possibly more in-line with what users expect? > >> > >> To elaborate more: > >> > >> Design decisions for #pragma once essentially boil down to a file-based > >> definitions vs a content-based definition of "same file". > >> > >> A file-based definition is easier to reason about and more in-line with > >> what users expect, however, distinct copies of headers can't be handled > >> and multiple mount points are problematic. > >> > >> A content-based definition works for distinct copies, multiple mount > >> points, and is completely sufficient 99% of the time, however, it could > >> potentially break in hard-to-debug ways in a few notable cases (more > >> information later). > >> > >> Currently the three major C/C++ compilers treat #pragma once very > >> differently: > >> - GCC uses file mtime + file contents > >> - Clang uses inodes > >> - MSVC uses file path > > > > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52566#c2 . > > Note this was changed specifically in GCC 3.4 to fix the issue around > > symlinks and hard links. > > See https://gcc.gnu.org/pipermail/gcc-patches/2003-July/111203.html > > for more information on the fixes. > > > > In fact `#pragma once` was deprecated before GCC 3.4 because it would > > do incorrectly what clang and MSVC are doing and that was considered > > wrong. > > So GCC behavior has been this way before clang was even written. > > > > Thanks, > > Andrew > > > >> > >> None of the major compilers have documented their #pragma once semantics. > >> > >> In practice all three of these approaches work pretty well most of the > >> time (which is why people feel comfortable using #pragma once). However, > >> they can each break in their own ways. 
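A minimal sketch of the include-guard idiom referred to in the message above, for comparison with #pragma once. The header name foo.hpp, the macro FOO_HPP_INCLUDED and the function foo_answer are made up for illustration, and the header body is written out twice in one file to stand in for two #include directives. What it is meant to show is that a guard identifies a header by its macro name rather than by any notion of "same file", so the second copy is skipped no matter which path, inode or mtime it came from.

```cpp
// First textual copy of a hypothetical foo.hpp.
#ifndef FOO_HPP_INCLUDED
#define FOO_HPP_INCLUDED

inline int foo_answer() { return 42; }

#endif // FOO_HPP_INCLUDED

// Second textual copy, as a second #include of the header would produce.
// FOO_HPP_INCLUDED is already defined, so the preprocessor drops this body
// and the conflicting definition below is never compiled.
#ifndef FOO_HPP_INCLUDED
#define FOO_HPP_INCLUDED

inline int foo_answer() { return 0; }

#endif // FOO_HPP_INCLUDED
```

Calling foo_answer() after both copies yields the value from the first one. The flip side is that two unrelated headers that happen to pick the same guard macro would also suppress each other.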
Re: #pragma once behavior
> Could a "uses the relative search path" fact be used to mix into the
> file's identity? This way the `once` key would see "this content looked
> for things in directory `library_a`" and would see that
> `library_b/library_main.hpp`, despite the same content (and mtime) is
> actually a different context and actually perform the inclusions?

I think I see what you're getting at. I am having a hard time imagining
an implementation that doesn't lead to a lot of complexity, and I think
handling hard links would also be out of the question.

Jeremy

On Sep 6 2024, at 8:26 am, Ben Boeckel wrote:
> On Fri, Sep 06, 2024 at 00:03:23 -0500, Jeremy Rifkin wrote:
>> Hello,
>>
>> I'm looking at #pragma once behavior among the major C/C++ compilers as
>> part of a proposal paper for standardizing #pragma once. (This is
>> apparently a very controversial topic)
>>
>> To put my question up-front: Would GCC ever be open to altering its
>> #pragma once behavior to bring it more in-line with behavior from other
>> compilers and possibly more in-line with what users expect?
>>
>> To elaborate more:
>>
>> Design decisions for #pragma once essentially boil down to a file-based
>> definition vs a content-based definition of "same file".
>>
>> A file-based definition is easier to reason about and more in-line with
>> what users expect, however, distinct copies of headers can't be handled
>> and multiple mount points are problematic.
>>
>> A content-based definition works for distinct copies, multiple mount
>> points, and is completely sufficient 99% of the time, however, it could
>> potentially break in hard-to-debug ways in a few notable cases (more
>> information later).
>>
>> Currently the three major C/C++ compilers treat #pragma once very
>> differently:
>> - GCC uses file mtime + file contents
>> - Clang uses inodes
>> - MSVC uses file path
>>
>> None of the major compilers have documented their #pragma once semantics.
>>
>> In practice all three of these approaches work pretty well most of the
>> time (which is why people feel comfortable using #pragma once). However,
>> they can each break in their own ways.
>>
>> As mentioned earlier, clang and MSVC's file-based definitions of "same
>> file" break for multiple mount points and multiple copies of the same
>> header. MSVC's approach breaks for symbolic links and hard links.
>>
>> GCC's hybrid approach can break in surprising ways. I have three
>> examples to share:
>>
>> Example 1:
>>
>> Consider a scenario such as:
>>
>> usr/
>>   include/
>>     library_a/
>>       library_main.hpp
>>       foo.hpp
>>     library_b/
>>       library_main.hpp
>>       foo.hpp
>> src/
>>   main.cpp
>>
>> main.cpp:
>> #include "library_a/library_main.hpp"
>> #include "library_b/library_main.hpp"
>>
>> And both library_main.hpp's have:
>> #pragma once
>> #include "foo.hpp"
>
> Could a "uses the relative search path" fact be used to mix into the
> file's identity? This way the `once` key would see "this content looked
> for things in directory `library_a`" and would see that
> `library_b/library_main.hpp`, despite the same content (and mtime) is
> actually a different context and actually perform the inclusions?
>
> Of course, this fails if `#include "../common/foo.hpp"` is used in each
> location as that *would* then want to elide the second inclusion. I
> don't know how this problem is avoided without actually reading the
> contents again. But the "I read this file" can remember what relative
> paths were searched (since the contents are the same at least).
>
>> Example 2:
>>
>> namespace v1 {
>> #include "library_v1.hpp"
>> }
>> namespace v2 {
>> #include "library_v2.hpp"
>> }
>>
>> Where both library headers include their own copy of a shared header
>> using #pragma once.
>
> Again, the context of the inclusion matters, so "is wrapped in a scope"
> can modify the "onceness" (`extern "C"` is probably the more common
> instance).
>
>> Example 3:
>>
>> usr/
>>   include/
>>     library/
>>       library.hpp
>>       vendored-dependency.hpp
>> src/
>>   main.cpp
>>   vendored-dependency.hpp
>>
>> main.cpp:
>> #include "vendored-dependency.hpp"
>> #include
>>
>> library.hpp:
>> #pragma once
>> #include "vendored-dependency.hpp"
>
> This is basically the same as Example 1 as far as context goes.
>
> Note that context cannot include `#define` state because `#once` is
> defined to be the first thing in the file and a file that is intended to
> be included multiple times (e.g., Boost.PP shenanigans) in different
> states cannot, in good faith, use `#once`.
>
> Hrm…though if we are doing `otherdir/samecontent`, the different
> preprocessor state *might* change that "what relative files did we look
> for?" state… Nothing is easy :( .
>
> --Ben
Re: #pragma once behavior
Hi Andrew,

Thanks for the thoughts and quick reply.

> Not always. because inodes are not always stable on some file systems.
> And also does not work with multi-mounted devices too.

Unusual filesystems and multiple mounts are indeed the failing. As I
mentioned, there's no silver bullet; they each have pitfalls. I do,
however, think this is a less surprising failure mode than GCC's, which
rears its head in surprising and inconsistent cases.

> I say if the file has the same content, then it is the same file and
> GCC uses that definition.

GCC doesn't use this definition, really. It's relying primarily on the
mtime check and only falling back to contents in case of collision.

The point that same contents == same file is well received. When
I wrote the first draft of my paper I proposed exactly this; however,
I have become convinced this isn't the right approach, based on examples
where you could intend to include two files with the same contents that
actually mean different things (such as Example 1).

GCC's approach is hybrid, half relying on something from the filesystem
and half relying on the contents. As far as I can tell this can lead to
the worst of both worlds.

> GCC definition is the only one which supports all issues described
> here dealing with inodes (sometimes being non-stable), canonical paths
> and both kinds of links and even re-mounted file systems.

I'd initially been thinking of a content-based solution in order to
avoid any filesystem reliance and support multiple mounts etc. The
problem currently is that even GCC's approach, which has the best chance
of working on multiple mounts, doesn't work consistently due to potential
differences in mtime resolution.

> What does the other implementations say about changing their
> definition of what "the same file is"? Have you asked clang and MSVC
> folks?

I've not yet asked. If I proceed with a proposal paper, what I'll most
likely be proposing is what Clang does, worded in terms of same device,
same location. I started here since GCC's approach is less similar to
that than what MSVC does. It's also easier to reach out to developers on
open source projects.

Thanks,
Jeremy

On Sep 6 2024, at 8:16 pm, Andrew Pinski wrote:
> On Fri, Sep 6, 2024 at 5:49 PM Jeremy Rifkin wrote:
>>
>> Thanks Andrew, I appreciate the context and links. It looks like the
>> prior implementation failed to handle links due to being based on file
>> path, given cpp_simplify_pathname. Do you have thoughts on the use of
>> device ID + inode as a way to also accommodate symbolic links and hard
>> links without the fickleness of mtime?
>
> Not always. because inodes are not always stable on some file systems.
> And also does not work with multi-mounted devices too.
> The whole definition of what is the same file is really up for debate here.
> I say if the file has the same content, then it is the same file and
> GCC uses that definition. While clang says it is based on if it is the
> same inode which is not always true because of file systems which
> don't use an inode number. While MSVC says it is based on the path but
> what is the canonical path to a file, is a hard link to the same file
> the same file or not; what about symbolic links? How about overlays
> and mounted directories are they the same then?
> GCC definition is the only one which supports all issues described
> here dealing with inodes (sometimes being non-stable), canonical paths
> and both kinds of links and even re-mounted file systems.
>
> What does the other implementations say about changing their
> definition of what "the same file is"? Have you asked clang and MSVC
> folks?
> Anyways GCC has an optimization already for #ifdef/#define/#endif (and
> that is documented here:
> https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.html) so does
> it make sense to really standardize `#pragma once` here or just push
> other implementations to add a similar optimization instead?
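A rough sketch of the "same device, same location" rule discussed in the message above, assuming a POSIX system where stat() fills in st_dev and st_ino. The helper name same_device_and_inode is hypothetical and nothing here is taken from Clang's source; it only illustrates the comparison being proposed.

```cpp
#include <string>
#include <sys/stat.h>

// Hypothetical helper: two paths name "the same file" when stat() reports
// the same device ID and the same inode number for both.
// stat() follows symbolic links, so a symlink compares equal to its target,
// and hard links share an inode, so they compare equal as well.
bool same_device_and_inode(const std::string& a, const std::string& b) {
    struct stat sa{};
    struct stat sb{};
    if (stat(a.c_str(), &sa) != 0 || stat(b.c_str(), &sb) != 0) {
        return false; // a path that cannot be examined matches nothing
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For instance, same_device_and_inode(".", "./.") is expected to hold because both spellings resolve to the same directory. As noted in the thread, the comparison can go wrong where the same data is reachable through more than one mount, or on filesystems that do not provide stable inode numbers.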
Re: #pragma once behavior
On Fri, Sep 6, 2024, 7:42 PM Jeremy Rifkin wrote:

> Hi Andrew,
> Thanks for the thoughts and quick reply.
>
> > Not always. because inodes are not always stable on some file systems.
> > And also does not work with multi-mounted devices too.
>
> Unusual filesystems and multiple mounts are indeed the failing. As I
> mentioned, there's no silver bullet; they each have pitfalls. I do,
> however, think this is a less surprising failure mode than GCC's which
> rears its head in surprising and inconsistent cases.
>
> > I say if the file has the same content, then it is the same file and
> > GCC uses that definition.
>
> GCC doesn't use this definition, really. It's relying primarily on the
> mtime check and only falling back to contents in case of collision.
>
> The point on same contents == same file is well received. When
> I wrote the first draft of my paper I wrote it proposing this, however,
> I have become convinced this isn't the right approach based on examples
> where you could intend to include two files with the same contents that
> actually mean different things (such as Example 1).
>
> GCC's approach is hybrid, half relying on something from the filesystem
> and half relying on the contents. As far as I can tell this can lead to
> a worst of both worlds.
>
> > GCC definition is the only one which supports all issues described
> > here dealing with inodes (sometimes being non-stable), canonical paths
> > and both kinds of links and even re-mounted file systems.
>
> I'd initially been thinking of a content-based solution in order to
> avoid any filesystem reliance and support multiple mounts etc. The
> problem currently is even GCC's approach, which has the best chance of
> working on multiple mounts, doesn't work consistently due to potential
> differences in mtime resolution.
>
> > What does the other implementations say about changing their
> > definition of what "the same file is"? Have you asked clang and MSVC
> > folks?
>
> I've not yet asked. If I proceed with a proposal paper what I'll most
> likely be proposing is what Clang does, worded in terms of same device
> same location. I started here since GCC's approach is least similar to
> that than what MSVC does. It's also easier to reach out to developers on
> open source projects.
>

Except the clang solution does not work for some file systems and is
broken when used on them. Maybe those file systems are not in use as
they once were, and that is why clang didn't run into folks asking to
fix it.

The early 2000s and now have different landscapes when it comes to file
systems. This is why I asked: what is the same file if you can't rely
on inodes working?

Thanks,
Andrew

>
> Thanks,
> Jeremy
>
>
> On Sep 6 2024, at 8:16 pm, Andrew Pinski wrote:
>
> > On Fri, Sep 6, 2024 at 5:49 PM Jeremy Rifkin wrote:
> >>
> >> Thanks Andrew, I appreciate the context and links. It looks like the
> >> prior implementation failed to handle links due to being based on file
> >> path, given cpp_simplify_pathname. Do you have thoughts on the use of
> >> device ID + inode as a way to also accommodate symbolic links and hard
> >> links without the fickleness of mtime?
> >
> > Not always. because inodes are not always stable on some file systems.
> > And also does not work with multi-mounted devices too.
> > The whole definition of what is the same file is really up for debate
> > here.
> > I say if the file has the same content, then it is the same file and
> > GCC uses that definition. While clang says it is based on if it is the
> > same inode which is not always true because of file systems which
> > don't use an inode number. While MSVC says it is based on the path but
> > what is the canonical path to a file, is a hard link to the same file
> > the same file or not; what about symbolic links? How about overlays
> > and mounted directories are they the same then?
Re: #pragma once behavior
> This is why I said what is the same file if you can't rely on inodes
> working?

I don't have a good answer for such a case.

Of course, no matter how one approaches #pragma once there will be cases
that aren't handled. The criterion to optimize for, imo, is which
approach has the clearest failure mode. Contents happening to match
could occur naturally without anyone realizing, which is hard to triage.
Mtimes colliding could easily happen without anyone realizing, which is
also hard to triage and reproduce. Path issues pop up as real build
systems use links. Mtime can fail on multiple mounts; path certainly
will.

In my opinion, the failure modes for contents and mtime are far from
ideal. Path isn't adequate; it seems clear that supporting links is an
important goal.

To level-set: I don't think it's reasonable to expect #pragma once to
handle multiple distinct copies of the same file, especially given that
contents isn't an option.

The failure mode of inodes, however, is a lot clearer. It breaks with
things like multiple mounts and filesystems that don't have inodes. The
way I see it, advice to users becomes clear since it's much clearer
exactly how and why #pragma once might break.

> Early 2000s vs now have a different landscape when it comes to file systems.

Given the landscape today, could it make sense to re-evaluate mtime +
content?

Cheers
Jeremy

On Sep 6 2024, at 10:29 pm, Andrew Pinski wrote:
>> On Fri, Sep 6, 2024, 7:42 PM Jeremy Rifkin wrote:
>>
>>> Hi Andrew,
>>> Thanks for the thoughts and quick reply.
>>>
>>>> Not always. because inodes are not always stable on some file systems.
>>>> And also does not work with multi-mounted devices too.
>>>
>>> Unusual filesystems and multiple mounts are indeed the failing. As I
>>> mentioned, there's no silver bullet; they each have pitfalls. I do,
>>> however, think this is a less surprising failure mode than GCC's which
>>> rears its head in surprising and inconsistent cases.
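A small sketch of the hybrid "mtime first, contents on a match" rule as it is described in this thread, to make the failure modes above concrete. It is not GCC's implementation: the SeenHeader record and the is_same_file and should_skip functions are hypothetical, and file metadata is passed in as plain values so the logic can be followed without touching a filesystem.

```cpp
#include <ctime>
#include <string>
#include <vector>

// Hypothetical record of a header already seen under #pragma once.
struct SeenHeader {
    std::time_t mtime;    // modification time reported by the filesystem
    std::string contents; // full text of the header
};

// Hybrid rule as described in the thread: compare mtime first and only
// look at the contents when the mtimes collide.
bool is_same_file(const SeenHeader& seen, const SeenHeader& candidate) {
    if (seen.mtime != candidate.mtime) {
        return false; // different mtime: treated as a different file
    }
    return seen.contents == candidate.contents; // mtime match: contents decide
}

// True when the candidate would be skipped because an equivalent header
// is already on the once-list.
bool should_skip(const std::vector<SeenHeader>& once_list,
                 const SeenHeader& candidate) {
    for (const SeenHeader& seen : once_list) {
        if (is_same_file(seen, candidate)) {
            return true;
        }
    }
    return false;
}
```

With Example 1 in mind: if library_a/library_main.hpp is on the once-list and library_b/library_main.hpp has identical text, should_skip returns true when the two mtimes happen to match and false when they differ by even one tick. Under this sketch, whether the second header is processed depends on timestamps rather than on anything visible in the source, which is the inconsistency being argued about.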