[PATCH RESEND 1/1] p1689r5: initial support
This patch implements support for [P1689R5][] to communicate to a build system the C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdep-output=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well, however this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously represetable in UTF-8" case). - figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. Signed-off-by: Ben Boeckel --- gcc/ChangeLog | 9 ++ gcc/c-family/ChangeLog | 6 + gcc/c-family/c-opts.cc | 40 ++- gcc/c-family/c.opt | 12 ++ gcc/cp/ChangeLog| 5 + gcc/cp/module.cc| 3 +- gcc/doc/invoke.texi | 15 +++ gcc/fortran/ChangeLog | 5 + gcc/fortran/cpp.cc | 4 +- gcc/genmatch.cc | 2 +- gcc/input.cc| 4 +- libcpp/ChangeLog| 11 ++ libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 ++- libcpp/init.cc | 14 ++- libcpp/mkdeps.cc| 235 ++-- 16 files changed, 368 insertions(+), 26 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6dded16c0e3..2d61de6adde 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2022-09-20 Ben Boeckel + + * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and + -fdep-output= flags. + * genmatch.cc (main): Add new preprocessor parameter used for C++ + module tracking. + * input.cc (test_lexer): Add new preprocessor parameter used for C++ + module tracking. + 2022-09-19 Torbjörn SVENSSON * targhooks.cc (default_zero_call_used_regs): Improve sorry diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog index ba3d76dd6cb..569dcd96e8c 100644 --- a/gcc/c-family/ChangeLog +++ b/gcc/c-family/ChangeLog @@ -1,3 +1,9 @@ +2022-09-20 Ben Boeckel + + * c-opts.cc (c_common_handle_option): Add fdeps_file variable and + -fdeps-format=, -fdep-file=, and -fdep-output= parsing. + * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags. + 2022-09-15 Richard Biener * c-common.h (build_void_list_node): Remove. diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index babaa2fc157..617d0e93696 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -77,6 +77,9 @@ static bool verbose; /* Dependency output file. */ static const char *deps_file; +/* Enhanced dependency output file. */ +static const char *fdeps_file; + /* The prefix given by -iprefix, if any. */ static const char *iprefix; @@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value, deps_file = arg; break; +case OPT_fdep_format_: + if (!strcmp (arg, "p1689r5")) + cpp_opts->deps.format = DEPS_FMT_P1689R5; + else + error ("%<-fdep-format=%> unknown format %s", arg); + break; + +case OPT_fdep_file_: + deps_seen = true; + fdeps_file = arg; + break; + +case OPT_fdep_output_: + deps_seen = true; + defer_opt (code, arg); + b
[PATCH RESEND 0/1] RFC: P1689R5 support
This patch adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that the TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. Testing is currently happening in CMake's CI using a prior revision of this patch (the differences are basically the changelog, some style, and `trtbd` instead of `p1689r5` as the format name). For testing within GCC, I'll work on the following: - scanning non-module source - scanning module-importing source (`import X;`) - scanning module-exporting source (`export module X;`) - scanning module implementation unit (`module X;`) - flag combinations? Are there existing tools for handling JSON output for testing purposes? Basically, something that I can add to the test suite that doesn't care about whitespace, but checks the structure (with sensible replacements for absolute paths where relevant)? For the record, Clang has patches with similar flags and behavior by Chuanqi Xu here: https://reviews.llvm.org/D134269 with the same flags (though using my old `trtbd` spelling for the format name). Thanks, --Ben Ben Boeckel (1): p1689r5: initial support gcc/ChangeLog | 9 ++ gcc/c-family/ChangeLog | 6 + gcc/c-family/c-opts.cc | 40 ++- gcc/c-family/c.opt | 12 ++ gcc/cp/ChangeLog| 5 + gcc/cp/module.cc| 3 +- gcc/doc/invoke.texi | 15 +++ gcc/fortran/ChangeLog | 5 + gcc/fortran/cpp.cc | 4 +- gcc/genmatch.cc | 2 +- gcc/input.cc| 4 +- libcpp/ChangeLog| 11 ++ libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 ++- libcpp/init.cc | 14 ++- libcpp/mkdeps.cc| 235 ++-- 16 files changed, 368 insertions(+), 26 deletions(-) base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f -- 2.37.3
[PATCH 1/1] libcpp: allow UCS_LIMIT codepoints in UTF-8 strings
libcpp/ * charset.cc: Allow `UCS_LIMIT` in UTF-8 strings. Reported-by: Damien Guibouret Fixes: c1dbaa6656a (libcpp: reject codepoints above 0x10, 2023-06-06) Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index d4f573e365f..54ebab2b8a4 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1891,7 +1891,7 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes) invalid because they cannot be represented in UTF-16. Reject such values.*/ - if (cp >= UCS_LIMIT) + if (cp > UCS_LIMIT) return false; } /* No problems encountered. */ -- 2.40.1
[modules] Preprocessing requires compiled header unit modules
Hi, I'm looking to update my patch which implements P1689[1] for GCC. I originally wrote this back in February 2019 at Kona for a proof of concept[2], but never really got the time to delve back in until now. I'd like to get header units implemented prior to submitting it as it will be an important part in the long run (as I don't think having a partial implementation in the wild would be beneficial). Now that modules have been merged, my patch works just fine for named modules. However, for header unit modules, it runs into a problem that imported header units are required to be compiled and available in the mapper while scanning for dependencies. Example code: ```c++ # use-header.cpp module; import "header-unit.hpp"; int main(int argc, char* argv[]) { return good; } ``` ```c++ # header-unit.hpp constexpr int good = 0; ``` When trying to preprocess this: ```console % g++ -std=c++20 -fmodules-ts -E use-header.cpp -o use-header.cpp.i In module imported at use-header.cpp:3:1: ./header-unit.hpp: error: failed to read compiled module: No such file or directory ./header-unit.hpp: note: compiled module file is ‘gcm.cache/,/header-unit.hpp.gcm’ ./header-unit.hpp: note: imports must be built before being imported ./header-unit.hpp: fatal error: returning to the gate for a mechanical issue compilation terminated ``` There used to be no need to do this back prior to the modules landing in `master`, but I can see this being an oversight in the meantime. I am not familiar enough with the code to know how to untangle its current state (attempting to bisect the change throws me back in to the depths of the history without modules so "does it work" is unanswerable there). I tried this on commits: 4efde6781bba8d64b9dcff07e7efe71d35aa6f6a c++: Modules Is Landing f4ed267fa5b82d6dafbc8afc82baf45bfcae549c master as of the 23rd Thanks, --Ben [1] https://wg21.link/p1689r4 [2] https://github.com/mathstuf/cxx-modules-sandbox/ (see the `docker` branch for the instructions on putting things together)
Re: [modules] Preprocessing requires compiled header unit modules
On Thu, Apr 21, 2022 at 06:05:52 +0200, Boris Kolpackov wrote: > I don't think it is. A header unit (unlike a named module) may export > macros which could affect further dependencies. Consider: > > import "header-unit.hpp"; // May or may not export macro FOO. > > #ifdef FOO > import "header-unit2.hpp" > #endif I agree that the header needs to be *found*, but scanning cannot require a pre-existing BMI for that header. A new mode likely needs to be laid down to get the information necessary (instead of just piggy-backing on `-E` behavior to get what I want). --Ben
Re: [modules] Preprocessing requires compiled header unit modules
On Thu, Apr 21, 2022 at 18:59:56 +0100, Iain Sandoe wrote: > Hi Ben, > > > On 21 Apr 2022, at 13:05, Ben Boeckel via Gcc wrote: > > > > On Thu, Apr 21, 2022 at 06:05:52 +0200, Boris Kolpackov wrote: > >> I don't think it is. A header unit (unlike a named module) may export > >> macros which could affect further dependencies. Consider: > >> > >> import "header-unit.hpp"; // May or may not export macro FOO. > > 1. If you know how this was built, then you could do an -E -fdirectives-only > build (both >GCC and clang support this now) to obtain the macros. My understanding is that how it gets used determines how it should be made for Clang (because the consumer's `-D`, `-W`, etc. flags matter). I do not yet know how I am to support this in CMake. > 2. I suppose we could invent a tool (or FE mode) to dump the macros exported > by a HU *** Fun considerations: - are `-D` flags exported? `-U`? - how about this if `value` is the same as or different from the at-start expansion: ```c++ #undef SOME_MACRO #define SOME_MACRO value ``` - how about `#undef FOO`? > >> #ifdef FOO > >> import "header-unit2.hpp" > >> #endif > > > > I agree that the header needs to be *found*, but scanning cannot require > > a pre-existing BMI for that header. A new mode likely needs to be laid > > down to get the information necessary (instead of just piggy-backing on > > `-E` behavior to get what I want). > > perhaps that means (2)? Can't it just read the header as if it wasn't imported? AFAIU, that's what GCC did in Jan 2019. I understand that CPP state is probably not easy, but something to consider. > *** it’s kinda frustrating that this is hard infomation to get as a > developer, so > perhaps we can anticipate users wanting such output. I think cacheing and distributed build tools are the most likely consumers of such information. --Ben
Re: [modules] Preprocessing requires compiled header unit modules
On Fri, Apr 22, 2022 at 16:08:00 +0200, Boris Kolpackov wrote: > Ben Boeckel writes: > > I agree that the header needs to be *found*, but scanning cannot require > > a pre-existing BMI for that header. > > Well, if scanning cannot require a pre-existing BMI but a pre-existing > BMI is required to get accurate dependency information, then something > has to give. If we need to know and have dependencies prepared before we can figure out the dependencies for a TU, modules are unsolvable (without an active build executor). If C++ implementations are really going to require that, then not being able to perform `cl.exe *.cpp` anymore should be the least of people's worries on the recent megathread because the following tools are all unsuitable for C++ with header units without major overhauls (alphabetical): - autoconf/automake - cmake - gn - gyp - make (not GNU make, though even that requires some active involvement via the socket communications) - meson - ninja I'm sure there are many more in use out there. > In this example, if you treat import of header-unit.hpp as > include, you will get incorrect dependency information. > > So to make this work correctly we will need to re-create the > macro isolation semantics of import for include. I'm aware. This is what I alluded to with "CPP state is probably not easy". > Even if we manage to do this, there are some implications I > am not sure we will like: the isolated macros will contain > inclusion guards, which means we will keep re-scanning the > same files potentially many many time. Here is an example, > assume each header-unitN.hpp includes or imports : Note that scanning each module TU only happens once. Header units might just get *read* in the course of scanning other units. And headers are read multiple times already over the lifetime of the build, so we're not making things worse here. I wonder if we had had a modularized stdlib which landed with modules if header units would have been a thing in the first place… --Ben
Re: [modules] Preprocessing requires compiled header unit modules
On Mon, Apr 25, 2022 at 11:42:14 +0200, Boris Kolpackov wrote: > 1. Firstly, this only applies to header units, not named modules. Correct. > 2. I am not sure what you mean by "active build executor" (but it >does sound ominous, I will grant you that ;-)). One that does more than schedule builds, but is "actively" participating in the build itself. Ninja's `dyndep` is about as close one can be to the edge IMO. `make` is potentially active because of its behavior around `include` of output files, but this is a niche feature. > 3. I agree some build systems may require "major overhauls" to >support header units via the module mapper. I would like this >not to be the case, but so far nobody has implemented an >alternative (that I am aware of) that is correct and scalable >and I personally have doubts such a thing is achievable. One the compilers can scan headers, CMake should be able to do it. MSVC is the closest right now (though the name it wants for any given header usage site is currently an issue). > Ben Boeckel writes: > > Note that scanning each module TU only happens once. Header units might > > just get *read* in the course of scanning other units. > > > > And headers are read multiple times already over the lifetime of the > > build, so we're not making things worse here. > > I am not sure I follow. Say we have 10 TUs each include or import > 10 headers each of which includes . If we use include, > then when scanning each of these 10 TUs we have to scan > once (since all the subsequent includes are suppressed by include > guards). So total of 10x1=10 scans of for the entire > build. > > Now if instead of include we use import (which, during the scan, is > treated as include with macro isolation), we are looking at 10 scans > of for each TU (because the include guards are ignored). > So total of 10x10=100 scans of for the build. > > What am I missing? Ew, indeed. I suppose a sufficiently smart compiler could just encode headers as a "diff" to the preprocessor and symbol state and apply it upon revisiting, but that seems like a high bar (re-reading if it detects a difference in preprocessor state that "matters"). --Ben
Re: [RFC] Support for nonzero attribute
On Sun, Jun 05, 2022 at 20:09:04 +, Miika via Gcc wrote: > Based on Jakub's and Yair's comments I created a new attribute "inrange". > Inrage takes three arguments, pos min and max. > Pos being the argument position in the function, and min and max defines the > range of valid integer. Both min and max are inclusive and work with enums. > Warnings are enabled with the new flag: "-Winrange". Is this something that could be applied to variables or types (I've not much experience with GCC attribute internal mechanisms, so maybe not)? For example: ``` // This variable must be in range. int percentage __attribute__((inrange(0, 100))); // Any variable declared through this typedef must be in range. typedef int __attribute__((inrange(0, 100))) percentage_t; ``` --Ben
Re: Proposal: allow to extend C++ template argument deduction via plugins
On Thu, Jul 14, 2022 at 18:46:47 +0200, Dan Klishch via Gcc wrote: > As far as I understand the currently available plugin extension points, it is > not possible to modify template argument deduction algorithm (except the > theoretical possibility to completely override parsing step). However, such > opportunity might be beneficial for projects like libpqxx, for example, when > database schema and query text are available at compile-time, return types of > the query might be inferred by the plugin. > > I propose to add something like PLUGIN_FUNCTION_CALL plugin_event which will > allow to modify function calls conditionally. Will a patch adding such > functionality be welcomed? Note that I'm not a GCC developer, so my opinion isn't worth much on the acceptability-to-GCC front. Wouldn't this make it a C++ dialect? How would non-GCC compilers be able to be compatible with such an API? That is, if the schema and query being available changes the API (nevermind the schema itself changing), I don't see how this works in general. --Ben
Re: Importable STL Header Units?
On Mon, Jul 25, 2022 at 19:48:28 +1000, Zopolis0 via Gcc wrote: > Currently, when importing the standard library, one has to > separately compile each unit they want to use, which is a hindrance to the > build process and a gap in the implementation. > > Is there any particular reason why gcc doesn't provide importable header > units for the standard library? Is there a timeline for this? Given that > the standard library can already be compiled into header units, is there > any particular reason why those aren't shipped with gcc? Some questions that come to mind if GCC were to provide this: - What flags should be used for these pre-compiled units? - How do you expect build systems to be able to find them? Will there be some kind of naming convention? It'd certainly be nice if other implementations would cooperate… (FD: CMake developer working on module support myself) It is my belief that any module-supporting build system has to consider the possibility of compiling any given external module because there's no guarantee that the flag set is compatible (say, `-ffast-math` or other "fun" flags with ABI implications) and you'll need your own compilation of the module anyways. Having pre-compiled units is nice, but merely an optimization for any robust build system; you certainly can't just use any available pre-compiled module blindly (unless you're relying on the linker to detect mismatches or something). The C++ committee's SG15 (Tooling) is looking into the problem. These papers are likely of interest to you (still under discussion): https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2577r2.pdf https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2581r0.pdf More papers will certainly appear: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/ --Ben
Re: Wanted: original ConceptGCC downloads / branch, concepts-lite branch
On Wed, Aug 17, 2022 at 12:42:42 +0100, Aaron Gray via Gcc wrote: > I am looking for the original ConceptGCC source code, the > https://www.generic-programming.org/software/ConceptGCC/download.html has > all broken links and the SVN is gone. > > Is this available on GCC git or SVN ? There is this repo that may be of interest: https://github.com/asutton/gcc No idea of the state of it though or how useful it is for C++20 Concepts. --Ben
[PATCH 1/1] p1689r5: initial support
This patch implements support for [P1689R5][] to communicate to a build system the C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdep-output=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well, however this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously represetable in UTF-8" case). - figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. Signed-off-by: Ben Boeckel --- gcc/ChangeLog | 9 ++ gcc/c-family/ChangeLog | 6 + gcc/c-family/c-opts.cc | 40 ++- gcc/c-family/c.opt | 12 ++ gcc/cp/ChangeLog| 5 + gcc/cp/module.cc| 3 +- gcc/doc/invoke.texi | 15 +++ gcc/fortran/ChangeLog | 5 + gcc/fortran/cpp.cc | 4 +- gcc/genmatch.cc | 2 +- gcc/input.cc| 4 +- libcpp/ChangeLog| 11 ++ libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 ++- libcpp/init.cc | 14 ++- libcpp/mkdeps.cc| 235 ++-- 16 files changed, 368 insertions(+), 26 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6dded16c0e3..2d61de6adde 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2022-09-20 Ben Boeckel + + * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and + -fdep-output= flags. + * genmatch.cc (main): Add new preprocessor parameter used for C++ + module tracking. + * input.cc (test_lexer): Add new preprocessor parameter used for C++ + module tracking. + 2022-09-19 Torbjörn SVENSSON * targhooks.cc (default_zero_call_used_regs): Improve sorry diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog index ba3d76dd6cb..569dcd96e8c 100644 --- a/gcc/c-family/ChangeLog +++ b/gcc/c-family/ChangeLog @@ -1,3 +1,9 @@ +2022-09-20 Ben Boeckel + + * c-opts.cc (c_common_handle_option): Add fdeps_file variable and + -fdeps-format=, -fdep-file=, and -fdep-output= parsing. + * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags. + 2022-09-15 Richard Biener * c-common.h (build_void_list_node): Remove. diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index babaa2fc157..617d0e93696 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -77,6 +77,9 @@ static bool verbose; /* Dependency output file. */ static const char *deps_file; +/* Enhanced dependency output file. */ +static const char *fdeps_file; + /* The prefix given by -iprefix, if any. */ static const char *iprefix; @@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value, deps_file = arg; break; +case OPT_fdep_format_: + if (!strcmp (arg, "p1689r5")) + cpp_opts->deps.format = DEPS_FMT_P1689R5; + else + error ("%<-fdep-format=%> unknown format %s", arg); + break; + +case OPT_fdep_file_: + deps_seen = true; + fdeps_file = arg; + break; + +case OPT_fdep_output_: + deps_seen = true; + defer_opt (code, arg); + b
[PATCH 0/1] RFC: P1689R5 support
This patch adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that the TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. Testing is currently happening in CMake's CI using a prior revision of this patch (the differences are basically the changelog, some style, and `trtbd` instead of `p1689r5` as the format name). For testing within GCC, I'll work on the following: - scanning non-module source - scanning module-importing source (`import X;`) - scanning module-exporting source (`export module X;`) - scanning module implementation unit (`module X;`) - flag combinations? Are there existing tools for handling JSON output for testing purposes? Basically, something that I can add to the test suite that doesn't care about whitespace, but checks the structure (with sensible replacements for absolute paths where relevant)? For the record, Clang has patches with similar flags and behavior by Chuanqi Xu here: https://reviews.llvm.org/D134269 with the same flags (though using my old `trtbd` spelling for the format name). Thanks, --Ben Ben Boeckel (1): p1689r5: initial support gcc/ChangeLog | 9 ++ gcc/c-family/ChangeLog | 6 + gcc/c-family/c-opts.cc | 40 ++- gcc/c-family/c.opt | 12 ++ gcc/cp/ChangeLog| 5 + gcc/cp/module.cc| 3 +- gcc/doc/invoke.texi | 15 +++ gcc/fortran/ChangeLog | 5 + gcc/fortran/cpp.cc | 4 +- gcc/genmatch.cc | 2 +- gcc/input.cc| 4 +- libcpp/ChangeLog| 11 ++ libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 ++- libcpp/init.cc | 14 ++- libcpp/mkdeps.cc| 235 ++-- 16 files changed, 368 insertions(+), 26 deletions(-) base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f -- 2.37.3
Re: [PATCH RESEND 1/1] p1689r5: initial support
On Tue, Oct 04, 2022 at 21:12:03 +0200, Harald Anlauf wrote: > Am 04.10.22 um 17:12 schrieb Ben Boeckel: > > This patch implements support for [P1689R5][] to communicate to a build > > system the C++20 module dependencies to build systems so that they may > > build `.gcm` files in the proper order. > > Is there a reason that you are touching so many frontends? Just those that needed the update for `cpp_finish`. It does align with those that will (eventually) need this support anyways (AFAIK). > > diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc > > index 364bd0d2a85..0b9df9c02cd 100644 > > --- a/gcc/fortran/cpp.cc > > +++ b/gcc/fortran/cpp.cc > > @@ -712,7 +712,7 @@ gfc_cpp_done (void) > > FILE *f = fopen (gfc_cpp_option.deps_filename, "w"); > > if (f) > > { > > - cpp_finish (cpp_in, f); > > + cpp_finish (cpp_in, f, NULL); > > fclose (f); > > } > > else > > @@ -721,7 +721,7 @@ gfc_cpp_done (void) > > xstrerror (errno)); > > } > > else > > - cpp_finish (cpp_in, stdout); > > + cpp_finish (cpp_in, stdout, NULL); > > } > > > > cpp_undef_all (cpp_in); > > Couldn't you simply default the third argument of cpp_finish() to NULL? I could do that. Wasn't sure how much that would be acceptable given that it is a "silent" change, but it would reduce the number of files touched here. Thanks, --Ben
Re: [PATCH RESEND 1/1] p1689r5: initial support
On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote: > On 10/4/22 11:12, Ben Boeckel wrote: > > This patch implements support for [P1689R5][] to communicate to a build > > system the C++20 module dependencies to build systems so that they may > > build `.gcm` files in the proper order. > > Thanks! > > > Support is communicated through the following three new flags: > > > > - `-fdeps-format=` specifies the format for the output. Currently named > >`p1689r5`. > > > > - `-fdeps-file=` specifies the path to the file to write the format to. > > Do you expect users to want to emit Makefile (-MM) and P1689 > dependencies from the same compilation? Yes, the build system wants to know what files affect scanning as well (e.g., `#include ` is still possible and if it changes, this operation should be performed again. The `-M` flags do this quite nicely already :) . > > - `-fdep-output=` specifies the `.o` that will be written for the TU > >that is scanned. This is required so that the build system can > >correlate the dependency output with the actual compilation that will > >occur. > > The dependency machinery already needs to be able to figure out the name > of the output file, can't we reuse that instead of specifying it on the > command line? This is how it determines the output of the compilation. Because it is piggy-backing on the `-E` flag, the `-o` flag specifies the output of the preprocessed source (purely a side-effect right now). > > diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h > > index 2db1e9cbdfb..90787230a9e 100644 > > --- a/libcpp/include/cpplib.h > > +++ b/libcpp/include/cpplib.h > > @@ -298,6 +298,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t; > > /* Style of header dependencies to generate. */ > > enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM }; > > > > +/* Format of header dependencies to generate. */ > > +enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 }; > > Why not add this to cpp_deps_style? It is orthogonal. The `-M` flags and `-fdeps-*` flags are similar, but `-fdeps-*` dependencies are fundamentally different than `-M` dependencies. The comment does need updated though. `-M` is about discovered dependencies: those that you find out while doing work. `-fdep-*` is about ordering dependencies: extracting information from file content in order to even order future work around. It stems from the loss of embarassing parallelism when building C++20 code that uses `import`. It's isomorphic to the Fortran module compilation ordering problem. > > @@ -387,7 +410,7 @@ make_write_vec (const mkdeps::vec &vec, > > FILE *fp, > > .PHONY targets for all the dependencies too. */ > > > > static void > > -make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax) > > +make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax, int > > extra) > > Instead of adding the "extra" parameter... Hmm. Not sure why I had named this so poorly. Maybe this comes from my initial stab at this functionality in 2019 (the format has been hammered out in ISO C++'s SG15 since then). > > { > > const mkdeps *d = pfile->deps; > > > > @@ -398,7 +421,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned > > int colmax) > > if (d->deps.size ()) > > { > > column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm); > > - if (CPP_OPTION (pfile, deps.modules) && d->cmi_name) > > + if (extra && CPP_OPTION (pfile, deps.modules) && d->cmi_name) > > column = make_write_name (d->cmi_name, fp, column, colmax); > > fputs (":", fp); > > column++; > > @@ -412,7 +435,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned > > int colmax) > > if (!CPP_OPTION (pfile, deps.modules)) > > return; > > ...how about checking CPP_OPTION for p1689r5 mode here? I'll take a look at this. > > > > - if (d->modules.size ()) > > + if (extra && d->modules.size ()) > > { > > column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm); > > if (d->cmi_name) > > @@ -423,7 +446,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned > > int colmax) > > fputs ("\n", fp); > > } > > > > - if (d->module_name) > > + if (extra && d->module_name) > > { > > if (d->cmi_name) > > { > > @@ -455,7 +478,7 @@ make_write (c
Re: [PATCH RESEND 1/1] p1689r5: initial support
On Tue, Oct 11, 2022 at 07:42:43 -0400, Ben Boeckel wrote: > On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote: > > Can we share utf8 parsing code with decode_utf8_char in pretty-print.cc? > > I can look at factoring that out. I'll have to decode its logic to see > how much overlap there is. There is some mismatch. First, that is in `gcc` and this is in `libcpp`. Second, `pretty-print.cc`'s implementation: - fails on an empty string; - accepts extended-length (5+-byte) encodings which are invalid Unicode; and - decodes codepoint-by-codepoint instead of just validating the entire string. --Ben
Re: [PATCH RESEND 0/1] RFC: P1689R5 support
On Thu, Oct 13, 2022 at 13:08:46 -0400, David Malcolm wrote: > On Mon, 2022-10-10 at 16:21 -0400, Jason Merrill wrote: > > David Malcolm would probably know best about JSON wrangling. > > Unfortunately our JSON output doesn't make any guarantees about the > ordering of keys within an object, so the precise textual output > changes from run to run. I've coped with that in my test cases by > limiting myself to simple regexes of fragments of the JSON output. > > Martin Liska [CCed] went much further in > 4e275dccfc2467b3fe39012a3dd2a80bac257dd0 by adding a run-gcov-pytest > DejaGnu directive, allowing for test cases for gcov to be written in > Python, which can thus test much more interesting assertions about the > generated JSON. Ok, if Python is acceptable, I'll use its stdlib to do "fancy" things. Part of this is because I want to assert that unnecessary fields don't exist and that sounds…unlikely to be possible in any maintainable way (assuming it is possible) with regexen. `jq` could help immensely, but that is probably a bridge too far :) . Thanks, --Ben
Re: [PATCH RESEND 1/1] p1689r5: initial support
On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote: > Oops, I was thinking this was in gcc as well. In libcpp there's > _cpp_valid_utf8 (which calls one_utf8_to_cppchar). This routine has a lot more logic (including UCN decoding) and the `one_utf8_to_cppchar` also supports out-of-bounds codepoints above `0x10`. --Ben
[PATCH v2 0/1] RFC: P1689R5 support
Hi, This patch adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that the TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. For the record, Clang has patches with similar flags and behavior by Chuanqi Xu here: https://reviews.llvm.org/D134269 with the same flags. Thanks, --Ben --- v1 -> v2: - removal of the `deps_write(extra)` parameter to option-checking where ndeeded - default parameter of `cpp_finish(fdeps_stream = NULL)` - unification of libcpp UTF-8 validity functions from v1 - test cases for flag parsing states (depflags-*) and p1689 output (p1689-*) Ben Boeckel (3): libcpp: reject codepoints above 0x10 libcpp: add a function to determine UTF-8 validity of a C string p1689r5: initial support gcc/ChangeLog | 5 + gcc/c-family/ChangeLog| 6 + gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/ChangeLog | 5 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/ChangeLog | 7 + gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 1 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 3 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-o.C | 3 + gcc/testsuite/g++.dg/modules/modules.exp | 11 + gcc/testsuite/g++.dg/modules/p1689-1.C| 18 ++ gcc/testsuite/g++.dg/modules/p1689-1.exp.json | 27 +++ gcc/testsuite/g++.dg/modules/p1689-2.C| 16 ++ gcc/testsuite/g++.dg/modules/p1689-2.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-3.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-3.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-4.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-4.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++ gcc/testsuite/lib/modules.exp | 71 ++ libcpp/ChangeLog | 23 ++ libcpp/charset.cc | 22 +- libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 +- libcpp/init.cc| 13 +- libcpp/internal.h | 2 + libcpp/mkdeps.cc | 149 +++- 43 files changed, 823 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C create mode 100644
[PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
This simplifies the interface for other UTF-8 validity detections when a simple "yes" or "no" answer is sufficient. Signed-off-by: Ben Boeckel --- libcpp/ChangeLog | 6 ++ libcpp/charset.cc | 18 ++ libcpp/internal.h | 2 ++ 3 files changed, 26 insertions(+) diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog index 4d707277531..4e2c7900ae2 100644 --- a/libcpp/ChangeLog +++ b/libcpp/ChangeLog @@ -1,3 +1,9 @@ +2022-10-27 Ben Boeckel + + * include/charset.cc: Add `_cpp_valid_utf8_str` which determines + whether a C string is valid UTF-8 or not. + * include/internal.h: Add prototype for `_cpp_valid_utf8_str`. + 2022-10-27 Ben Boeckel * include/charset.cc: Reject encodings of codepoints above 0x10. diff --git a/libcpp/charset.cc b/libcpp/charset.cc index e9da6674b5f..0524ab6beba 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile, return true; } +extern bool +_cpp_valid_utf8_str (const char *name) +{ + const uchar* in = (const uchar*)name; + size_t len = strlen(name); + cppchar_t cp; + + while (*in) +{ + if (one_utf8_to_cppchar(&in, &len, &cp)) + { + return false; + } +} + + return true; +} + /* Subroutine of convert_hex and convert_oct. N is the representation in the execution character set of a numeric escape; write it into the string buffer TBUF and update the end-of-string pointer therein. WIDE diff --git a/libcpp/internal.h b/libcpp/internal.h index badfd1b40da..4f2dd4a2f5c 100644 --- a/libcpp/internal.h +++ b/libcpp/internal.h @@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile, struct normalize_state *nst, cppchar_t *cp); +extern bool _cpp_valid_utf8_str (const char *str); + extern void _cpp_destroy_iconv (cpp_reader *); extern unsigned char *_cpp_convert_input (cpp_reader *, const char *, unsigned char *, size_t, size_t, -- 2.37.3
[PATCH v2 3/3] p1689r5: initial support
This patch implements support for [P1689R5][] to communicate to a build system the C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdep-output=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well, however this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously represetable in UTF-8" case). - figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. Signed-off-by: Ben Boeckel --- gcc/ChangeLog | 5 + gcc/c-family/ChangeLog| 6 + gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/ChangeLog | 5 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/ChangeLog | 7 + gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 1 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 3 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-o.C | 3 + gcc/testsuite/g++.dg/modules/modules.exp | 11 + gcc/testsuite/g++.dg/modules/p1689-1.C| 18 ++ gcc/testsuite/g++.dg/modules/p1689-1.exp.json | 27 +++ gcc/testsuite/g++.dg/modules/p1689-2.C| 16 ++ gcc/testsuite/g++.dg/modules/p1689-2.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-3.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-3.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-4.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-4.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++ gcc/testsuite/lib/modules.exp | 71 ++ libcpp/ChangeLog | 11 + libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 +- libcpp/init.cc| 13 +- libcpp/mkdeps.cc | 149 +++- 41 files changed, 789 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C cr
[PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF
Unicode does not support such values because they are unrepresentable in UTF-16. Signed-off-by: Ben Boeckel --- libcpp/ChangeLog | 6 ++ libcpp/charset.cc | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog index 18d5bcceaf0..4d707277531 100644 --- a/libcpp/ChangeLog +++ b/libcpp/ChangeLog @@ -1,3 +1,9 @@ +2022-10-27 Ben Boeckel + + * include/charset.cc: Reject encodings of codepoints above 0x10. + UTF-16 does not support such codepoints and therefore all Unicode + rejects such values. + 2022-10-19 Lewis Hyatt * include/cpplib.h (struct cpp_string): Use new "string_length" GTY. diff --git a/libcpp/charset.cc b/libcpp/charset.cc index 12a398e7527..e9da6674b5f 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp, if (c <= 0x3FF && nbytes > 5) return EILSEQ; /* Make sure the character is valid. */ - if (c > 0x7FFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ; + if (c > 0x10 || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ; *cp = c; *inbufp = inbuf; @@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp, s += inbuf[bigend ? 2 : 1] << 8; s += inbuf[bigend ? 3 : 0]; - if (s >= 0x7FFF || (s >= 0xD800 && s <= 0xDFFF)) + if (s > 0x10 || (s >= 0xD800 && s <= 0xDFFF)) return EILSEQ; rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp); -- 2.37.3
Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
On Fri, Oct 28, 2022 at 08:59:16 -0400, David Malcolm wrote: > On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote: > > This simplifies the interface for other UTF-8 validity detections > > when a > > simple "yes" or "no" answer is sufficient. > > > > Signed-off-by: Ben Boeckel > > --- > > libcpp/ChangeLog | 6 ++ > > libcpp/charset.cc | 18 ++ > > libcpp/internal.h | 2 ++ > > 3 files changed, 26 insertions(+) > > > > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog > > index 4d707277531..4e2c7900ae2 100644 > > --- a/libcpp/ChangeLog > > +++ b/libcpp/ChangeLog > > @@ -1,3 +1,9 @@ > > +2022-10-27 Ben Boeckel > > + > > + * include/charset.cc: Add `_cpp_valid_utf8_str` which > > determines > > + whether a C string is valid UTF-8 or not. > > + * include/internal.h: Add prototype for > > `_cpp_valid_utf8_str`. > > + > > 2022-10-27 Ben Boeckel > > > > * include/charset.cc: Reject encodings of codepoints above > > 0x10. > > The patch looks good to me, with the same potential caveat that you > might need to move the ChangeLog entry from the patch "body" to the > leading blurb, to satisfy: > ./contrib/gcc-changelog/git_check_commit.py Ah, I had missed that. Now fixed locally for patches 1 and 2; will be in v3 pending some time for further reviews. THanks, --Ben
Re: [PATCH v2 3/3] p1689r5: initial support
On Thu, Oct 27, 2022 at 19:16:44 -0400, Ben Boeckel wrote: > diff --git a/gcc/testsuite/g++.dg/modules/modules.exp > b/gcc/testsuite/g++.dg/modules/modules.exp > index afb323d0efd..7fe8825144f 100644 > --- a/gcc/testsuite/g++.dg/modules/modules.exp > +++ b/gcc/testsuite/g++.dg/modules/modules.exp > @@ -28,6 +28,7 @@ > # { dg-module-do [link|run] [xfail] [options] } # link [and run] > > load_lib g++-dg.exp > +load_lib modules.exp > > # If a testcase doesn't have special options, use these. > global DEFAULT_CXXFLAGS > @@ -237,6 +238,13 @@ proc cleanup_module_files { files } { > } > } > > +# delete the specified set of dep files > +proc cleanup_dep_files { files } { > +foreach file $files { > + file_on_host delete $file > +} > +} > + > global testdir > set testdir $srcdir/$subdir > proc srcdir {} { > @@ -310,6 +318,7 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] { > set std_list [module-init $src] > foreach std $std_list { > set mod_files {} > + set dep_files {} > global module_do > set module_do {"compile" "P"} > set asm_list {} > @@ -346,6 +355,8 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] { > set mod_files [find $DEFAULT_REPO *.gcm] > } > cleanup_module_files $mod_files > + > + cleanup_dep_files $dep_files > } > } > } These `cleanup_dep_files` hunks are leftovers from my attempts at getting the P1689 and flags tests working; they'll be gone in v3. --Ben
Re: [PATCH v2 3/3] p1689r5: initial support
On Tue, Nov 01, 2022 at 08:57:37 -0600, Tom Tromey wrote: > >>>>> "Ben" == Ben Boeckel via Gcc-patches writes: > > Ben> - `-fdeps-file=` specifies the path to the file to write the format to. > > I don't know how this output is intended to be used, but one mistake > made with the other dependency-tracking options was that the output file > isn't created atomically. As a consequence, Makefiles normally have to > work around this to be robust. If that's a possible issue here then it > would be best to handle it in this patch. I don't think there'll be any race here because it's the "output" of the rule as far as the build graph is concerned. It's also JSON, so anything reading it "early" will get a partial object and easily detect "something went wrong". And for clarity, the `-o` flag used in CMake with this is just a side effect of the `-E` mechanism used and is completely ignored in the CMake usage of this. --Ben
[PATCH v3 0/3] RFC: P1689R5 support
Hi, This patch adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that the TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. For the record, Clang has patches with similar flags and behavior by Chuanqi Xu here: https://reviews.llvm.org/D134269 with the same flags. Thanks, --Ben --- v2 -> v3: - changelog entries moved to commit messages - documentation updated/added in the UTF-8 routine editing v1 -> v2: - removal of the `deps_write(extra)` parameter to option-checking where ndeeded - default parameter of `cpp_finish(fdeps_stream = NULL)` - unification of libcpp UTF-8 validity functions from v1 - test cases for flag parsing states (depflags-*) and p1689 output (p1689-*) Ben Boeckel (3): libcpp: reject codepoints above 0x10 libcpp: add a function to determine UTF-8 validity of a C string p1689r5: initial support gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 1 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 3 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-o.C | 3 + gcc/testsuite/g++.dg/modules/modules.exp | 1 + gcc/testsuite/g++.dg/modules/p1689-1.C| 18 ++ gcc/testsuite/g++.dg/modules/p1689-1.exp.json | 27 +++ gcc/testsuite/g++.dg/modules/p1689-2.C| 16 ++ gcc/testsuite/g++.dg/modules/p1689-2.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-3.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-3.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-4.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-4.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++ gcc/testsuite/lib/modules.exp | 71 ++ libcpp/charset.cc | 28 ++- libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 +- libcpp/init.cc| 13 +- libcpp/internal.h | 2 + libcpp/mkdeps.cc | 149 +++- 38 files changed, 773 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C create mode 100644 gcc/testsuite/g++.dg/modules
[PATCH v3 1/3] libcpp: reject codepoints above 0x10FFFF
Unicode does not support such values because they are unrepresentable in UTF-16. libcpp/ * charset.cc: Reject encodings of codepoints above 0x10. UTF-16 does not support such codepoints and therefore all Unicode rejects such values. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index 12a398e7527..324b5b19136 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -158,6 +158,10 @@ struct _cpp_strbuf encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or FC 80 80 80 9F 80. Only the first is valid. + Additionally, Unicode declares that all codepoints above 0010 are + invalid because they cannot be represented in UTF-16. As such, all 5- and + 6-byte encodings are invalid. + An implementation note: the transformation from UTF-16 to UTF-8, or vice versa, is easiest done by using UTF-32 as an intermediary. */ @@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp, if (c <= 0x3FF && nbytes > 5) return EILSEQ; /* Make sure the character is valid. */ - if (c > 0x7FFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ; + if (c > 0x10 || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ; *cp = c; *inbufp = inbuf; @@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp, s += inbuf[bigend ? 2 : 1] << 8; s += inbuf[bigend ? 3 : 0]; - if (s >= 0x7FFF || (s >= 0xD800 && s <= 0xDFFF)) + if (s > 0x10 || (s >= 0xD800 && s <= 0xDFFF)) return EILSEQ; rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp); -- 2.38.1
[PATCH v3 2/3] libcpp: add a function to determine UTF-8 validity of a C string
This simplifies the interface for other UTF-8 validity detections when a simple "yes" or "no" answer is sufficient. libcpp/ * charset.cc: Add `_cpp_valid_utf8_str` which determines whether a C string is valid UTF-8 or not. * internal.h: Add prototype for `_cpp_valid_utf8_str`. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 20 libcpp/internal.h | 2 ++ 2 files changed, 22 insertions(+) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index 324b5b19136..e130bc01f48 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile, return true; } +/* Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns +`false` if any contained byte sequence encodes an invalid Unicode codepoint +or is not a valid UTF-8 sequence. Returns `true` otherwise. */ + +extern bool +_cpp_valid_utf8_str (const char *name) +{ + const uchar* in = (const uchar*)name; + size_t len = strlen(name); + cppchar_t cp; + + while (*in) +{ + if (one_utf8_to_cppchar(&in, &len, &cp)) + return false; +} + + return true; +} + /* Subroutine of convert_hex and convert_oct. N is the representation in the execution character set of a numeric escape; write it into the string buffer TBUF and update the end-of-string pointer therein. WIDE diff --git a/libcpp/internal.h b/libcpp/internal.h index badfd1b40da..4f2dd4a2f5c 100644 --- a/libcpp/internal.h +++ b/libcpp/internal.h @@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile, struct normalize_state *nst, cppchar_t *cp); +extern bool _cpp_valid_utf8_str (const char *str); + extern void _cpp_destroy_iconv (cpp_reader *); extern unsigned char *_cpp_convert_input (cpp_reader *, const char *, unsigned char *, size_t, size_t, -- 2.38.1
[PATCH v3 3/3] p1689r5: initial support
This patch implements support for [P1689R5][] to communicate to a build system the C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdep-output=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well, however this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously represetable in UTF-8" case). - figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. libcpp/ * include/cpplib.h: Add cpp_deps_format enum. (cpp_options): Add format field (cpp_finish): Add dependency stream parameter. * include/mkdeps.h (deps_add_module_target): Add new preprocessor parameter used for C++ module tracking. * init.cc (cpp_finish): Add new preprocessor parameter used for C++ module tracking. * mkdeps.cc (mkdeps): Implement P1689R5 output. gcc/ * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and -fdep-output= flags. gcc/c-family/ * c-opts.cc (c_common_handle_option): Add fdeps_file variable and -fdeps-format=, -fdep-file=, and -fdep-output= parsing. * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags. gcc/cp/ * module.cc (preprocessed_module): Pass whether the module is exported to dependency tracking. gcc/testsuite/ * g++.dg/modules/depflags-f-MD.C: New test. * g++.dg/modules/depflags-f.C: New test. * g++.dg/modules/depflags-fi.C: New test. * g++.dg/modules/depflags-fj-MD.C: New test. * g++.dg/modules/depflags-fj.C: New test. * g++.dg/modules/depflags-fjo-MD.C: New test. * g++.dg/modules/depflags-fjo.C: New test. * g++.dg/modules/depflags-fo-MD.C: New test. * g++.dg/modules/depflags-fo.C: New test. * g++.dg/modules/depflags-j-MD.C: New test. * g++.dg/modules/depflags-j.C: New test. * g++.dg/modules/depflags-jo-MD.C: New test. * g++.dg/modules/depflags-jo.C: New test. * g++.dg/modules/depflags-o-MD.C: New test. * g++.dg/modules/depflags-o.C: New test. * g++.dg/modules/p1689-1.C: New test. * g++.dg/modules/p1689-1.exp.json: New test expectation. * g++.dg/modules/p1689-2.C: New test. * g++.dg/modules/p1689-2.exp.json: New test expectation. * g++.dg/modules/p1689-3.C: New test. * g++.dg/modules/p1689-3.exp.json: New test expectation. * g++.dg/modules/p1689-4.C: New test. * g++.dg/modules/p1689-4.exp.json: New test expectation. * g++.dg/modules/p1689-5.C: New test. * g++.dg/modules/p1689-5.exp.json: New test expectation. * g++.dg/modules/modules.exp: Load new P1689 library routines. * g++.dg/modules/test-p1689.py: New tool for validating P1689 output. * lib/modules.exp: Support for validating P1689 outputs. Signed-off-by: Ben Boeckel --- gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/g++
Re: How can Autoconf help with the transition to stricter compilation defaults?
On Tue, Nov 15, 2022 at 15:09:19 -0800, Paul Eggert wrote: > This may be a hack, but it's a *good* hack. It's likely to fix > real-world bugs that would be caused if Clang becomes overly pedantic by > default here. And the probability of introducing real-world bugs is > essentially zero. FWIW, CMake uses the same signature for detecting whether a function exists or not: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/CheckFunctionExists.c It's also been like this (without the `void`) for 20 years; the `void` argument was added 6 years ago: https://gitlab.kitware.com/cmake/cmake/-/commits/master/Modules/CheckFunctionExists.c --Ben
[PATCH v4 1/3] libcpp: reject codepoints above 0x10FFFF
Unicode does not support such values because they are unrepresentable in UTF-16. libcpp/ * charset.cc: Reject encodings of codepoints above 0x10. UTF-16 does not support such codepoints and therefore all Unicode rejects such values. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index 12a398e7527..324b5b19136 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -158,6 +158,10 @@ struct _cpp_strbuf encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or FC 80 80 80 9F 80. Only the first is valid. + Additionally, Unicode declares that all codepoints above 0010 are + invalid because they cannot be represented in UTF-16. As such, all 5- and + 6-byte encodings are invalid. + An implementation note: the transformation from UTF-16 to UTF-8, or vice versa, is easiest done by using UTF-32 as an intermediary. */ @@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp, if (c <= 0x3FF && nbytes > 5) return EILSEQ; /* Make sure the character is valid. */ - if (c > 0x7FFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ; + if (c > 0x10 || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ; *cp = c; *inbufp = inbuf; @@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp, s += inbuf[bigend ? 2 : 1] << 8; s += inbuf[bigend ? 3 : 0]; - if (s >= 0x7FFF || (s >= 0xD800 && s <= 0xDFFF)) + if (s > 0x10 || (s >= 0xD800 && s <= 0xDFFF)) return EILSEQ; rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp); -- 2.38.1
[PATCH v4 0/3] RFC: P1689R5 support
Hi, This patch adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that the TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. For the record, Clang has patches with similar flags and behavior by Chuanqi Xu here: https://reviews.llvm.org/D134269 with the same flags. Thanks, --Ben --- v3 -> v4: - add missing spaces between function names and arguments v2 -> v3: - changelog entries moved to commit messages - documentation updated/added in the UTF-8 routine editing v1 -> v2: - removal of the `deps_write(extra)` parameter to option-checking where ndeeded - default parameter of `cpp_finish(fdeps_stream = NULL)` - unification of libcpp UTF-8 validity functions from v1 - test cases for flag parsing states (depflags-*) and p1689 output (p1689-*) Ben Boeckel (3): libcpp: reject codepoints above 0x10 libcpp: add a function to determine UTF-8 validity of a C string p1689r5: initial support gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 1 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 3 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-o.C | 3 + gcc/testsuite/g++.dg/modules/modules.exp | 1 + gcc/testsuite/g++.dg/modules/p1689-1.C| 18 ++ gcc/testsuite/g++.dg/modules/p1689-1.exp.json | 27 +++ gcc/testsuite/g++.dg/modules/p1689-2.C| 16 ++ gcc/testsuite/g++.dg/modules/p1689-2.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-3.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-3.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-4.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-4.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++ gcc/testsuite/lib/modules.exp | 71 ++ libcpp/charset.cc | 28 ++- libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 +- libcpp/init.cc| 13 +- libcpp/internal.h | 2 + libcpp/mkdeps.cc | 149 +++- 38 files changed, 773 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C create mode 100644 gcc/testsuite/g++.
[PATCH v4 2/3] libcpp: add a function to determine UTF-8 validity of a C string
This simplifies the interface for other UTF-8 validity detections when a simple "yes" or "no" answer is sufficient. libcpp/ * charset.cc: Add `_cpp_valid_utf8_str` which determines whether a C string is valid UTF-8 or not. * internal.h: Add prototype for `_cpp_valid_utf8_str`. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 20 libcpp/internal.h | 2 ++ 2 files changed, 22 insertions(+) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index 324b5b19136..422cb52595c 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile, return true; } +/* Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns +`false` if any contained byte sequence encodes an invalid Unicode codepoint +or is not a valid UTF-8 sequence. Returns `true` otherwise. */ + +extern bool +_cpp_valid_utf8_str (const char *name) +{ + const uchar* in = (const uchar*)name; + size_t len = strlen (name); + cppchar_t cp; + + while (*in) +{ + if (one_utf8_to_cppchar (&in, &len, &cp)) + return false; +} + + return true; +} + /* Subroutine of convert_hex and convert_oct. N is the representation in the execution character set of a numeric escape; write it into the string buffer TBUF and update the end-of-string pointer therein. WIDE diff --git a/libcpp/internal.h b/libcpp/internal.h index badfd1b40da..4f2dd4a2f5c 100644 --- a/libcpp/internal.h +++ b/libcpp/internal.h @@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile, struct normalize_state *nst, cppchar_t *cp); +extern bool _cpp_valid_utf8_str (const char *str); + extern void _cpp_destroy_iconv (cpp_reader *); extern unsigned char *_cpp_convert_input (cpp_reader *, const char *, unsigned char *, size_t, size_t, -- 2.38.1
[PATCH v4 3/3] p1689r5: initial support
This patch implements support for [P1689R5][] to communicate to a build system the C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdep-output=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well, however this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously represetable in UTF-8" case). - figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. libcpp/ * include/cpplib.h: Add cpp_deps_format enum. (cpp_options): Add format field (cpp_finish): Add dependency stream parameter. * include/mkdeps.h (deps_add_module_target): Add new preprocessor parameter used for C++ module tracking. * init.cc (cpp_finish): Add new preprocessor parameter used for C++ module tracking. * mkdeps.cc (mkdeps): Implement P1689R5 output. gcc/ * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and -fdep-output= flags. gcc/c-family/ * c-opts.cc (c_common_handle_option): Add fdeps_file variable and -fdeps-format=, -fdep-file=, and -fdep-output= parsing. * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags. gcc/cp/ * module.cc (preprocessed_module): Pass whether the module is exported to dependency tracking. gcc/testsuite/ * g++.dg/modules/depflags-f-MD.C: New test. * g++.dg/modules/depflags-f.C: New test. * g++.dg/modules/depflags-fi.C: New test. * g++.dg/modules/depflags-fj-MD.C: New test. * g++.dg/modules/depflags-fj.C: New test. * g++.dg/modules/depflags-fjo-MD.C: New test. * g++.dg/modules/depflags-fjo.C: New test. * g++.dg/modules/depflags-fo-MD.C: New test. * g++.dg/modules/depflags-fo.C: New test. * g++.dg/modules/depflags-j-MD.C: New test. * g++.dg/modules/depflags-j.C: New test. * g++.dg/modules/depflags-jo-MD.C: New test. * g++.dg/modules/depflags-jo.C: New test. * g++.dg/modules/depflags-o-MD.C: New test. * g++.dg/modules/depflags-o.C: New test. * g++.dg/modules/p1689-1.C: New test. * g++.dg/modules/p1689-1.exp.json: New test expectation. * g++.dg/modules/p1689-2.C: New test. * g++.dg/modules/p1689-2.exp.json: New test expectation. * g++.dg/modules/p1689-3.C: New test. * g++.dg/modules/p1689-3.exp.json: New test expectation. * g++.dg/modules/p1689-4.C: New test. * g++.dg/modules/p1689-4.exp.json: New test expectation. * g++.dg/modules/p1689-5.C: New test. * g++.dg/modules/p1689-5.exp.json: New test expectation. * g++.dg/modules/modules.exp: Load new P1689 library routines. * g++.dg/modules/test-p1689.py: New tool for validating P1689 output. * lib/modules.exp: Support for validating P1689 outputs. Signed-off-by: Ben Boeckel --- gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/g++
Re: -minstd: Require a minimum std version, without being specific
On Wed, Dec 21, 2022 at 19:33:48 +0100, Alejandro Colomar via Gcc wrote: > I've long had this wish: an option similar to -std, but which would not > specify > the standard. Rather, mark a requirement that the standard be at least a > version. > > This would be especially useful for libraries, which might for example > require > C99 or C11 to work. They would be able to specify -minstd=c11 in their pc(5) > file (for use with pkgconf(1)). That way, a program using such library, > would > be free to use -std to specify the C version that the project should be > compiled > with; maybe gnu17, maybe even gnu2x. But if the program tries to compile > under, > say gnu89, the compiler would report an error. (FD: CMake developer, ISO C++ SG15 committee member) I'd like to see us move away from "flag soup" and instead towards more structured information here. This request corresponds to CMake's `C_STANDARD` target property[1] which CMake then puts together to mean something. Note that there are real hazards with just putting the flags like this into `.pc` files directly. If you have a C library and say "-minstd=c99" and I'm C++, what is supposed to happen with this flag for a C++ compilre? Does it translate to C++11 (which is the first C++ standard to "fully contain" C99)? What if there is no answer (e.g., `-minstd=c23`)? FWIW, my idea is to broaden the concept CMake has of "usage requirements" to not be so CMake-centered and to encompass things like "you need an rpath to X to use my libraries" or even "here's an entry for `PYTHONPATH` to use my Python code". ISO C++'s SG15 has started discussion of such things with an eye towards standardization here: https://github.com/isocpp/pkg-fmt --Ben [1]https://cmake.org/cmake/help/latest/prop_tgt/C_STANDARD.html
[PATCH v5 0/5] P1689R5 support
Hi, This patch series adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that any TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I've also added patches to include imported module CMI files and the module mapper file as dependencies of the compilation. I briefly looked into adding dependencies on response files as well, but that appeared to need some code contortions to have a `class mkdeps` available before parsing the command line or to keep the information around until one was made. I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. FWIW, Clang as taken an alternate approach with its `clang-scan-deps` tool rather than using the compiler directly. Thanks, --Ben --- v4 -> v5: - add dependency tracking for imported modules to `-MF` - add dependency tracking for static module mapper files given to `-fmodule-mapper=` v3 -> v4: - add missing spaces between function names and arguments v2 -> v3: - changelog entries moved to commit messages - documentation updated/added in the UTF-8 routine editing v1 -> v2: - removal of the `deps_write(extra)` parameter to option-checking where ndeeded - default parameter of `cpp_finish(fdeps_stream = NULL)` - unification of libcpp UTF-8 validity functions from v1 - test cases for flag parsing states (depflags-*) and p1689 output (p1689-*) Ben Boeckel (5): libcpp: reject codepoints above 0x10 libcpp: add a function to determine UTF-8 validity of a C string p1689r5: initial support c++modules: report imported CMI files as dependencies c++modules: report module mapper files as a dependency gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/mapper-client.cc | 4 + gcc/cp/mapper-client.h| 1 + gcc/cp/module.cc | 23 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 1 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 3 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-o.C | 3 + gcc/testsuite/g++.dg/modules/modules.exp | 1 + gcc/testsuite/g++.dg/modules/p1689-1.C| 18 ++ gcc/testsuite/g++.dg/modules/p1689-1.exp.json | 27 +++ gcc/testsuite/g++.dg/modules/p1689-2.C| 16 ++ gcc/testsuite/g++.dg/modules/p1689-2.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-3.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-3.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-4.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-4.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++ gcc/testsuite/lib/modules.exp | 71 ++ libcpp/charset.cc | 28 ++- libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 +- libcpp/init.cc| 13 +- libcpp/internal.h | 2 + libcpp/mkdeps.cc | 149 +++- 40 files changed, 789 insertions(+), 30 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C create mode 10064
[PATCH v5 2/5] libcpp: add a function to determine UTF-8 validity of a C string
This simplifies the interface for other UTF-8 validity detections when a simple "yes" or "no" answer is sufficient. libcpp/ * charset.cc: Add `_cpp_valid_utf8_str` which determines whether a C string is valid UTF-8 or not. * internal.h: Add prototype for `_cpp_valid_utf8_str`. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 20 libcpp/internal.h | 2 ++ 2 files changed, 22 insertions(+) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index f7ae12ea5a2..616be9d02ee 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile, return true; } +/* Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns +`false` if any contained byte sequence encodes an invalid Unicode codepoint +or is not a valid UTF-8 sequence. Returns `true` otherwise. */ + +extern bool +_cpp_valid_utf8_str (const char *name) +{ + const uchar* in = (const uchar*)name; + size_t len = strlen (name); + cppchar_t cp; + + while (*in) +{ + if (one_utf8_to_cppchar (&in, &len, &cp)) + return false; +} + + return true; +} + /* Subroutine of convert_hex and convert_oct. N is the representation in the execution character set of a numeric escape; write it into the string buffer TBUF and update the end-of-string pointer therein. WIDE diff --git a/libcpp/internal.h b/libcpp/internal.h index 9724676a8cd..48520901b2d 100644 --- a/libcpp/internal.h +++ b/libcpp/internal.h @@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile, struct normalize_state *nst, cppchar_t *cp); +extern bool _cpp_valid_utf8_str (const char *str); + extern void _cpp_destroy_iconv (cpp_reader *); extern unsigned char *_cpp_convert_input (cpp_reader *, const char *, unsigned char *, size_t, size_t, -- 2.39.0
[PATCH v5 1/5] libcpp: reject codepoints above 0x10FFFF
Unicode does not support such values because they are unrepresentable in UTF-16. libcpp/ * charset.cc: Reject encodings of codepoints above 0x10. UTF-16 does not support such codepoints and therefore all Unicode rejects such values. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index 3c47d4f868b..f7ae12ea5a2 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -158,6 +158,10 @@ struct _cpp_strbuf encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or FC 80 80 80 9F 80. Only the first is valid. + Additionally, Unicode declares that all codepoints above 0010 are + invalid because they cannot be represented in UTF-16. As such, all 5- and + 6-byte encodings are invalid. + An implementation note: the transformation from UTF-16 to UTF-8, or vice versa, is easiest done by using UTF-32 as an intermediary. */ @@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp, if (c <= 0x3FF && nbytes > 5) return EILSEQ; /* Make sure the character is valid. */ - if (c > 0x7FFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ; + if (c > 0x10 || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ; *cp = c; *inbufp = inbuf; @@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp, s += inbuf[bigend ? 2 : 1] << 8; s += inbuf[bigend ? 3 : 0]; - if (s >= 0x7FFF || (s >= 0xD800 && s <= 0xDFFF)) + if (s > 0x10 || (s >= 0xD800 && s <= 0xDFFF)) return EILSEQ; rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp); -- 2.39.0
[PATCH v5 4/5] c++modules: report imported CMI files as dependencies
They affect the build, so report them via `-MF` mechanisms. gcc/cp/ * module.cc (do_import): Report imported CMI files as dependencies. Signed-off-by: Ben Boeckel --- gcc/cp/module.cc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index ebd30f63d81..dbd1b721616 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -18966,6 +18966,8 @@ module_state::do_import (cpp_reader *reader, bool outermost) dump () && dump ("CMI is %s", file); if (note_module_cmi_yes || inform_cmi_p) inform (loc, "reading CMI %qs", file); + /* Add the CMI file to the dependency tracking. */ + deps_add_dep (cpp_get_deps (reader), file); fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY); e = errno; } -- 2.39.0
[PATCH v5 3/5] p1689r5: initial support
This patch implements support for [P1689R5][] to communicate to a build system the C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdep-output=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well, however this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously represetable in UTF-8" case). - figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. libcpp/ * include/cpplib.h: Add cpp_deps_format enum. (cpp_options): Add format field (cpp_finish): Add dependency stream parameter. * include/mkdeps.h (deps_add_module_target): Add new preprocessor parameter used for C++ module tracking. * init.cc (cpp_finish): Add new preprocessor parameter used for C++ module tracking. * mkdeps.cc (mkdeps): Implement P1689R5 output. gcc/ * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and -fdep-output= flags. gcc/c-family/ * c-opts.cc (c_common_handle_option): Add fdeps_file variable and -fdeps-format=, -fdep-file=, and -fdep-output= parsing. * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags. gcc/cp/ * module.cc (preprocessed_module): Pass whether the module is exported to dependency tracking. gcc/testsuite/ * g++.dg/modules/depflags-f-MD.C: New test. * g++.dg/modules/depflags-f.C: New test. * g++.dg/modules/depflags-fi.C: New test. * g++.dg/modules/depflags-fj-MD.C: New test. * g++.dg/modules/depflags-fj.C: New test. * g++.dg/modules/depflags-fjo-MD.C: New test. * g++.dg/modules/depflags-fjo.C: New test. * g++.dg/modules/depflags-fo-MD.C: New test. * g++.dg/modules/depflags-fo.C: New test. * g++.dg/modules/depflags-j-MD.C: New test. * g++.dg/modules/depflags-j.C: New test. * g++.dg/modules/depflags-jo-MD.C: New test. * g++.dg/modules/depflags-jo.C: New test. * g++.dg/modules/depflags-o-MD.C: New test. * g++.dg/modules/depflags-o.C: New test. * g++.dg/modules/p1689-1.C: New test. * g++.dg/modules/p1689-1.exp.json: New test expectation. * g++.dg/modules/p1689-2.C: New test. * g++.dg/modules/p1689-2.exp.json: New test expectation. * g++.dg/modules/p1689-3.C: New test. * g++.dg/modules/p1689-3.exp.json: New test expectation. * g++.dg/modules/p1689-4.C: New test. * g++.dg/modules/p1689-4.exp.json: New test expectation. * g++.dg/modules/p1689-5.C: New test. * g++.dg/modules/p1689-5.exp.json: New test expectation. * g++.dg/modules/modules.exp: Load new P1689 library routines. * g++.dg/modules/test-p1689.py: New tool for validating P1689 output. * lib/modules.exp: Support for validating P1689 outputs. Signed-off-by: Ben Boeckel --- gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/g++
[PATCH v5 5/5] c++modules: report module mapper files as a dependency
It affects the build, and if used as a static file, can reliably be tracked using the `-MF` mechanism. gcc/cp/: * mapper-client.cc, mapper-client.h (open_module_client): Accept dependency tracking and track module mapper files as dependencies. * module.cc (make_mapper, get_mapper): Pass the dependency tracking class down. Signed-off-by: Ben Boeckel --- gcc/cp/mapper-client.cc | 4 gcc/cp/mapper-client.h | 1 + gcc/cp/module.cc| 18 +- 3 files changed, 14 insertions(+), 9 deletions(-) diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc index 39e80df2d25..0ce5679d659 100644 --- a/gcc/cp/mapper-client.cc +++ b/gcc/cp/mapper-client.cc @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic-core.h" #include "mapper-client.h" #include "intl.h" +#include "mkdeps.h" #include "../../c++tools/resolver.h" @@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name, module_client * module_client::open_module_client (location_t loc, const char *o, + class mkdeps *deps, void (*set_repo) (const char *), char const *full_program_name) { @@ -285,6 +287,8 @@ module_client::open_module_client (location_t loc, const char *o, errmsg = "opening"; else { + /* Add the mapper file to the dependency tracking. */ + deps_add_dep (deps, name.c_str ()); if (int l = r->read_tuple_file (fd, ident, false)) { if (l > 0) diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h index b32723ce296..a3b0b8adc51 100644 --- a/gcc/cp/mapper-client.h +++ b/gcc/cp/mapper-client.h @@ -55,6 +55,7 @@ public: public: static module_client *open_module_client (location_t loc, const char *option, + class mkdeps *, void (*set_repo) (const char *), char const *); static void close_module_client (location_t loc, module_client *); diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index dbd1b721616..37066bf072b 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -3969,12 +3969,12 @@ static GTY(()) vec *partial_specializations; /* Our module mapper (created lazily). */ module_client *mapper; -static module_client *make_mapper (location_t loc); -inline module_client *get_mapper (location_t loc) +static module_client *make_mapper (location_t loc, class mkdeps *deps); +inline module_client *get_mapper (location_t loc, class mkdeps *deps) { auto *res = mapper; if (!res) -res = make_mapper (loc); +res = make_mapper (loc, deps); return res; } @@ -14031,7 +14031,7 @@ get_module (const char *ptr) /* Create a new mapper connecting to OPTION. */ module_client * -make_mapper (location_t loc) +make_mapper (location_t loc, class mkdeps *deps) { timevar_start (TV_MODULE_MAPPER); const char *option = module_mapper_name; @@ -14039,7 +14039,7 @@ make_mapper (location_t loc) option = getenv ("CXX_MODULE_MAPPER"); mapper = module_client::open_module_client -(loc, option, &set_cmi_repo, +(loc, option, deps, &set_cmi_repo, (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name) && save_decoded_options[0].arg != progname ? save_decoded_options[0].arg : nullptr); @@ -19503,7 +19503,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc, dump.push (NULL); dump () && dump ("Checking include translation '%s'", path); - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); size_t len = strlen (path); path = canonicalize_header_name (NULL, loc, true, path, len); @@ -19619,7 +19619,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps, static void name_pending_imports (cpp_reader *reader) { - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); if (!vec_safe_length (pending_imports)) /* Not doing anything. */ @@ -20089,7 +20089,7 @@ init_modules (cpp_reader *reader) if (!flag_module_lazy) /* Get the mapper now, if we're not being lazy. */ -get_mapper (cpp_main_loc (reader)); +get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); if (!flag_preprocess_only) { @@ -20299,7 +20299,7 @@ late_finish_module (cpp_reader *reader, module_processing_cookie *cookie, if (!errorcount) { - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); mapper->ModuleCompiled (state->get_flatname ()); } else if (cookie->cmi_name) -- 2.39.0
Re: [PATCH v5 0/5] P1689R5 support
On Wed, Jan 25, 2023 at 16:06:31 -0500, Ben Boeckel wrote: > This patch series adds initial support for ISO C++'s [P1689R5][], a > format for describing C++ module requirements and provisions based on > the source code. This is required because compiling C++ with modules is > not embarrassingly parallel and need to be ordered to ensure that > `import some_module;` can be satisfied in time by making sure that any > TU with `export import some_module;` is compiled first. > > [P1689R5]: https://isocpp.org/files/papers/P1689R5.html > > I've also added patches to include imported module CMI files and the > module mapper file as dependencies of the compilation. I briefly looked > into adding dependencies on response files as well, but that appeared to > need some code contortions to have a `class mkdeps` available before > parsing the command line or to keep the information around until one was > made. > > I'd like feedback on the approach taken here with respect to the > user-visible flags. I'll also note that header units are not supported > at this time because the current `-E` behavior with respect to `import > ;` is to search for an appropriate `.gcm` file which is not > something such a "scan" can support. A new mode will likely need to be > created (e.g., replacing `-E` with `-fc++-module-scanning` or something) > where headers are looked up "normally" and processed only as much as > scanning requires. > > FWIW, Clang as taken an alternate approach with its `clang-scan-deps` > tool rather than using the compiler directly. Ping? It'd be nice to have this supported in at least GCC 14 (since it missed 13). Thanks, --Ben
Re: [PATCH v5 0/5] P1689R5 support
On Thu, Feb 02, 2023 at 21:24:12 +0100, Harald Anlauf wrote: > Am 25.01.23 um 22:06 schrieb Ben Boeckel via Gcc-patches: > > Hi, > > > > This patch series adds initial support for ISO C++'s [P1689R5][], a > > format for describing C++ module requirements and provisions based on > > the source code. This is required because compiling C++ with modules is > > not embarrassingly parallel and need to be ordered to ensure that > > `import some_module;` can be satisfied in time by making sure that any > > TU with `export import some_module;` is compiled first. > > > > [P1689R5]: https://isocpp.org/files/papers/P1689R5.html > > while that paper mentions Fortran, the patch in its present version > does not seem to implement anything related to Fortran and does not > touch the gfortran frontend. Or am I missing anything? Otherwise, > could you give an example how it would be used with Fortran? Correct. Still trying to put the walls back together after modules KoolAid Man'd their way into the build graph structure :) . Being able to drop our Fortran parser (well, we'd have to drop support for Fortran compilers that exist today…so maybe in 2075 or something) and rely on compilers to tell us the information would be amazing though :) . FWIW, the initial revision of the patchset did touch the gfortran frontend, but the new parameter is now defaulted and therefore the callsite doesn't need an update anymore. I still thought it worthwhile to keep the Fortran side aware of what is going on in the space. The link to Fortran comes up because the build graph problem is isomorphic (Fortran supports exporting multiple modules from a single TU, but it's not relevant at the graph level; it's the zero -> any case that is hard), CMake "solved" it already, and C++ is going to have a *lot* more "I want to consume $other_project's modules using my favorite compiler/flags" than seems to happen in Fortran. If you're interested, this is the paper showing how we do it: https://mathstuf.fedorapeople.org/fortran-modules/fortran-modules.html > Thus I'd say that it is OK from the gfortran side. Eventually we'll like to get gfortran supporting this type of scanning, but…as above. Thanks, --Ben
Re: [PATCH v5 0/5] P1689R5 support
On Fri, Feb 03, 2023 at 09:10:21 +, Jonathan Wakely wrote: > On Fri, 3 Feb 2023 at 08:58, Jonathan Wakely wrote: > > On Fri, 3 Feb 2023, 04:09 Andrew Pinski via Gcc, wrote: > >> On Wed, Jan 25, 2023 at 1:07 PM Ben Boeckel via Fortran > >> wrote: > >> > This patch series adds initial support for ISO C++'s [P1689R5][], a > >> > format for describing C++ module requirements and provisions based on > >> > the source code. This is required because compiling C++ with modules is > >> > not embarrassingly parallel and need to be ordered to ensure that > >> > `import some_module;` can be satisfied in time by making sure that any > >> > TU with `export import some_module;` is compiled first. > >> > >> I like how folks are complaining that GCC outputs POSIX makefile > >> syntax from GCC's dependency files which are supposed to be in POSIX > >> Makefile syntax. > >> It seems like rather the build tools are people like to use are not > >> understanding POSIX makefile syntax any more rather. > >> Also I am not a fan of json, it is too verbose for no use. Maybe it is > >> time to go back to standardizing a new POSIX makefile syntax rather > >> than changing C++ here. I'm not complaining that dependency files are in POSIX (or even POSIX-to-be) syntax. The information requires a bit more structure than some variable assignments and I don't expect anything trying to read them to start trying to understand `VAR_$(DEREF)=` and the behaviors of `:=` versus `=` assignment to get this reliably. > > That would take a decade or more. It's too late for POSIX 202x and > > the pace that POSIX agrees on makefile features is incredibly slow. > > Also, name+=value is *not* POSIX make syntax today, that's an > extension. That's why the tools don't always support it. > So I don't think it's true that GCC's dependency files are in POSIX syntax. > > POSIX 202x does add support for it, but it will take some time for it > to be supported everywhere. Additionally, while the *syntax* might be supported, encoding all of P1689 in it would require additional work (e.g., key/value variable assignments or something). Batch scanning would also be…interesting. Also note that the imported modules' location cannot be known before scanning in general, so all you get are "logical names" that you need a collator to link up with other scan results anyways. Tools such as `make` and `ninja` cannot know, in general, how to do this linking between arbitrary targets (e.g., there may be a debug and release build of the same module in the graph and knowing which to use requires higher-level info about the entire build graph; modules may also be considered "private" and not accessible everywhere and therefore should also not be hooked up across different target boundaries). While the `CXX_MODULES +=` approach can work for simple cases (a pseudo-implicit build), it is quite insufficient for the general case. --Ben
Re: [PATCH v5 3/5] p1689r5: initial support
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote: > I notice that the actual flags are all -fdep-*, though some of them are > -fdeps-* here, and the internal variables all seem to be fdeps_*. I > lean toward harmonizing on "deps", I think. Done. > I don't love the three separate options, but I suppose it's fine. I'd > prefer "target" instead of "output". Done. > It should be possible to omit both -file and -target and get reasonable > defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options. `file` can be omitted (the `output_stream` will be used then). I *think* I see that adding: %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}} would at least do for `-fdeps-file` defaults? I don't know if there's a reasonable default for `-fdeps-target=` though given that this command line has no information about the object file that will be used (`-o` is used for preprocessor output since we're leaning on `-E` here). --Ben
Re: [PATCH v5 1/5] libcpp: reject codepoints above 0x10FFFF
On Mon, Feb 13, 2023 at 10:53:17 -0500, Jason Merrill wrote: > On 1/25/23 13:06, Ben Boeckel wrote: > > Unicode does not support such values because they are unrepresentable in > > UTF-16. > > > > libcpp/ > > > > * charset.cc: Reject encodings of codepoints above 0x10. > > UTF-16 does not support such codepoints and therefore all > > Unicode rejects such values. > > It seems that this causes a bunch of testsuite failures from tests that > expect this limit to be checked elsewhere with a different diagnostic, > so I think the easiest thing is to fold this into _cpp_valid_utf8_str > instead, i.e.: Since then, `cpp_valid_utf8_p` has appeared and takes care of the over-long encodings. The new patchset just checks for codepoints beyond 0x10 and rejects them in this function (and the test suite matches `master` results for me then). --Ben
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On Mon, Feb 13, 2023 at 13:33:50 -0500, Jason Merrill wrote: > Both this and the mapper dependency patch seem to cause most of the > modules testcases to crash; please remember to run the regression tests > (https://gcc.gnu.org/contribute.html#testing) Fixed for v6. `cpp_get_deps` can return `NULL` which `deps_add_dep` assumes to not be true; fixed by checking before calling. --Ben
Re: LSP based on GCC
On Wed, May 17, 2023 at 15:48:09 -0400, Paul Smith wrote: > More frustratingly, Clang has made some poor decisions around > "compatibility": they tried to leverage the GNU ecosystem by emulating > GCC features and arguments but sometimes break things. The most Alas, the cost of trying to make a compiler that can injest in-the-wild code. It's the reason "every" compiler claims various GCC things: too many projects ended up with `#error "Unknown compiler"` in their detections and fixing them when you're just trying to get off the ground is annoying. As far as I know, GCC is locked into never providing a single uniquely identifiable trait because other compilers would end up having to emulate it too once it gets used by projects. CMake basically just ends up with "it's GCC" if `__GNUC__` is defined and none of the other, more specific, preprocessor markers are present. We're kind of getting this again with the variety of different frontends available on top of Clang these days (Apple's Xcode build, upstream itself, Intel's frontend, IBM's LLVM-based frontend, the XL-alike Clang build, Fujitsu has one, ARM's, and who knows how many others are out there). Sometimes they've…forgotten to make something distinctive so that it can be detected reliably. > egregious example I'm aware of is that they look for GCC-named > precompiled headers (.gch), even though the Clang PCH format is > completely different. So if Clang (and the LSP servers built on it) > find a .gch header file they will try to read it, fail, and give an > error. I filed a bug about this in 2019 but it's been ignored. > > This means you need to modify your LSP server arguments to omit any PCH > compiler command line arguments; for environments based on auto- > generated definitions like compile_commands.json this is frustrating. FWIW, this is only going to get worse with C++ modules. --Ben
Re: LSP based on GCC
On Thu, May 18, 2023 at 09:25:04 -0400, Paul Smith wrote: > On Wed, 2023-05-17 at 18:38 -0400, Ben Boeckel wrote: > > FWIW, this is only going to get worse with C++ modules. > > There's no reason it should. Of course the right answer is to tell > people to fix their build systems and if they want to use a different > compiler AND use PCH, they use the appropriate suffix for that > compiler. > > But even if you don't want to do that the fix in this case is trivial. > I even sent a patch (although since I don't know the clang code there's > no doubt that it was not done "the right way" and needed to be > massaged), they just never cared about it. > > The GCC PCH files use a special 4-byte prefix in every file; all you > have to do in clang is, if you find a .gch file open the file and read > the first 4 bytes and if it's a real GCC PCH file you ignore it and if > it's actually a Clang PCH with a malformed name you complain bitterly > and dump core er, I mean, you read it silently as if it had the > right name. PCH files can "be ignored" in some sense because they can be recalculated from `#include` files pretty easily. Module files, however, cannot. > One would hope that, if the GCC module files have a similar compiler- > specific format (I'm not too familiar with modules) they also use a > similar magic number at the beginning of the file. GCC module files are use ELF containers, so there's plenty of metadata to know it's not-for-Clang. But Clang will need to make its own version of these module files to know what, if anything, is provided by it by sources that import it to make any kind of useful suggestions. > But anyway this is losing the thread of Eli's hopeful request. Agreed. A GCC-based LSP will help immensely with GCC-using projects (whether it be Emacs or Vim on the other end of the LSP pipe ;) ). --Ben
[PATCH v6 0/4] P1689R5 support
Hi, This patch series adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that any TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I've also added patches to include imported module CMI files and the module mapper file as dependencies of the compilation. I briefly looked into adding dependencies on response files as well, but that appeared to need some code contortions to have a `class mkdeps` available before parsing the command line or to keep the information around until one was made. I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. FWIW, Clang as taken an alternate approach with its `clang-scan-deps` tool rather than using the compiler directly. Thanks, --Ben --- v5 -> v6: - rebase onto `master` (585c660f041 (reload1: Change return type of predicate function from int to bool, 2023-06-06)) - fix crash related to reporting imported CMI files as dependencies - rework utf-8 validity to patch the new `cpp_valid_utf8_p` function instead of the core utf-8 decoding routine to reject invalid codepoints (preserves higher-level error detection of invalid utf-8) - harmonize of `fdeps` spelling in flags, variables, comments, etc. - rename `-fdeps-output=` to `-fdeps-target=` v4 -> v5: - add dependency tracking for imported modules to `-MF` - add dependency tracking for static module mapper files given to `-fmodule-mapper=` v3 -> v4: - add missing spaces between function names and arguments v2 -> v3: - changelog entries moved to commit messages - documentation updated/added in the UTF-8 routine editing v1 -> v2: - removal of the `deps_write(extra)` parameter to option-checking where ndeeded - default parameter of `cpp_finish(fdeps_stream = NULL)` - unification of libcpp UTF-8 validity functions from v1 - test cases for flag parsing states (depflags-*) and p1689 output (p1689-*) Ben Boeckel (4): libcpp: reject codepoints above 0x10 p1689r5: initial support c++modules: report imported CMI files as dependencies c++modules: report module mapper files as a dependency gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/mapper-client.cc | 4 + gcc/cp/mapper-client.h| 1 + gcc/cp/module.cc | 24 +- gcc/doc/invoke.texi | 15 ++ gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 1 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 3 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-o.C | 3 + gcc/testsuite/g++.dg/modules/modules.exp | 1 + gcc/testsuite/g++.dg/modules/p1689-1.C| 18 ++ gcc/testsuite/g++.dg/modules/p1689-1.exp.json | 27 +++ gcc/testsuite/g++.dg/modules/p1689-2.C| 16 ++ gcc/testsuite/g++.dg/modules/p1689-2.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-3.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-3.exp.json | 16 ++ gcc/testsuite/g++.dg/modules/p1689-4.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-4.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.C| 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.exp.json | 14 ++ gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++ gcc/testsuite/lib/modules.exp | 71 ++ libcpp/charset.cc | 7 + libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 17 +- libcpp/init.cc| 13
[PATCH v6 3/4] c++modules: report imported CMI files as dependencies
They affect the build, so report them via `-MF` mechanisms. gcc/cp/ * module.cc (do_import): Report imported CMI files as dependencies. Signed-off-by: Ben Boeckel --- gcc/cp/module.cc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index c80f139eb82..e88ce0a1818 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -18966,6 +18966,9 @@ module_state::do_import (cpp_reader *reader, bool outermost) dump () && dump ("CMI is %s", file); if (note_module_cmi_yes || inform_cmi_p) inform (loc, "reading CMI %qs", file); + /* Add the CMI file to the dependency tracking. */ + if (cpp_get_deps (reader)) + deps_add_dep (cpp_get_deps (reader), file); fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY); e = errno; } -- 2.40.1
[PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF
Unicode does not support such values because they are unrepresentable in UTF-16. libcpp/ * charset.cc: Reject encodings of codepoints above 0x10. UTF-16 does not support such codepoints and therefore all Unicode rejects such values. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 7 +++ 1 file changed, 7 insertions(+) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index d7f323b2cd5..3b34d804cf1 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes) int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp); if (err) return false; + + /* Additionally, Unicode declares that all codepoints above 0010 are +invalid because they cannot be represented in UTF-16. + +Reject such values.*/ + if (cp >= 0x10) + return false; } /* No problems encountered. */ return true; -- 2.40.1
[PATCH v6 2/4] p1689r5: initial support
This patch implements support for [P1689R5][] to communicate to a build system the C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdeps-target=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well, however this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously represetable in UTF-8" case). - figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. libcpp/ * include/cpplib.h: Add cpp_deps_format enum. (cpp_options): Add format field (cpp_finish): Add dependency stream parameter. * include/mkdeps.h (deps_add_module_target): Add new preprocessor parameter used for C++ module tracking. * init.cc (cpp_finish): Add new preprocessor parameter used for C++ module tracking. * mkdeps.cc (mkdeps): Implement P1689R5 output. gcc/ * doc/invoke.texi: Document -fdeps-format=, -fdeps-file=, and -fdeps-target= flags. gcc/c-family/ * c-opts.cc (c_common_handle_option): Add fdeps_file variable and -fdeps-format=, -fdeps-file=, and -fdeps-target= parsing. * c.opt: Add -fdeps-format=, -fdeps-file=, and -fdeps-target= flags. gcc/cp/ * module.cc (preprocessed_module): Pass whether the module is exported to dependency tracking. gcc/testsuite/ * g++.dg/modules/depflags-f-MD.C: New test. * g++.dg/modules/depflags-f.C: New test. * g++.dg/modules/depflags-fi.C: New test. * g++.dg/modules/depflags-fj-MD.C: New test. * g++.dg/modules/depflags-fj.C: New test. * g++.dg/modules/depflags-fjo-MD.C: New test. * g++.dg/modules/depflags-fjo.C: New test. * g++.dg/modules/depflags-fo-MD.C: New test. * g++.dg/modules/depflags-fo.C: New test. * g++.dg/modules/depflags-j-MD.C: New test. * g++.dg/modules/depflags-j.C: New test. * g++.dg/modules/depflags-jo-MD.C: New test. * g++.dg/modules/depflags-jo.C: New test. * g++.dg/modules/depflags-o-MD.C: New test. * g++.dg/modules/depflags-o.C: New test. * g++.dg/modules/p1689-1.C: New test. * g++.dg/modules/p1689-1.exp.json: New test expectation. * g++.dg/modules/p1689-2.C: New test. * g++.dg/modules/p1689-2.exp.json: New test expectation. * g++.dg/modules/p1689-3.C: New test. * g++.dg/modules/p1689-3.exp.json: New test expectation. * g++.dg/modules/p1689-4.C: New test. * g++.dg/modules/p1689-4.exp.json: New test expectation. * g++.dg/modules/p1689-5.C: New test. * g++.dg/modules/p1689-5.exp.json: New test expectation. * g++.dg/modules/modules.exp: Load new P1689 library routines. * g++.dg/modules/test-p1689.py: New tool for validating P1689 output. * lib/modules.exp: Support for validating P1689 outputs. Signed-off-by: Ben Boeckel --- gcc/c-family/c-opts.cc| 40 +++- gcc/c-family/c.opt| 12 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 15 ++
[PATCH v6 4/4] c++modules: report module mapper files as a dependency
It affects the build, and if used as a static file, can reliably be tracked using the `-MF` mechanism. gcc/cp/: * mapper-client.cc, mapper-client.h (open_module_client): Accept dependency tracking and track module mapper files as dependencies. * module.cc (make_mapper, get_mapper): Pass the dependency tracking class down. Signed-off-by: Ben Boeckel --- gcc/cp/mapper-client.cc | 4 gcc/cp/mapper-client.h | 1 + gcc/cp/module.cc| 18 +- 3 files changed, 14 insertions(+), 9 deletions(-) diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc index 39e80df2d25..0ce5679d659 100644 --- a/gcc/cp/mapper-client.cc +++ b/gcc/cp/mapper-client.cc @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic-core.h" #include "mapper-client.h" #include "intl.h" +#include "mkdeps.h" #include "../../c++tools/resolver.h" @@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name, module_client * module_client::open_module_client (location_t loc, const char *o, + class mkdeps *deps, void (*set_repo) (const char *), char const *full_program_name) { @@ -285,6 +287,8 @@ module_client::open_module_client (location_t loc, const char *o, errmsg = "opening"; else { + /* Add the mapper file to the dependency tracking. */ + deps_add_dep (deps, name.c_str ()); if (int l = r->read_tuple_file (fd, ident, false)) { if (l > 0) diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h index b32723ce296..a3b0b8adc51 100644 --- a/gcc/cp/mapper-client.h +++ b/gcc/cp/mapper-client.h @@ -55,6 +55,7 @@ public: public: static module_client *open_module_client (location_t loc, const char *option, + class mkdeps *, void (*set_repo) (const char *), char const *); static void close_module_client (location_t loc, module_client *); diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index e88ce0a1818..9dbb53d2aaf 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -3969,12 +3969,12 @@ static GTY(()) vec *partial_specializations; /* Our module mapper (created lazily). */ module_client *mapper; -static module_client *make_mapper (location_t loc); -inline module_client *get_mapper (location_t loc) +static module_client *make_mapper (location_t loc, class mkdeps *deps); +inline module_client *get_mapper (location_t loc, class mkdeps *deps) { auto *res = mapper; if (!res) -res = make_mapper (loc); +res = make_mapper (loc, deps); return res; } @@ -14031,7 +14031,7 @@ get_module (const char *ptr) /* Create a new mapper connecting to OPTION. */ module_client * -make_mapper (location_t loc) +make_mapper (location_t loc, class mkdeps *deps) { timevar_start (TV_MODULE_MAPPER); const char *option = module_mapper_name; @@ -14039,7 +14039,7 @@ make_mapper (location_t loc) option = getenv ("CXX_MODULE_MAPPER"); mapper = module_client::open_module_client -(loc, option, &set_cmi_repo, +(loc, option, deps, &set_cmi_repo, (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name) && save_decoded_options[0].arg != progname ? save_decoded_options[0].arg : nullptr); @@ -19504,7 +19504,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc, dump.push (NULL); dump () && dump ("Checking include translation '%s'", path); - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); size_t len = strlen (path); path = canonicalize_header_name (NULL, loc, true, path, len); @@ -19620,7 +19620,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps, static void name_pending_imports (cpp_reader *reader) { - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); if (!vec_safe_length (pending_imports)) /* Not doing anything. */ @@ -20090,7 +20090,7 @@ init_modules (cpp_reader *reader) if (!flag_module_lazy) /* Get the mapper now, if we're not being lazy. */ -get_mapper (cpp_main_loc (reader)); +get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); if (!flag_preprocess_only) { @@ -20300,7 +20300,7 @@ late_finish_module (cpp_reader *reader, module_processing_cookie *cookie, if (!errorcount) { - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); mapper->ModuleCompiled (state->get_flatname ()); } else if (cookie->cmi_name) -- 2.40.1
Re: Targetting p0847 for GCC14 (explicit object parameter)
On Thu, Jun 08, 2023 at 04:06:24 +, waffl3x via Gcc wrote: > I would like to boldly suggest implementing P0847 should be targeted at > GCC14. In my anecdotal experiences, this feature is very important to > people, and very important to myself, I believe it should be a priority. > > I am not suggesting this without offering to contribute, however > because of my inexperience with compiler hacking I am concerned I would > hinder efforts. With that said, if no one is interested in starting > work on it, but there is consensus that the feature is important > enough, then I will do my best to take up that job. Note that one way to help with a "feature that is very important" to oneself without compiler experience is to help write test cases (even better if it includes the necessary `dejagnu` markup). Corner cases are typically the trickiest to handle and filling it out ahead of time is likely very helpful to whoever does end up working on it. Indeed, because you're (likely) different from the feature implementer, you can help fill in blind spots that may be missed otherwise. It would also be useful to be somewhat systematic about it as well and take notes on what, specifically, is being tested in each case as well as where known gaps in the test cases exist. --Ben
Re: [PATCH v6 0/4] P1689R5 support
On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote: > This patch series causes ICEs on arm-linux-gnueabihf. Would you > please investigate? Please let me know if you need any in reproducing > these. Finally back at it. I tried on aarch64, but wasn't able to reproduce the errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is there hardware I can access to try this out on the same target triple? Alternatively, a backtrace may be able to help pinpoint it enough if you have the cycles. Thanks, --Ben
Re: [PATCH v6 0/4] P1689R5 support
On Fri, Jun 16, 2023 at 15:48:59 -0400, Ben Boeckel wrote: > On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote: > > This patch series causes ICEs on arm-linux-gnueabihf. Would you > > please investigate? Please let me know if you need any in reproducing > > these. > > Finally back at it. I tried on aarch64, but wasn't able to reproduce the > errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is > there hardware I can access to try this out on the same target triple? Trying inside of an i386 container also came up with nothing…I'll try qemu. --Ben
Re: [PATCH v6 0/4] P1689R5 support
On Fri, Jun 16, 2023 at 23:55:53 -0400, Jason Merrill wrote: > I see the same thing with patch 4 on x86_64-pc-linux-gnu, e.g. > > FAIL: g++.dg/modules/ben-1_a.C -std=c++17 (test for excess errors) > Excess errors: > /home/jason/gt/gcc/testsuite/g++.dg/modules/ben-1_a.C:9:1: internal > compiler error: Segmentation fault > 0x19e2f3c crash_signal > /home/jason/gt/gcc/toplev.cc:314 > 0x340f3f8 mkdeps::vec::size() const > /home/jason/gt/libcpp/mkdeps.cc:57 > 0x340dc1f apply_vpath > /home/jason/gt/libcpp/mkdeps.cc:194 > 0x340e08e deps_add_dep(mkdeps*, char const*) > /home/jason/gt/libcpp/mkdeps.cc:318 > 0xea7b51 module_client::open_module_client(unsigned int, char const*, > mkdeps*, void (*)(char const*), char const*) > /home/jason/gt/gcc/cp/mapper-client.cc:291 > 0xef2ba8 make_mapper > /home/jason/gt/gcc/cp/module.cc:14042 > 0xf0896c get_mapper(unsigned int, mkdeps*) > /home/jason/gt/gcc/cp/module.cc:3977 > 0xf032ac name_pending_imports > /home/jason/gt/gcc/cp/module.cc:19623 > 0xf03a7d preprocessed_module(cpp_reader*) > /home/jason/gt/gcc/cp/module.cc:19817 > 0xe85104 module_token_cdtor(cpp_reader*, unsigned long) > /home/jason/gt/gcc/cp/lex.cc:548 > 0xf467b2 cp_lexer_new_main > /home/jason/gt/gcc/cp/parser.cc:756 > 0xfc1e3a c_parse_file() > /home/jason/gt/gcc/cp/parser.cc:49725 > 0x11c5bf5 c_common_parse_file() > /home/jason/gt/gcc/c-family/c-opts.cc:1268 Thanks. I missed a `nullptr` check before calling `deps_add_dep`. I think I got misled by `make check` returning a zero exit code even if there are failures. Thanks, --Ben
Re: [PATCH v5 3/5] p1689r5: initial support
On Mon, Jun 19, 2023 at 17:33:58 -0400, Jason Merrill wrote: > On 5/12/23 10:24, Ben Boeckel wrote: > > `file` can be omitted (the `output_stream` will be used then). I *think* > > I see that adding: > > > > %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}} > > %{!fdeps-file: but yes. > > > would at least do for `-fdeps-file` defaults? I don't know if there's a > > reasonable default for `-fdeps-target=` though given that this command > > line has no information about the object file that will be used (`-o` is > > used for preprocessor output since we're leaning on `-E` here). > > I would think it could default to %b.o? I suppose that could work, yes. > I had quite a few more comments on the v5 patch that you didn't respond > to here or address in the v6 patch; did your mail client hide them from you? Oof. Sorry, I saw large chunks of quoting and apparently assumed the rest was fine (I usually do aggressive trimming when doing that style of review). I see them now. Will go through and include in v7. --Ben
Re: [PATCH v5 3/5] p1689r5: initial support
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote: > On 1/25/23 13:06, Ben Boeckel wrote: > > - header-unit information fields > > > > Header units (including the standard library headers) are 100% > > unsupported right now because the `-E` mechanism wants to import their > > BMIs. A new mode (i.e., something more workable than existing `-E` > > behavior) that mocks up header units as if they were imported purely > > from their path and content would be required. > > I notice that the cpp dependency generation tries (in open_file_failed) > to continue after encountering a missing file, is that not sufficient > for header units? Or adjustable to be sufficient? No. Header units can introduce macros which can be used to modify the set of modules that are imported. Included headers are "discovered" dependencies and don't modify the build graph (just add more files that trigger a rebuild) and can be collected during compilation. Module dependencies are needed to get the build correct in the first place in order to: - order module compilations in the build graph so that imported modules are ready before anything using them; and - computing the set of flags needed for telling the compiler where imported modules' CMI files should be located. > > - non-utf8 paths > > > > The current standard says that paths that are not unambiguously > > represented using UTF-8 are not supported (because these cases are rare > > and the extra complication is not worth it at this time). Future > > versions of the format might have ways of encoding non-UTF-8 paths. For > > now, this patch just doesn't support non-UTF-8 paths (ignoring the > > "unambiguously represetable in UTF-8" case). > > typo "representable" Fixed. > > diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc > > index c68a2a27469..1c14ce3fe8e 100644 > > --- a/gcc/c-family/c-opts.cc > > +++ b/gcc/c-family/c-opts.cc > > @@ -77,6 +77,9 @@ static bool verbose; > > /* Dependency output file. */ > > static const char *deps_file; > > > > +/* Enhanced dependency output file. */ > > Maybe "structured", as in the docs? It isn't really a direct > enhancement of the makefile dependencies. Agreed. I'll also add a link to p1689r5 as a comment for what "structured" means where it is parsed out. > > + if (cpp_opts->deps.format != DEPS_FMT_NONE) > > +{ > > + if (!fdeps_file) > > + fdeps_stream = out_stream; > > + else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0') > > + fdeps_stream = stdout; > > You probably want to check that deps_stream and fdeps_stream don't end > up as the same stream. Hmm. But `stdout` is probably fine to use for both though. Basically: if (fdeps_stream == out_stream && fdeps_stream != stdout) make_diagnostic_noise (); > > @@ -1374,6 +1410,8 @@ handle_deferred_opts (void) > > > > if (opt->code == OPT_MT || opt->code == OPT_MQ) > > deps_add_target (deps, opt->arg, opt->code == OPT_MQ); > > + else if (opt->code == OPT_fdep_output_) > > + deps_add_output (deps, opt->arg, true); > > How about fdeps_add_target? Renamed. > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > > index ef371ca8c26..630781fdf8a 100644 > > --- a/gcc/c-family/c.opt > > +++ b/gcc/c-family/c.opt > > @@ -256,6 +256,18 @@ MT > > C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target > > after %qs) > > -MT Add a target that does not require quoting. > > > > +fdep-format= > > +C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after > > %qs) > > +Format for output dependency information. Supported (\"p1689r5\"). > > I think we want "structured" here, as well. Fixed. > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > index 06d77983e30..b61c3ebd3ec 100644 > > --- a/gcc/doc/invoke.texi > > +++ b/gcc/doc/invoke.texi > > @@ -2791,6 +2791,21 @@ is @option{-fpermitted-flt-eval-methods=c11}. The > > default when in a GNU > > dialect (@option{-std=gnu11} or similar) is > > @option{-fpermitted-flt-eval-methods=ts-18661-3}. > > > > +@item -fdep-file=@var{file} > > +@opindex fdep-file > > +Where to write structured dependency information. > > + > > +@item -fdep-format=@var{format} > > +@opindex fdep-format > > +The format to use for structured dependency information. @samp{p1689r5} is > > the > > +only supported format righ
Re: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF
On Tue, Jun 20, 2023 at 21:16:40 +0200, Damien Guibouret wrote: > I think the comparison should be ">" instead of ">=" as 0x10 seems a > valid value (Unicode says value above 0x10 is invalid). > Other tests around same value in this file are using ">". Ah, good catch. I'll make a separate patch submission for that. --Ben
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote: > On 1/25/23 16:06, Ben Boeckel wrote: > > They affect the build, so report them via `-MF` mechanisms. > > Why isn't this covered by the existing code in preprocessed_module? It appears as though it is neutered in patch 3 where `write_make_modules_deps` is used in `make_write` (or will use that name in v7 once I finish up testing). This logic cannot be used for p1689 output because it assumes the location and names of CMI files (`.c++m`) that will be necessary (it is related to the `CXX_IMPORTS +=` GNU make/libcody extensions that will, e.g., cause `ninja` to choke if it is read from `-MF` output as it uses "fancier" Makefile syntax than tools that are not actually `make` are going to be willing to support). This codepath is the *actual* filename being read at compile time and is relevant at all times; it may duplicate what `preprocessed_module` sets up. I'm also realizing that this is why I need to pass `-fdeps-format=p1689` when compiling…there may need to be another, more idiomatic, way to disable this additional syntax usage in `-MF` output. --Ben
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote: > On 6/22/23 22:45, Ben Boeckel wrote: > > On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote: > >> On 1/25/23 16:06, Ben Boeckel wrote: > >>> They affect the build, so report them via `-MF` mechanisms. > >> > >> Why isn't this covered by the existing code in preprocessed_module? > > > > It appears as though it is neutered in patch 3 where > > `write_make_modules_deps` is used in `make_write` (or will use that name > > Why do you want to record the transitive modules? I would expect just noting > the > ones with imports directly in the TU would suffice (i.e check the 'outermost' > arg) FWIW, only GCC has "fat" modules. MSVC and Clang both require the transitive closure to be passed. The idea there is to minimize the size of individual module files. If GCC only reads the "fat" modules, then only those should be recorded. If it reads other modules, they should be recorded as well. --Ben
Re: [PATCH v5 5/5] c++modules: report module mapper files as a dependency
On Fri, Jun 23, 2023 at 10:44:11 -0400, Jason Merrill wrote: > On 1/25/23 16:06, Ben Boeckel wrote: > > It affects the build, and if used as a static file, can reliably be > > tracked using the `-MF` mechanism. > > Hmm, this seems a bit like making all .o depend on the Makefile; it Technically this is true: the command line for the TU lives in said Makefile; if I updated it, a new TU would be really nice. This is a long-standing limitation of `make` though. FWIW, `ninja` fixes it by tracking the command line used and CMake's Makefiles generator handles it by storing TU flags in an included file and depending on that file from the TU output. > shouldn't be necessary to rebuild all TUs that use modules when we add > another module to the mapper file. If I change it from: ``` mod.a mod.a.cmi ``` to: ``` mod.a mod.a.replace.cmi ``` I'd expect a recompile. As with anything, this depends on the granularity of the mapper files. A global mapper file is very similar to a global response file and given that we don't have line-change granularity dependency tracking… > What is your expected use case for > this dependency? CMake, at least, uses a per-TU mapper file, so any build system using a similar strategy handling the above case would only affect TUs that actually list `mod.a`. --Ben
Re: [PATCH v5 3/5] p1689r5: initial support
On Fri, Jun 23, 2023 at 14:31:17 -0400, Jason Merrill wrote: > On 6/20/23 15:46, Ben Boeckel wrote: > > On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote: > >> On 1/25/23 13:06, Ben Boeckel wrote: > > >>> Header units (including the standard library headers) are 100% > >>> unsupported right now because the `-E` mechanism wants to import their > >>> BMIs. A new mode (i.e., something more workable than existing `-E` > >>> behavior) that mocks up header units as if they were imported purely > >>> from their path and content would be required. > >> >> I notice that the cpp dependency generation tries (in open_file_failed) > >> to continue after encountering a missing file, is that not sufficient > >> for header units? Or adjustable to be sufficient? > > > > No. Header units can introduce macros which can be used to modify the > > set of modules that are imported. Included headers are "discovered" > > dependencies and don't modify the build graph (just add more files that > > trigger a rebuild) and can be collected during compilation. Module > > dependencies are needed to get the build correct in the first place in > > order to: > > > > - order module compilations in the build graph so that imported modules > > are ready before anything using them; and > > - computing the set of flags needed for telling the compiler where > > imported modules' CMI files should be located. > > So if the header unit CMI isn't available during dependency generation, > would it be better to just #include the header? It's not so simple: the preprocessor state needs to isolate out `LOCAL_ONLY` from this case: ``` #define LOCAL_ONLY 1 import ; // The preprocessing of this should *not* see // `LOCAL_ONLY`. ``` > > Hmm. But `stdout` is probably fine to use for both though. Basically: > > > > if (fdeps_stream == out_stream && fdeps_stream != stdout) > >make_diagnostic_noise (); > > (fdeps_stream == deps_stream, but sure, that's reasonable. Done. > >> So, I take it this is the common use case you have in mind, generating > >> Make dependencies for the p1689 file? When are you thinking the Make > >> dependencies for the .o are generated? At build time? > > > > Yes. If an included file changes, the scanning should be performed > > again. The compilation will have its own `-MF` as well (which should > > point to the same files plus the CMI files it ends up reading). > > > >> I'm a bit surprised you're using .json rather than an extension that > >> indicates what the information is. > > > > I can change that; the filename doesn't *really* matter (e.g., CMake > > uses `.ddi` for "dynamic dependency information"). > > That works. Done. > >>> `-M` is about discovered dependencies: those that you find out while > >>> doing work. `-fdep-*` is about ordering dependencies: extracting > >>> information from file content in order to even order future work around. > >> > >> I'm not sure I see the distinction; Makefiles also express ordering > >> dependencies. In both cases, you want to find out from the files what > >> order you will want to process them in when building the project. > > > > Makefiles can express ordering dependencies, but not the `-M` snippets; > > these are for files that, if changed, should trigger a rebuild. This is > > > fundamentally different than module dependencies which instead indicate > > which *compiles* (or CMI generation if using a 2-phase setup) need to > > complete before compilation (or CMI generation) of the scanned TU can be > > performed. Generally generated headers will be ordered manually in the > > build system description. However, maintaining that same level for > > in-source dependency information on a per-source level is a *far* higher > > burden. > > The main difference I see is that the CMI might not exist yet. As you > say, we don't want to require people to write all the dependencies by > hand, but that just means we need to be able to generate the > dependencies automatically. In the Make-only model I'm thinking of, one > would collect dependencies on an initial failing build, and then start > over from the beginning again with the dependencies we discovered. It's > the same two-phase scan and build, but one that uses the same compile > commands for both phases. It's a potentially unbounded set of phases: - 2 phases per tool that is built that gen
[PATCH v7 1/4] spec: add a spec function to join arguments
When passing `-o` flags to other options, the typical `-o foo` spelling leaves a leading whitespace when replacing elsewhere. This ends up creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't parse properly. When attempting to make a spec function to just remove the leading whitespace, the argument splitting ends up masking the whitespace. However, the intended extension *also* ends up being its own argument. To perform the desired behavior, the arguments need to be concatenated together. gcc/: * gcc.cc (join_spec_func): Add a spec function to join all arguments. Signed-off-by: Ben Boeckel --- gcc/gcc.cc | 15 +++ 1 file changed, 15 insertions(+) diff --git a/gcc/gcc.cc b/gcc/gcc.cc index fdfac0b4fe4..44433b80d61 100644 --- a/gcc/gcc.cc +++ b/gcc/gcc.cc @@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char **); static const char *debug_level_greater_than_spec_func (int, const char **); static const char *dwarf_version_greater_than_spec_func (int, const char **); static const char *find_fortran_preinclude_file (int, const char **); +static const char *join_spec_func (int, const char **); static char *convert_white_space (char *); static char *quote_spec (char *); static char *quote_spec_arg (char *); @@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] = { "debug-level-gt", debug_level_greater_than_spec_func }, { "dwarf-version-gt",dwarf_version_greater_than_spec_func }, { "fortran-preinclude-file", find_fortran_preinclude_file}, + { "join",join_spec_func}, #ifdef EXTRA_SPEC_FUNCTIONS EXTRA_SPEC_FUNCTIONS #endif @@ -10975,6 +10977,19 @@ find_fortran_preinclude_file (int argc, const char **argv) return result; } +/* The function takes any number of arguments and joins them together. */ + +static const char * +join_spec_func (int argc, const char **argv) +{ + char *result = NULL; + + for (int i = 0; i < argc; ++i) +result = reconcat(result, result ? result : "", argv[i], NULL); + + return result; +} + /* If any character in ORIG fits QUOTE_P (_, P), reallocate the string so as to precede every one of them with a backslash. Return the original string or the reallocated one. */ -- 2.40.1
[PATCH v7 0/4] P1689R5 support
Hi, This patch series adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that any TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I've also added patches to include imported module CMI files and the module mapper file as dependencies of the compilation. I briefly looked into adding dependencies on response files as well, but that appeared to need some code contortions to have a `class mkdeps` available before parsing the command line or to keep the information around until one was made. I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. FWIW, Clang as taken an alternate approach with its `clang-scan-deps` tool rather than using the compiler directly. Thanks, --Ben --- v6 -> v7: - rebase onto `master` (80ae426a195 (d: Fix core.volatile.volatileLoad discarded if result is unused, 2023-07-02)) - add test cases for patches 3 and 4 (new dependency reporting in `-MF`) - add a Python script to test aspects of generated dependency files - a new `join` spec function to support `-fdeps-*` defaults based on the `-o` flag (needed to strip the leading space that appears otherwise) - note that JSON writing support should be factored out for use by `libcpp` and `gcc` (libiberty?) - use `.ddi` for the extension of `-fdeps-*` output files by default - support defaults for `-fdeps-file=` and `-fdeps-target=` when only `-fdeps-format=` is provided (with tests) - error if `-MF` and `-fdeps-file=` are both the same (non-`stdout`) file as their formats are incompatible - expand the documentation on how the `-fdeps-*` flags should be used v5 -> v6: - rebase onto `master` (585c660f041 (reload1: Change return type of predicate function from int to bool, 2023-06-06)) - fix crash related to reporting imported CMI files as dependencies - rework utf-8 validity to patch the new `cpp_valid_utf8_p` function instead of the core utf-8 decoding routine to reject invalid codepoints (preserves higher-level error detection of invalid utf-8) - harmonize of `fdeps` spelling in flags, variables, comments, etc. - rename `-fdeps-output=` to `-fdeps-target=` v4 -> v5: - add dependency tracking for imported modules to `-MF` - add dependency tracking for static module mapper files given to `-fmodule-mapper=` v3 -> v4: - add missing spaces between function names and arguments v2 -> v3: - changelog entries moved to commit messages - documentation updated/added in the UTF-8 routine editing v1 -> v2: - removal of the `deps_write(extra)` parameter to option-checking where ndeeded - default parameter of `cpp_finish(fdeps_stream = NULL)` - unification of libcpp UTF-8 validity functions from v1 - test cases for flag parsing states (depflags-*) and p1689 output (p1689-*) Ben Boeckel (4): spec: add a spec function to join arguments p1689r5: initial support c++modules: report imported CMI files as dependencies c++modules: report module mapper files as a dependency gcc/c-family/c-opts.cc| 44 +++- gcc/c-family/c.opt| 12 + gcc/cp/mapper-client.cc | 5 + gcc/cp/mapper-client.h| 1 + gcc/cp/module.cc | 24 +- gcc/doc/invoke.texi | 27 +++ gcc/gcc.cc| 19 +- gcc/json.h| 3 + gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 4 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + .../g++.dg/modules/depflags-fj-MF-share.C | 6 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modu
[PATCH v7 3/4] c++modules: report imported CMI files as dependencies
They affect the build, so report them via `-MF` mechanisms. gcc/cp/ * module.cc (do_import): Report imported CMI files as dependencies. gcc/testsuite/ * g++.dg/modules/depreport-1_a.C: New test. * g++.dg/modules/depreport-1_b.C: New test. * g++.dg/modules/test-depfile.py: New tool for validating depfile information. * lib/modules.exp: Support for validating depfile contents. Signed-off-by: Ben Boeckel --- gcc/cp/module.cc | 3 + gcc/testsuite/g++.dg/modules/depreport-1_a.C | 10 + gcc/testsuite/g++.dg/modules/depreport-1_b.C | 12 ++ gcc/testsuite/g++.dg/modules/test-depfile.py | 187 +++ gcc/testsuite/lib/modules.exp| 29 +++ 5 files changed, 241 insertions(+) create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_a.C create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_b.C create mode 100644 gcc/testsuite/g++.dg/modules/test-depfile.py diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 9df60d695b1..f3acc4e02fe 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -18968,6 +18968,9 @@ module_state::do_import (cpp_reader *reader, bool outermost) dump () && dump ("CMI is %s", file); if (note_module_cmi_yes || inform_cmi_p) inform (loc, "reading CMI %qs", file); + /* Add the CMI file to the dependency tracking. */ + if (cpp_get_deps (reader)) + deps_add_dep (cpp_get_deps (reader), file); fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY); e = errno; } diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_a.C b/gcc/testsuite/g++.dg/modules/depreport-1_a.C new file mode 100644 index 000..241701728a2 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/depreport-1_a.C @@ -0,0 +1,10 @@ +// { dg-additional-options -fmodules-ts } + +export module Foo; +// { dg-module-cmi Foo } + +export class Base +{ +public: + int m; +}; diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_b.C b/gcc/testsuite/g++.dg/modules/depreport-1_b.C new file mode 100644 index 000..b6e317c6703 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/depreport-1_b.C @@ -0,0 +1,12 @@ +// { dg-additional-options -fmodules-ts } +// { dg-additional-options -MD } +// { dg-additional-options "-MF depreport-1.d" } + +import Foo; + +void foo () +{ + Base b; +} + +// { dg-final { run-check-module-dep-expect-input "depreport-1.d" "gcm.cache/Foo.gcm" } } diff --git a/gcc/testsuite/g++.dg/modules/test-depfile.py b/gcc/testsuite/g++.dg/modules/test-depfile.py new file mode 100644 index 000..ea4edb61434 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/test-depfile.py @@ -0,0 +1,187 @@ +import json + + +# Parameters. +ALL_ERRORS = False + + +def _report_error(msg): +'''Report an error.''' +full_msg = 'ERROR: ' + msg +if ALL_ERRORS: +print(full_msg) +else: +raise RuntimeError(full_msg) + + +class Token(object): +pass + + +class Output(Token): +def __init__(self, path): +self.path = path + + +class Input(Token): +def __init__(self, path): +self.path = path + + +class Colon(Token): +pass + + +class Append(Token): +pass + + +class Variable(Token): +def __init__(self, name): +self.name = name + + +class Word(Token): +def __init__(self, name): +self.name = name + + +def validate_depfile(depfile, expect_input=None): +'''Validate a depfile contains some information + +Returns `False` if the information is not found. +''' +with open(depfile, 'r') as fin: +depfile_content = fin.read() + +real_lines = [] +join_line = False +for line in depfile_content.split('\n'): +# Join the line if needed. +if join_line: +line = real_lines.pop() + line + +# Detect line continuations. +join_line = line.endswith('\\') +# Strip line continuation characters. +if join_line: +line = line[:-1] + +# Add to the real line set. +real_lines.append(line) + +# Perform tokenization. +tokenized_lines = [] +for line in real_lines: +tokenized = [] +join_word = False +for word in line.split(' '): +if join_word: +word = tokenized.pop() + ' ' + word + +# Detect word joins. +join_word = word.endswith('\\') +# Strip escape character. +if join_word: +word = word[:-1] + +# Detect `:` at the end of a word. +if word.endswith(':'): +tokenized.append(word[:-1]) +word = word[-1] + +# Add word to the tokenized set. +tokenized.append(word) + +tokenized_line
[PATCH v7 2/4] p1689r5: initial support
les.exp: Support for validating P1689 outputs. Signed-off-by: Ben Boeckel --- gcc/c-family/c-opts.cc| 44 +++- gcc/c-family/c.opt| 12 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 27 +++ gcc/gcc.cc| 4 +- gcc/json.h| 3 + gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 4 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + .../g++.dg/modules/depflags-fj-MF-share.C | 6 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-o.C | 3 + gcc/testsuite/g++.dg/modules/modules.exp | 1 + gcc/testsuite/g++.dg/modules/p1689-1.C| 17 ++ gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi | 27 +++ gcc/testsuite/g++.dg/modules/p1689-2.C| 15 ++ gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi | 16 ++ gcc/testsuite/g++.dg/modules/p1689-3.C| 13 + gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi | 16 ++ gcc/testsuite/g++.dg/modules/p1689-4.C| 13 + gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi | 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.C| 13 + gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi | 14 ++ .../g++.dg/modules/p1689-file-default.C | 16 ++ .../g++.dg/modules/p1689-file-default.exp.ddi | 27 +++ .../g++.dg/modules/p1689-target-default.C | 16 ++ .../modules/p1689-target-default.exp.ddi | 27 +++ gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++ gcc/testsuite/lib/modules.exp | 71 ++ libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 9 +- libcpp/init.cc| 13 +- libcpp/mkdeps.cc | 153 +++- 43 files changed, 859 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MF-share.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py create mode 100644 gcc/testsuite/lib/modules.exp diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index af19140e382..9d794b2f4de 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -77,6 +77,9 @@ static bool verbose; /* Dependency output file. */ static const char *deps_file; +/* Structured dependency output file. */ +static const char *fdeps_file; + /* The prefix given by -iprefix, if any. *
[PATCH v7 4/4] c++modules: report module mapper files as a dependency
It affects the build, and if used as a static file, can reliably be tracked using the `-MF` mechanism. gcc/cp/: * mapper-client.cc, mapper-client.h (open_module_client): Accept dependency tracking and track module mapper files as dependencies. * module.cc (make_mapper, get_mapper): Pass the dependency tracking class down. gcc/testsuite/: * g++.dg/modules/depreport-2.modmap: New test. * g++.dg/modules/depreport-2_a.C: New test. * g++.dg/modules/depreport-2_b.C: New test. * g++.dg/modules/test-depfile.py: Support `:|` syntax output when generating modules. Signed-off-by: Ben Boeckel --- gcc/cp/mapper-client.cc | 5 + gcc/cp/mapper-client.h| 1 + gcc/cp/module.cc | 18 - .../g++.dg/modules/depreport-2.modmap | 2 ++ gcc/testsuite/g++.dg/modules/depreport-2_a.C | 15 ++ gcc/testsuite/g++.dg/modules/depreport-2_b.C | 14 + gcc/testsuite/g++.dg/modules/test-depfile.py | 20 +++ 7 files changed, 66 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2.modmap create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_a.C create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_b.C diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc index 39e80df2d25..92727195246 100644 --- a/gcc/cp/mapper-client.cc +++ b/gcc/cp/mapper-client.cc @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic-core.h" #include "mapper-client.h" #include "intl.h" +#include "mkdeps.h" #include "../../c++tools/resolver.h" @@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name, module_client * module_client::open_module_client (location_t loc, const char *o, + class mkdeps *deps, void (*set_repo) (const char *), char const *full_program_name) { @@ -285,6 +287,9 @@ module_client::open_module_client (location_t loc, const char *o, errmsg = "opening"; else { + /* Add the mapper file to the dependency tracking. */ + if (deps) + deps_add_dep (deps, name.c_str ()); if (int l = r->read_tuple_file (fd, ident, false)) { if (l > 0) diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h index b32723ce296..a3b0b8adc51 100644 --- a/gcc/cp/mapper-client.h +++ b/gcc/cp/mapper-client.h @@ -55,6 +55,7 @@ public: public: static module_client *open_module_client (location_t loc, const char *option, + class mkdeps *, void (*set_repo) (const char *), char const *); static void close_module_client (location_t loc, module_client *); diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index f3acc4e02fe..77c9edcbc04 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -3969,12 +3969,12 @@ static GTY(()) vec *partial_specializations; /* Our module mapper (created lazily). */ module_client *mapper; -static module_client *make_mapper (location_t loc); -inline module_client *get_mapper (location_t loc) +static module_client *make_mapper (location_t loc, class mkdeps *deps); +inline module_client *get_mapper (location_t loc, class mkdeps *deps) { auto *res = mapper; if (!res) -res = make_mapper (loc); +res = make_mapper (loc, deps); return res; } @@ -14033,7 +14033,7 @@ get_module (const char *ptr) /* Create a new mapper connecting to OPTION. */ module_client * -make_mapper (location_t loc) +make_mapper (location_t loc, class mkdeps *deps) { timevar_start (TV_MODULE_MAPPER); const char *option = module_mapper_name; @@ -14041,7 +14041,7 @@ make_mapper (location_t loc) option = getenv ("CXX_MODULE_MAPPER"); mapper = module_client::open_module_client -(loc, option, &set_cmi_repo, +(loc, option, deps, &set_cmi_repo, (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name) && save_decoded_options[0].arg != progname ? save_decoded_options[0].arg : nullptr); @@ -19506,7 +19506,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc, dump.push (NULL); dump () && dump ("Checking include translation '%s'", path); - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); size_t len = strlen (path); path = canonicalize_header_name (NULL, loc, true, path, len); @@ -19622,7 +19622,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps, static void nam
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote: > On 6/25/23 12:36, Ben Boeckel wrote: > > On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote: > >> On 6/22/23 22:45, Ben Boeckel wrote: > >>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote: > >>>> On 1/25/23 16:06, Ben Boeckel wrote: > >>>>> They affect the build, so report them via `-MF` mechanisms. > >>>> > >>>> Why isn't this covered by the existing code in preprocessed_module? > >>> > >>> It appears as though it is neutered in patch 3 where > >>> `write_make_modules_deps` is used in `make_write` (or will use that name > >> > >> Why do you want to record the transitive modules? I would expect just > >> noting the > >> ones with imports directly in the TU would suffice (i.e check the > >> 'outermost' arg) > > > > FWIW, only GCC has "fat" modules. MSVC and Clang both require the > > transitive closure to be passed. The idea there is to minimize the size > > of individual module files. > > > > If GCC only reads the "fat" modules, then only those should be recorded. > > If it reads other modules, they should be recorded as well. For clarification, given: * a.cppm ``` export module a; ``` * b.cppm ``` export module b; import a; ``` * use.cppm ``` import b; ``` in a "fat" module setup, `use.cppm` only needs to be told about `b.cmi` because it contains everything that an importer needs to know about the `a` module (reachable types, re-exported bits, whatever). With the "thin" modules, `a.cmi` must be specified when compiling `use.cppm` to satisfy anything that may be required transitively (e.g., a return type of some `b` symbol is from `a`). MSVC and Clang (17+) use this model. I don't know MSVC's rationale, but Clang's is to make CMIs relocatable by not embedding paths to dependent modules in CMIs. This should help caching and network transfer sizes in distributed builds. LLVM/Clang discussion: https://discourse.llvm.org/t/c-20-modules-should-the-bmis-contain-paths-to-their-dependent-bmis/70422 https://github.com/llvm/llvm-project/issues/62707 Maybe I'm missing how this *actually* works in GCC as I've really only interacted with it through the command line, but I've not needed to mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and read through some embedded information in `b.cmi` or does `b.cmi` include enough information to not need to read it at all? If the former, distributed builds are going to have a problem knowing what files to send just from the command line (I'll call this "implicit thin"). If the latter, that is the "fat" CMI that I'm thinking of. > But wouldn't the transitive modules be dependencies of the direct > imports, so (re)building the direct imports would first require building > the transitive modules anyway? Expressing the transitive closure of > dependencies for each importer seems redundant when it can be easily > derived from the direct dependencies of each module. I'm not concerned whether it is transitive or not, really. If a file is read, it should be reported here regardless of the reason. Note that caching mechanisms may skip actually *doing* the reading, but the dependencies should still be reported from the cached results as-if the real work had been performed. --Ben
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On Wed, Jul 19, 2023 at 17:11:08 -0400, Nathan Sidwell wrote: > GCC is neither of these descriptions. a CMI does not contain the transitive > closure of its imports. It contains an import table. That table lists the > transitive closure of its imports (it needs that closure to do remapping), > and > that table contains the CMI pathnames of the direct imports. Those pathnames > are absolute, if the mapper provded an absolute pathm or relative to the CMI > repo. > > The rationale here is that if you're building a CMI, Foo, which imports a > bunch > of modules, those imported CMIs will have the same (relative) location in > this > compilation and in compilations importing Foo (why would you move them?) Note > this is NOT inhibiting relocatable builds, because of the CMI repo. But it is inhibiting distributed builds because the distributing tool would need to know: - what CMIs are actually imported (here, "read the module mapper file" (in CMake's case, this is only the modules that are needed; a single massive mapper file for an entire project would have extra entries) or "act as a proxy for the socket/program specified" for other approaches); - read the CMIs as it sends to the remote side to gather any other CMIs that may be needed (recursively); Contrast this with the MSVC and Clang (17+) mechanism where the command line contains everything that is needed and a single bolus can be sent. And relocatable is probably fine. How does it interact with reproducible builds? Or are GCC CMIs not really something anyone should consider for installation (even as a "here, maybe this can help consumers" mechanism)? > On 7/18/23 20:01, Ben Boeckel wrote: > > Maybe I'm missing how this *actually* works in GCC as I've really only > > interacted with it through the command line, but I've not needed to > > mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and > > read through some embedded information in `b.cmi` or does `b.cmi` > > include enough information to not need to read it at all? If the former, > > distributed builds are going to have a problem knowing what files to > > send just from the command line (I'll call this "implicit thin"). If the > > latter, that is the "fat" CMI that I'm thinking of. > > please don't use perjorative terms like 'fat' and 'thin'. Sorry, I was internally analogizing to "thinLTO". --Ben
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote: > On 7/19/23 20:47, Ben Boeckel wrote: > > But it is inhibiting distributed builds because the distributing tool > > would need to know: > > > > - what CMIs are actually imported (here, "read the module mapper file" > >(in CMake's case, this is only the modules that are needed; a single > >massive mapper file for an entire project would have extra entries) or > >"act as a proxy for the socket/program specified" for other > >approaches); > > This information is in the machine (& human) README section of the CMI. Ok. That leaves it up to distributing build tools to figure out at least. > > - read the CMIs as it sends to the remote side to gather any other CMIs > >that may be needed (recursively); > > > > Contrast this with the MSVC and Clang (17+) mechanism where the command > > line contains everything that is needed and a single bolus can be sent. > > um, the build system needs to create that command line? Where does the build > system get that information? IIUC it'll need to read some file(s) to do that. It's chained through the P1689 information in the collator as needed. No extra files need to be read (at least with CMake's approach); certainly not CMI files. > > And relocatable is probably fine. How does it interact with reproducible > > builds? Or are GCC CMIs not really something anyone should consider for > > installation (even as a "here, maybe this can help consumers" > > mechanism)? > > Module CMIs should be considered a cacheable artifact. They are neither > object > files nor source files. Sure, cachable sounds fine. What about the installation? --Ben
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote: > It occurs to me that the model I am envisioning is similar to CMake's object > libraries. Object libraries are a convenient name for a bunch of object > files. > IIUC they're linked by naming the individual object files (or I think the > could > be implemented as a static lib linked with --whole-archive path/to/libfoo.a > -no-whole-archive. But for this conversation consider them a bunch of > separate > object files with a convenient group name. Yes, `--whole-archive` would work great if it had any kind of portability across CMake's platform set. > Consider also that object libraries could themselves contain object libraries > (I > don't know of they can, but it seems like a useful concept). Then one could > create an object library from a collection of object files and object > libraries > (recursively). CMake would handle the transitive gtaph. I think this detail is relevant, but you can use `$` as an `INTERFACE` sources and it would act like that, but it is an explicit thing. Instead, `OBJECT` libraries *only* provide their objects to targets that *directly* link them. If not, given this: A (OBJECT library) B (library of some kind; links PUBLIC to A) C (links to B) If `A` has things like linker flags (or, more likely, libraries) as part of its usage requirements, C will get them on is link line. However, if OBJECT files are transitive in the same way, the linker (on most platforms at least) chokes because it now has duplicates of all of A's symbols: those from the B library and those from A's objects on the link line. > Now, allow an object library to itself have some kind of tangible, on-disk > representation. *BUT* not like a static library -- it doesn't include the > object files. > > > Now that immediately maps onto modules. > > CMI: Object library > Direct imports: Direct object libraries of an object library > > This is why I don't understand the need explicitly indicate the indirect > imports > of a CMI. CMake knows them, because it knows the graph. Sure, *CMake* knows them, but the *build tool* needs to be told (typically `make` or `ninja`) because it is what is actually executing the build graph. The way this is communicated is via `-MF` files and that's what I'm providing in this patch. Note that `ninja` does not allow rules to specify such dependencies for other rules than the one it is reading the file for. --Ben
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On Thu, Jul 27, 2023 at 18:13:48 -0700, Jason Merrill wrote: > On 7/23/23 20:26, Ben Boeckel wrote: > > Sure, *CMake* knows them, but the *build tool* needs to be told > > (typically `make` or `ninja`) because it is what is actually executing > > the build graph. The way this is communicated is via `-MF` files and > > that's what I'm providing in this patch. Note that `ninja` does not > > allow rules to specify such dependencies for other rules than the one it > > is reading the file for. > > But since the direct imports need to be rebuilt themselves if the > transitive imports change, the build graph should be the same whether or > not the transitive imports are repeated? Either way, if a transitive > import changes you need to rebuild the direct import and then the importer. I suppose I have seen enough bad build systems that don't do everything correctly that I'm interested in creating "pits of success" rather than "well, you didn't get thing X 100% correct, so you're screwed here too". The case that I think is most likely here is that someone has a "superbuild" with 3 projects A, B, and C where C uses B and B uses A. At the top-level the superbuild exposes just "make projectA projectB projectC"-granularity (rather than a combined build graph; they may use different build systems) and then users go into some projectC directly and forget to update projectB after updating projectA (known to all use the same compiler/flags so that CMI sharing is possible). The build it still broken, but ideally they get notified in some useful way when rebuilding the TU rather than…whatever ends up catching the problem incidentally. > I guess it shouldn't hurt to have the transitive imports in the -MF > file, as long as they aren't also in the p1689 file, so I'm not > particularly opposed to this change, but I don't see how it makes a > practical difference. Correct. The P1689 shouldn't even know about transitive imports (well, maybe from header units?) as it just records "I saw an `import` statement" and should never look up CMI files (indeed, we would need another scanning step to know what CMI files to create for the P1689 scan if they were necessary…). --Ben
isl 0.26 C++17 issues when bootstrapping with GCC
Hi, I tried adding isl 0.26 to a 13.2 GCC build using Iain's macOS aarch64 patches: https://github.com/iains/gcc-13-branch It seems that the bootstrap's `CXX='g++ -std=c++11'` confuses isl's build where C++17 is expected to work by disabling C++17 behind its back. Should GCC not add this flag for its dependencies but only its own build perhaps? isl bug report thread (with isl configure logs and the like): https://groups.google.com/g/isl-development/c/ShnQcW_35ZQ Thanks, --Ben
Re: isl 0.26 C++17 issues when bootstrapping with GCC
On Tue, Aug 29, 2023 at 18:57:37 +0200, Richard Biener wrote: > I suppose for bootstrapping we could disable ISL during stage1 since > it enables an optional feature only. Other than that GCC only > requires a C++11 compiler for building, so ISL breaks that constraint > with requiring C++17. Note that it doesn't *require* it per sé; the tests that try it are compiled if C++17 support was detected at all. The headers seem to just have optional header-only `std::any`-using APIs if C++17 is around. `isl` supporting a flag to disable the tests would also work, but that doesn't fix 0.26. It also doesn't mean it won't start requiring C++17 at some point in the future. In light of that, I feel that skipping it for bootstrap is probably the right solution here. Alas, my skill with autotools is closer to the caveman-with-club level rather than that of a surgeon. --Ben
[PATCH v8 0/4] P1689R5 support
Hi, This patch series adds initial support for ISO C++'s [P1689R5][], a format for describing C++ module requirements and provisions based on the source code. This is required because compiling C++ with modules is not embarrassingly parallel and need to be ordered to ensure that `import some_module;` can be satisfied in time by making sure that any TU with `export import some_module;` is compiled first. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html I've also added patches to include imported module CMI files and the module mapper file as dependencies of the compilation. I briefly looked into adding dependencies on response files as well, but that appeared to need some code contortions to have a `class mkdeps` available before parsing the command line or to keep the information around until one was made. I'd like feedback on the approach taken here with respect to the user-visible flags. I'll also note that header units are not supported at this time because the current `-E` behavior with respect to `import ;` is to search for an appropriate `.gcm` file which is not something such a "scan" can support. A new mode will likely need to be created (e.g., replacing `-E` with `-fc++-module-scanning` or something) where headers are looked up "normally" and processed only as much as scanning requires. FWIW, Clang as taken an alternate approach with its `clang-scan-deps` tool rather than using the compiler directly. Thanks, --Ben --- v7 -> v8: - rename `DEPS_FMT_` enum variants to `FDEPS_FMT_` to match the associated flag - memory leak fix in the `join` specfunc implementation (also better comments), both from Jason - formatting fix in `mkdeps.cc` for `write_make_modules_deps` assignment - comments on new functions for P1689R5 implementation v6 -> v7: - rebase onto `master` (80ae426a195 (d: Fix core.volatile.volatileLoad discarded if result is unused, 2023-07-02)) - add test cases for patches 3 and 4 (new dependency reporting in `-MF`) - add a Python script to test aspects of generated dependency files - a new `join` spec function to support `-fdeps-*` defaults based on the `-o` flag (needed to strip the leading space that appears otherwise) - note that JSON writing support should be factored out for use by `libcpp` and `gcc` (libiberty?) - use `.ddi` for the extension of `-fdeps-*` output files by default - support defaults for `-fdeps-file=` and `-fdeps-target=` when only `-fdeps-format=` is provided (with tests) - error if `-MF` and `-fdeps-file=` are both the same (non-`stdout`) file as their formats are incompatible - expand the documentation on how the `-fdeps-*` flags should be used v5 -> v6: - rebase onto `master` (585c660f041 (reload1: Change return type of predicate function from int to bool, 2023-06-06)) - fix crash related to reporting imported CMI files as dependencies - rework utf-8 validity to patch the new `cpp_valid_utf8_p` function instead of the core utf-8 decoding routine to reject invalid codepoints (preserves higher-level error detection of invalid utf-8) - harmonize of `fdeps` spelling in flags, variables, comments, etc. - rename `-fdeps-output=` to `-fdeps-target=` v4 -> v5: - add dependency tracking for imported modules to `-MF` - add dependency tracking for static module mapper files given to `-fmodule-mapper=` v3 -> v4: - add missing spaces between function names and arguments v2 -> v3: - changelog entries moved to commit messages - documentation updated/added in the UTF-8 routine editing v1 -> v2: - removal of the `deps_write(extra)` parameter to option-checking where ndeeded - default parameter of `cpp_finish(fdeps_stream = NULL)` - unification of libcpp UTF-8 validity functions from v1 - test cases for flag parsing states (depflags-*) and p1689 output (p1689-*) Ben Boeckel (4): spec: add a spec function to join arguments p1689r5: initial support c++modules: report imported CMI files as dependencies c++modules: report module mapper files as a dependency gcc/c-family/c-opts.cc| 44 +++- gcc/c-family/c.opt| 12 + gcc/cp/mapper-client.cc | 5 + gcc/cp/mapper-client.h| 1 + gcc/cp/module.cc | 24 +- gcc/doc/invoke.texi | 27 +++ gcc/gcc.cc| 27 ++- gcc/json.h| 3 + gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 4 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + .../g++.dg/modules/depflags-fj-MF-share.C | 6 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflag
[PATCH v8 1/4] spec: add a spec function to join arguments
When passing `-o` flags to other options, the typical `-o foo` spelling leaves a leading whitespace when replacing elsewhere. This ends up creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't parse properly. When attempting to make a spec function to just remove the leading whitespace, the argument splitting ends up masking the whitespace. However, the intended extension *also* ends up being its own argument. To perform the desired behavior, the arguments need to be concatenated together. gcc/: * gcc.cc (join_spec_func): Add a spec function to join all arguments. Signed-off-by: Ben Boeckel --- gcc/gcc.cc | 23 +++ 1 file changed, 23 insertions(+) diff --git a/gcc/gcc.cc b/gcc/gcc.cc index fdfac0b4fe4..4c4e81dee50 100644 --- a/gcc/gcc.cc +++ b/gcc/gcc.cc @@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char **); static const char *debug_level_greater_than_spec_func (int, const char **); static const char *dwarf_version_greater_than_spec_func (int, const char **); static const char *find_fortran_preinclude_file (int, const char **); +static const char *join_spec_func (int, const char **); static char *convert_white_space (char *); static char *quote_spec (char *); static char *quote_spec_arg (char *); @@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] = { "debug-level-gt", debug_level_greater_than_spec_func }, { "dwarf-version-gt",dwarf_version_greater_than_spec_func }, { "fortran-preinclude-file", find_fortran_preinclude_file}, + { "join",join_spec_func}, #ifdef EXTRA_SPEC_FUNCTIONS EXTRA_SPEC_FUNCTIONS #endif @@ -10975,6 +10977,27 @@ find_fortran_preinclude_file (int argc, const char **argv) return result; } +/* The function takes any number of arguments and joins them together. + + This seems to be necessary to build "-fjoined=foo.b" from "-fseparate foo.a" + with a %{fseparate*:-fjoined=%.b$*} rule without adding undesired spaces: + when doing $* replacement we first replace $* with the rest of the switch + (in this case ""), and then add any arguments as arguments after the result, + resulting in "-fjoined= foo.b". Using this function with e.g. + %{fseparate*:-fjoined=%:join(%.b$*)} gets multiple words as separate argv + elements instead of separated by spaces, and we paste them together. */ + +static const char * +join_spec_func (int argc, const char **argv) +{ + if (argc == 1) +return argv[0]; + for (int i = 0; i < argc; ++i) +obstack_grow (&obstack, argv[i], strlen (argv[i])); + obstack_1grow (&obstack, '\0'); + return XOBFINISH (&obstack, const char *); +} + /* If any character in ORIG fits QUOTE_P (_, P), reallocate the string so as to precede every one of them with a backslash. Return the original string or the reallocated one. */ -- 2.41.0
[PATCH v8 3/4] c++modules: report imported CMI files as dependencies
They affect the build, so report them via `-MF` mechanisms. gcc/cp/ * module.cc (do_import): Report imported CMI files as dependencies. gcc/testsuite/ * g++.dg/modules/depreport-1_a.C: New test. * g++.dg/modules/depreport-1_b.C: New test. * g++.dg/modules/test-depfile.py: New tool for validating depfile information. * lib/modules.exp: Support for validating depfile contents. Signed-off-by: Ben Boeckel --- gcc/cp/module.cc | 3 + gcc/testsuite/g++.dg/modules/depreport-1_a.C | 10 + gcc/testsuite/g++.dg/modules/depreport-1_b.C | 12 ++ gcc/testsuite/g++.dg/modules/test-depfile.py | 187 +++ gcc/testsuite/lib/modules.exp| 29 +++ 5 files changed, 241 insertions(+) create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_a.C create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_b.C create mode 100644 gcc/testsuite/g++.dg/modules/test-depfile.py diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 9df60d695b1..f3acc4e02fe 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -18968,6 +18968,9 @@ module_state::do_import (cpp_reader *reader, bool outermost) dump () && dump ("CMI is %s", file); if (note_module_cmi_yes || inform_cmi_p) inform (loc, "reading CMI %qs", file); + /* Add the CMI file to the dependency tracking. */ + if (cpp_get_deps (reader)) + deps_add_dep (cpp_get_deps (reader), file); fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY); e = errno; } diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_a.C b/gcc/testsuite/g++.dg/modules/depreport-1_a.C new file mode 100644 index 000..241701728a2 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/depreport-1_a.C @@ -0,0 +1,10 @@ +// { dg-additional-options -fmodules-ts } + +export module Foo; +// { dg-module-cmi Foo } + +export class Base +{ +public: + int m; +}; diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_b.C b/gcc/testsuite/g++.dg/modules/depreport-1_b.C new file mode 100644 index 000..b6e317c6703 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/depreport-1_b.C @@ -0,0 +1,12 @@ +// { dg-additional-options -fmodules-ts } +// { dg-additional-options -MD } +// { dg-additional-options "-MF depreport-1.d" } + +import Foo; + +void foo () +{ + Base b; +} + +// { dg-final { run-check-module-dep-expect-input "depreport-1.d" "gcm.cache/Foo.gcm" } } diff --git a/gcc/testsuite/g++.dg/modules/test-depfile.py b/gcc/testsuite/g++.dg/modules/test-depfile.py new file mode 100644 index 000..ea4edb61434 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/test-depfile.py @@ -0,0 +1,187 @@ +import json + + +# Parameters. +ALL_ERRORS = False + + +def _report_error(msg): +'''Report an error.''' +full_msg = 'ERROR: ' + msg +if ALL_ERRORS: +print(full_msg) +else: +raise RuntimeError(full_msg) + + +class Token(object): +pass + + +class Output(Token): +def __init__(self, path): +self.path = path + + +class Input(Token): +def __init__(self, path): +self.path = path + + +class Colon(Token): +pass + + +class Append(Token): +pass + + +class Variable(Token): +def __init__(self, name): +self.name = name + + +class Word(Token): +def __init__(self, name): +self.name = name + + +def validate_depfile(depfile, expect_input=None): +'''Validate a depfile contains some information + +Returns `False` if the information is not found. +''' +with open(depfile, 'r') as fin: +depfile_content = fin.read() + +real_lines = [] +join_line = False +for line in depfile_content.split('\n'): +# Join the line if needed. +if join_line: +line = real_lines.pop() + line + +# Detect line continuations. +join_line = line.endswith('\\') +# Strip line continuation characters. +if join_line: +line = line[:-1] + +# Add to the real line set. +real_lines.append(line) + +# Perform tokenization. +tokenized_lines = [] +for line in real_lines: +tokenized = [] +join_word = False +for word in line.split(' '): +if join_word: +word = tokenized.pop() + ' ' + word + +# Detect word joins. +join_word = word.endswith('\\') +# Strip escape character. +if join_word: +word = word[:-1] + +# Detect `:` at the end of a word. +if word.endswith(':'): +tokenized.append(word[:-1]) +word = word[-1] + +# Add word to the tokenized set. +tokenized.append(word) + +tokenized_line
[PATCH v8 4/4] c++modules: report module mapper files as a dependency
It affects the build, and if used as a static file, can reliably be tracked using the `-MF` mechanism. gcc/cp/: * mapper-client.cc, mapper-client.h (open_module_client): Accept dependency tracking and track module mapper files as dependencies. * module.cc (make_mapper, get_mapper): Pass the dependency tracking class down. gcc/testsuite/: * g++.dg/modules/depreport-2.modmap: New test. * g++.dg/modules/depreport-2_a.C: New test. * g++.dg/modules/depreport-2_b.C: New test. * g++.dg/modules/test-depfile.py: Support `:|` syntax output when generating modules. Signed-off-by: Ben Boeckel --- gcc/cp/mapper-client.cc | 5 + gcc/cp/mapper-client.h| 1 + gcc/cp/module.cc | 18 - .../g++.dg/modules/depreport-2.modmap | 2 ++ gcc/testsuite/g++.dg/modules/depreport-2_a.C | 15 ++ gcc/testsuite/g++.dg/modules/depreport-2_b.C | 14 + gcc/testsuite/g++.dg/modules/test-depfile.py | 20 +++ 7 files changed, 66 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2.modmap create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_a.C create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_b.C diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc index 39e80df2d25..92727195246 100644 --- a/gcc/cp/mapper-client.cc +++ b/gcc/cp/mapper-client.cc @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic-core.h" #include "mapper-client.h" #include "intl.h" +#include "mkdeps.h" #include "../../c++tools/resolver.h" @@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name, module_client * module_client::open_module_client (location_t loc, const char *o, + class mkdeps *deps, void (*set_repo) (const char *), char const *full_program_name) { @@ -285,6 +287,9 @@ module_client::open_module_client (location_t loc, const char *o, errmsg = "opening"; else { + /* Add the mapper file to the dependency tracking. */ + if (deps) + deps_add_dep (deps, name.c_str ()); if (int l = r->read_tuple_file (fd, ident, false)) { if (l > 0) diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h index b32723ce296..a3b0b8adc51 100644 --- a/gcc/cp/mapper-client.h +++ b/gcc/cp/mapper-client.h @@ -55,6 +55,7 @@ public: public: static module_client *open_module_client (location_t loc, const char *option, + class mkdeps *, void (*set_repo) (const char *), char const *); static void close_module_client (location_t loc, module_client *); diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index f3acc4e02fe..77c9edcbc04 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -3969,12 +3969,12 @@ static GTY(()) vec *partial_specializations; /* Our module mapper (created lazily). */ module_client *mapper; -static module_client *make_mapper (location_t loc); -inline module_client *get_mapper (location_t loc) +static module_client *make_mapper (location_t loc, class mkdeps *deps); +inline module_client *get_mapper (location_t loc, class mkdeps *deps) { auto *res = mapper; if (!res) -res = make_mapper (loc); +res = make_mapper (loc, deps); return res; } @@ -14033,7 +14033,7 @@ get_module (const char *ptr) /* Create a new mapper connecting to OPTION. */ module_client * -make_mapper (location_t loc) +make_mapper (location_t loc, class mkdeps *deps) { timevar_start (TV_MODULE_MAPPER); const char *option = module_mapper_name; @@ -14041,7 +14041,7 @@ make_mapper (location_t loc) option = getenv ("CXX_MODULE_MAPPER"); mapper = module_client::open_module_client -(loc, option, &set_cmi_repo, +(loc, option, deps, &set_cmi_repo, (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name) && save_decoded_options[0].arg != progname ? save_decoded_options[0].arg : nullptr); @@ -19506,7 +19506,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc, dump.push (NULL); dump () && dump ("Checking include translation '%s'", path); - auto *mapper = get_mapper (cpp_main_loc (reader)); + auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader)); size_t len = strlen (path); path = canonicalize_header_name (NULL, loc, true, path, len); @@ -19622,7 +19622,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps, static void nam
[PATCH v8 2/4] p1689r5: initial support
les.exp: Support for validating P1689 outputs. Signed-off-by: Ben Boeckel --- gcc/c-family/c-opts.cc| 44 +++- gcc/c-family/c.opt| 12 + gcc/cp/module.cc | 3 +- gcc/doc/invoke.texi | 27 +++ gcc/gcc.cc| 4 +- gcc/json.h| 3 + gcc/testsuite/g++.dg/modules/depflags-f-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-f.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fi.C| 4 + gcc/testsuite/g++.dg/modules/depflags-fj-MD.C | 3 + .../g++.dg/modules/depflags-fj-MF-share.C | 6 + gcc/testsuite/g++.dg/modules/depflags-fj.C| 4 + .../g++.dg/modules/depflags-fjo-MD.C | 4 + gcc/testsuite/g++.dg/modules/depflags-fjo.C | 5 + gcc/testsuite/g++.dg/modules/depflags-fo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-fo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-j-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-j.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo-MD.C | 3 + gcc/testsuite/g++.dg/modules/depflags-jo.C| 4 + gcc/testsuite/g++.dg/modules/depflags-o-MD.C | 2 + gcc/testsuite/g++.dg/modules/depflags-o.C | 3 + gcc/testsuite/g++.dg/modules/modules.exp | 1 + gcc/testsuite/g++.dg/modules/p1689-1.C| 17 ++ gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi | 27 +++ gcc/testsuite/g++.dg/modules/p1689-2.C| 15 ++ gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi | 16 ++ gcc/testsuite/g++.dg/modules/p1689-3.C| 13 + gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi | 16 ++ gcc/testsuite/g++.dg/modules/p1689-4.C| 13 + gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi | 14 ++ gcc/testsuite/g++.dg/modules/p1689-5.C| 13 + gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi | 14 ++ .../g++.dg/modules/p1689-file-default.C | 16 ++ .../g++.dg/modules/p1689-file-default.exp.ddi | 27 +++ .../g++.dg/modules/p1689-target-default.C | 16 ++ .../modules/p1689-target-default.exp.ddi | 27 +++ gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++ gcc/testsuite/lib/modules.exp | 71 ++ libcpp/include/cpplib.h | 12 +- libcpp/include/mkdeps.h | 9 +- libcpp/init.cc| 13 +- libcpp/mkdeps.cc | 163 - 43 files changed, 869 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MF-share.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.C create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.exp.ddi create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py create mode 100644 gcc/testsuite/lib/modules.exp diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc index 4961af63de8..1c1f8c84f88 100644 --- a/gcc/c-family/c-opts.cc +++ b/gcc/c-family/c-opts.cc @@ -77,6 +77,9 @@ static bool verbose; /* Dependency output file. */ static const char *deps_file; +/* Structured dependency output file. */ +static const char *fdeps_file; + /* The prefix given by -iprefix, if any. *
Re: [PATCH v8 0/4] P1689R5 support
On Tue, Sep 19, 2023 at 17:33:34 -0400, Jason Merrill wrote: > Pushed, thanks! Thanks! Is there a process I can use to backport this to GCC 13? --Ben
Re: Scaling -fmacro-prefix-map= to thousands entries
On Wed, Oct 04, 2023 at 22:19:32 +0100, Sergei Trofimovich via Gcc wrote: > The prototype that creates equivalent of the following commands does > work for smaller packages: > > > -fmacro-prefix-map=/nix/store/y8wfrgk7br5rfz4221lfb9v8w3n0cnyd-glibc-2.37-8-dev=/nix/store/-glibc-2.37-8-dev > > -fmacro-prefix-map=/nix/store/8n240jfdmsb3lnc2qa2vb9dwk638j1lp-gmp-with-cxx-6.3.0-dev=/nix/store/-gmp-with-cxx-6.3.0-dev > > -fmacro-prefix-map=/nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2=/nix/store/-nlohmann_json-3.11.2 > ... > > The above works for small amount of options (like, 100). But around 1000 > options we start hitting linux limits on the single environment variable > or real-world packages like `qemu` with a ton of input depends. Are you trying to pass this through via `CFLAGS` and friends? > The command-line limitations are in various places: > - `gcc` limitation of lifting all command line options into a single > environment variable: https://gcc.gnu.org/PR111527 > - `linux` limitation of constraining single environ variable to a value > way below than full available environment space: > https://lkml.org/lkml/2023/9/24/381 > > `linux` fix would buy us 50x more budged (A Lot) but it will not help > much other operating systems like `Darwin` where absolute environment > limit is a lot lower than `linux`. > > I already implemented [1.] in https://github.com/NixOS/nixpkgs/pull/255192 > (also attached `mangle-NIX_STORE-in-__FILE__.patch` 3.5K patch against > `master` as a proof of concept). > > What would be the best way to scale up `-fmacro-prefix-map=` up to NixOS > needs for `gcc`? I would like to implement something sensible I could > upstream. How about `CFLAGS=@macro_prefix_map.args` and writing that file in the same codepath where you generate the flags today. It'll work with just about every compiler and tools like `ccache` will understand that it is an input that affects the build and properly take the file's contents into account. --Ben
Re: Scaling -fmacro-prefix-map= to thousands entries
On Thu, Oct 05, 2023 at 12:59:21 +0100, Sergei Trofimovich via Gcc wrote: > Sounds good. Do you have any preference over specific syntax? My > suggestions would be: > > 1. `-fmacro-prefix-map=file-name`: if `file-name` there is not in `key=val` >format then treat it as file > 2. `-fmacro-prefix-map=@file-name`: use @ as a signal to use file > 3. `fmacro-prefix-map-file=file-name`: use a new option Whatever is picked, please let `ccache` and `sccache` know so that they can include this argument's contents in their hashes. `distcc` and other distributed build tools need to know to send this file to wherever compilation is happening as well (though they may actually not care as they may send around preprocessed source?). Thanks, --Ben
Re: Odd Python errors in the G++ testsuite
On Mon, Oct 09, 2023 at 20:12:01 +0100, Maciej W. Rozycki wrote: > I'm seeing these tracebacks for several cases across the G++ testsuite: > > Executing on host: python3 -c "import sys; assert sys.version_info >= (3, 6)" >(timeout = 300) > spawn -ignore SIGHUP python3 -c import sys; assert sys.version_info >= (3, 6) What version of Python3 do you have? The test suite might not actually properly handle not having 3.7 (i.e., skip the tests that require it). > rules/0/primary-output is ok: p1689-1.o I wrote these tests. > rules/0/provides/0/logical-name is ok: foo > rules/0/provides/0/is-interface is ok: True > Traceback (most recent call last): > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 218, in > is_ok = validate_p1689(actual, expect) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 182, in > validate_p1689 > return compare_json([], actual_json, expect_json) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 145, in > compare_json > is_ok = _compare_object(path, actual, expect) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 66, in > _compare_object > sub_error = compare_json(path + [key], actual[key], expect[key]) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 151, in > compare_json > is_ok = _compare_array(path, actual, expect) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 87, in > _compare_array > sub_error = compare_json(path + [str(idx)], a, e) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 145, in > compare_json > is_ok = _compare_object(path, actual, expect) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 66, in > _compare_object > sub_error = compare_json(path + [key], actual[key], expect[key]) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 149, in > compare_json > actual = set(actual) > TypeError: unhashable type: 'dict' I'm not sure how this ends up with a dictionary in it… Can you `print(actual)` before this? > and also these intermittent failures for other cases: > > Executing on host: python3 -c "import sys; assert sys.version_info >= (3, 6)" >(timeout = 300) > spawn -ignore SIGHUP python3 -c import sys; assert sys.version_info >= (3, 6) > rules/0/primary-output is ok: p1689-2.o > rules/0/provides/0/logical-name is ok: foo:part1 > rules/0/provides/0/is-interface is ok: True > ERROR: length mismatch at rules/0/requires: actual: "1" expect: "0" > version is ok: 0 > revision is ok: 0 > FAIL: ERROR: length mismatch at rules/0/requires: actual: "1" expect: "0" > > This does seem to me like something not working as intended. As a Python > non-expert I have troubles concluding what is going on here and whether > these tracebacks are indeed supposed to be there, or whether it is a sign > of a problem. And these failures I don't even know where they come from. > > Does anyone know? Is there a way to run the offending commands by hand? > The relevant invocation lines do not appear in the test log file for one > to copy and paste, which I think is not the right way of doing things in > our environment. > > These issues seem independent from the test host environment as I can see > them on both a `powerpc64le-linux-gnu' and an `x86_64-linux-gnu' machine > in `riscv64-linux-gnu' target testing. Do they all have pre-3.7 Python3 versions? --Ben
Re: Odd Python errors in the G++ testsuite
On Mon, Oct 09, 2023 at 19:46:37 -0400, Paul Koning wrote: > > > > On Oct 9, 2023, at 7:42 PM, Ben Boeckel via Gcc wrote: > > > > On Mon, Oct 09, 2023 at 20:12:01 +0100, Maciej W. Rozycki wrote: > >> I'm seeing these tracebacks for several cases across the G++ testsuite: > >> > >> Executing on host: python3 -c "import sys; assert sys.version_info >= (3, > >> 6)"(timeout = 300) > >> spawn -ignore SIGHUP python3 -c import sys; assert sys.version_info >= (3, > >> 6) > > > > What version of Python3 do you have? The test suite might not actually > > properly handle not having 3.7 (i.e., skip the tests that require it). > > But the rule that you can't put a dict in a set is as old as set support (2.x > for some small x). Yes. I just wonder how a dictionary got in there in the first place. I'm not sure if some *other* 3.7-related change makes that work. --Ben
Re: Odd Python errors in the G++ testsuite
On Mon, Oct 09, 2023 at 20:12:01 +0100, Maciej W. Rozycki wrote: > Hi, > > I'm seeing these tracebacks for several cases across the G++ testsuite: > > Executing on host: python3 -c "import sys; assert sys.version_info >= (3, 6)" >(timeout = 300) > spawn -ignore SIGHUP python3 -c import sys; assert sys.version_info >= (3, 6) > rules/0/primary-output is ok: p1689-1.o > rules/0/provides/0/logical-name is ok: foo > rules/0/provides/0/is-interface is ok: True > Traceback (most recent call last): > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 218, in > is_ok = validate_p1689(actual, expect) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 182, in > validate_p1689 > return compare_json([], actual_json, expect_json) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 145, in > compare_json > is_ok = _compare_object(path, actual, expect) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 66, in > _compare_object > sub_error = compare_json(path + [key], actual[key], expect[key]) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 151, in > compare_json > is_ok = _compare_array(path, actual, expect) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 87, in > _compare_array > sub_error = compare_json(path + [str(idx)], a, e) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 145, in > compare_json > is_ok = _compare_object(path, actual, expect) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 66, in > _compare_object > sub_error = compare_json(path + [key], actual[key], expect[key]) > File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 149, in > compare_json > actual = set(actual) > TypeError: unhashable type: 'dict' So looking at the source…the 3.6 check is not from me. Not sure what's up there; it's probably not immediately related to the backtrace. But this backtrace means that we have a list of objects that do not expect a given ordering that is of JSON objects. I'm not sure why this never showed up before as all of the existing uses of it are indeed of objects. Can you try removing `"__P1689_unordered__"` from the `p1689-1.exp.ddi` file's `requires` array? The `p1689-file-default.exp.ddi` and `p1689-target-default.exp.ddi` files need the same treatment. --Ben
Re: Updated Sourceware infrastructure plans
On Wed, May 01, 2024 at 23:26:18 +0200, Mark Wielaard wrote: > On Wed, May 01, 2024 at 04:04:37PM -0400, Jason Merrill wrote: > > Do you (or others) have any thoughts about GitLab FOSS? > > The gitlab "community edition" still feels not very much "community". > We could run our own instance, but it will still be "open core" with > features missing to try to draw you towards the proprietary hosted > saas version. Also it seems to have way too much overhead. The focus > is clearly corporate developers where managers want assurances the > mandatory "pipelines" are executed and "workflows" followed exactly. I'll offer my experience here. We (at Kitware) have been using GitLab FOSS for around 8 years. We can't use the other editions because of the per-account pricing and having open registration (since pretty much everything there is FOSS code). GitLab is receptive to patches sent their way and have considered moving things to the FOSS edition to help large FOSS organizations (freedesktop.org, GNOME, KDE, probably others too). There's also been discussion of implementing features such as commit message review in order to court Linux developers given forge-like discussion happening there. FWIW, Fedora is also looking at forges as well: https://discussion.fedoraproject.org/t/2024-git-forge-evaluation/111795 That said, there are definitely gaps to fill. We have our tooling here: https://gitlab.kitware.com/utils/rust-ghostflow (core actions) https://gitlab.kitware.com/utils/ghostflow-director (service deployment) We use it to implement things including: - Basic content checks (scripts are executable, no binaries, file size limits, formatting, etc.) either on a commit-by-commit basis or by looking at the MR (patch series, PR, whatever the forge calls it) as a whole. Docs for currently-implemented checks are here: https://gitlab.kitware.com/utils/rust-ghostflow/-/blob/master/ghostflow-cli/doc/checks.md - Reformatting upon request; if the formatter(s) in use supports writing the content as intended, there is code to rewrite each individual patch to conform. This avoids wasting time on either side for things that can be done automatically (of course, you're also at the mercy of what the formatter wants…I find it worth it on balance). - More advanced merging including gathering trailers for the merge commit message from comments and other metadata including `Reviewed-by` and `Tested-by` (also from CI). Also supported is merging into multiple branches at once (e.g., backports to older branches with a single MR). - Merge train support (we call it the "stage"); this feature is otherwise locked behind for-pay editions of GitLab. Right now, GitLab and Github are supported, but other forges can be supported as well. In addition to the service (which is triggered by webhook delivery), there's a command line tool for local usage (though it only implements checking and reformatting at the moment mainly due to a lack of available time to work on it). There are other things that are probably of interest to supply chain or other things such as: - every push is stored in a ghostflow-director-side unique ref (`refs/mr/ID/heads/N` where `N` is an incrementing integer) to avoid forge-side garbage collection (especially problematic on Github; I've not noticed GitLab collecting so eagerly) - all webhooks are delivered via filesystem and can be archived (`webhook-listen` is the program that listens and delivers them: https://gitlab.kitware.com/utils/webhook-listen); events which trigger failures are stored with some context about what happened; those that are ignored are stored with a reason for the ignore (see this crate for the "event loop" of `ghostflow-director` itself: https://gitlab.kitware.com/utils/rust-json-job-dispatch) - the forge is the source of truth; if a ref is force-pushed, `ghostflow` will accept the state on the forge as gospel instead; the only non-logging/historical tracking state off-forge includes: - the config file - formatter installation (formatting is designed to only use trusted binaries; nothing from the repo itself other than which to use) On the first two points, we had some data loss on our instance once and using the webhook history and stored refs, I was able to restore code pushed to projects and "replay" comments that happened since the last backup (I copied the content and @mentioned the original author). > At the moment though the only thing people seem to agree on is that > any system will be based on git. So the plan for now is to first setup > a larger git(olite) system so that every contributor (also those who > don't currently have commit access) can easily "post" their git > repo. This can then hopefully integrate with the systems we already > have setup (triggering builder CI, flag/match with patchwork/emails, > etc.) or any future "pull request" l
Re: Updated Sourceware infrastructure plans
On Sun, May 05, 2024 at 08:22:12 +0300, Benson Muite wrote: > On 04/05/2024 22.56, Ben Boeckel via Overseers wrote: > > As a fellow FOSS maintainer I definitely appreciate the benefit of being > > email-based (`mutt` is far better at wrangling notifications from > > umpteen places than…well basically any website is at even their own), > > but as a *contributor* it is utterly opaque. It's not always clear if my > > patch has been seen, if it is waiting on maintainer time, or for me to > > do something. After one review, what is the courtesy time before pushing > > a new patchset to avoid a review "crossing in the night" as I push more > > patches? Did I get everyone that commented on the patch the first time > > in the Cc list properly? Is a discussion considered resolved (FWIW, > > Github is annoying with its conversation resolution behavior IMO; > > GitLab's explicit closing is much better). Has it been merged? To the > > right place? And that's for patches I author; figuring out the status of > > patches I'm interested in but not the author of is even harder. A forge > > surfaces a lot of this information pretty well and, to me, GitLab at > > least offers usable enough email messages (e.g., discussions on threads > > will thread in email too) that the public tracking of such things is far > > more useful on the whole. > > This is an area that also needs standardization of important > functionality. Some method of archiving the content is also helpful - > email does this well but typically does not offer dashboard. Sourcehut > makes reading threads using the web interface very easy. The other thing that email makes difficult to do: jump in on an existing discussion without having been subscribed previously. I mean, I know how to tell `mutt` to set an `In-Reply-To` header and munge a proper reply by hand once I find a `Message-Id` (though a fully proper `References` header is usually way too much work to be worth it), but this is not something I expect others to be able to easily perform. > Web interfaces are difficult to automate, but friendlier for occasional > use and encouraging new contributions. Tools separate from the version > control system such as Gerrit, Phabricator, Rhode Code and Review Board > also enable discussion management and overview. Note that forges tend to have very rich APIs. It's certainly not as easy as clicking around manually for one-off tasks or setting up a shell pipeline to process some emails, but building automation isn't impossible. --Ben
Re: Updated Sourceware infrastructure plans
On Tue, May 07, 2024 at 16:17:24 +, Joseph Myers via Gcc wrote: > I'd say we have two kinds of patch submission (= two kinds of pull request > in a pull request workflow) to consider in the toolchain, and it's > important that a PR-based system supports both of them well (and supports > a submission changing from one kind to the other, and preferably > dependencies between multiple PRs where appropriate). The way I'd handle this in `ghostflow` is with a description trailer like `Squash-merge: true` (already implemented trailers include `Backport`, `Fast-forward`, `Backport-ff`, and `Topic-rename` as description trailers, so this is a natural extension there). Alternatively a label can be used, but that is not directly editable by MR authors that are not also members of the project. There's also a checkbox either at MR creation and/or merge time to select between them, but I don't find that easily discoverable (I believe only those with merge rights can see the button state in general). > * Simple submissions that are intended to end up as a single commit on the > mainline (squash merge). The overall set of changes to be applied to the > mainline is subject to review, and the commit message also is subject to > review (review of commit messages isn't always something that PR-based > systems seem to handle that well). But for the most part there isn't a > need to rebase these - fixes as a result of review can go as subsequent > commits on the source branch (making it easy to review either the > individual fixes, or the whole updated set of changes), and merging from > upstream into that branch is also OK. (If there *is* a rebase, the > PR-based system should still preserve the history of and comments on > previous versions, avoid GCing them and avoid getting confused.) > > * Complicated submissions of patch series, that are intended to end up as > a sequence of commits on the mainline (non-squash merge preserving the > sequence of commits). In this case, fixes (or updating from upstream) > *do* involve rebases to show what the full new sequence of commits should > be (and all individual commits and their commit messages should be subject > to review, not just the overall set of changes to be applied). Again, > rebases need handling by the system in a history-preserving way. There's been a long-standing issue to use `range-diff` in GitLab. I really don't know why it isn't higher priority, but I suppose having groups like Sourceware and/or kernel.org interested could move it up a priority list for them. https://gitlab.com/gitlab-org/gitlab/-/issues/24096 FWIW, there's also a "comment on commit messages" issue: https://gitlab.com/gitlab-org/gitlab/-/issues/19691 That said, I've had little issues with rebases losing commits or discussion on GitLab whereas I've definitely seen things get lost on Github. I'm unfamiliar with other forges to know there (other than that Gerrit-likes that track patches are generally workable with rebases). > GitHub (as an example - obviously not appropriate itself for the > toolchain) does much better on simple submissions (either with squash > merges, or with merges showing the full history if you don't care about a > clean bisectable history), apart from review of commit messages, than it > does on complicated submissions or dependencies between PRs (I think > systems sometimes used for PR dependencies on GitHub may actually be > third-party add-ons). The way I've tended to handle this is to have one "main MR" that is the "whole story" with component MRs split out for separate review. Once the separate MRs are reviewed and merged (with cross references), the main MR is rebased to incorporate the merged code and simplify its diff. This helps to review smaller bits while also having the full story available for viewing. > Pull request systems have obvious advantages over mailing lists for > tracking open submissions - but it's still very easy for an active project > to end up with thousands of open PRs, among which it's very hard to find > anything. In CMake, the mechanism used to keep the queue manageable is to have a `triage:expired` label for closed-for-inactivity (or other reasons) so that closed-but-only-neglected MRs can be distinguished from closed-because-not-going-to-be-merged MRs. The "active patch queue" tends to stay under 20, but sometimes swells to 30 in busy times (as of this writing, it is at 10 open MRs). --Ben
Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions
On Fri, Jul 05, 2024 at 22:15:44 +0200, Emanuele Torre via Gcc wrote: > That is 6.7.3.1p3: > > > > In what follows, a pointer expression E is said to be based on object P > if (at some sequence point in the execution of B prior to the > evaluation of E) modifying P to point to a copy of the array object > into which it formerly pointed would change the value of E.153) Note > that "based" is defined only for expressions with pointer types. > > Footnote 153) In other words, E depends on the value of P itself rather > than on the value of an object referenced indirectly through P. For > example, if identifier p has type (int **restrict), then the pointer > expressions p and p+1 are based on the restricted pointer object > designated by p, but the pointer expressions *p and p[1] are not. > > > > Which would be the same paragraph of the same section on N3047, but > footnote number 168. Obviously, we need better types on our standardese pointers. Jonathan used a pointer of type `footnote _N3220*` while Alejandro was expecting a pointer of type `footnote _N3047*`. Of course, given that they end up referring to the same thing, this may have interesting implications when they are used with `restrict`. --Ben
Re: Spurious -Wsequence-point with auto
On Fri, Aug 09, 2024 at 11:48:44 +0200, Alejandro Colomar via Gcc wrote: > On Fri, Aug 09, 2024 at 10:31:55AM GMT, Martin Uecker wrote: > > BTW: Double evaluation is not only a problem for semantics, but also because > > it can cause long compile times, cf. the 45 MB macro expansion example > > in Linux... > > Huh, what's that one? I didn't know it. :) https://lwn.net/Articles/983965/ --Ben
Re: On the subject of module consumer diagnostics.
On Tue, Sep 03, 2024 at 16:53:43 +0100, Iain Sandoe wrote: > I think that might be a misunderstanding on the part of the author; > AFAIU both GCC and MSVC _do_ require access to the sources at BMI > consume-time to give decent diagnostics. I think that there might be > confusion because the compilation would suceed on those toolchains > without the sources - but with poorer diagnostic quality? Does this have (additional) implications for caching tools and modules? They cache diagnostic output, but if these other paths showing up or disappearing affects the output, the cache key should incorporate that as well. Should there be a way for such tools to get this information somehow? Ideally the paths would only matter if reported diagnostics *would* look at the files, not just "there's a BMI that mentions a source path X" kind of inspection. --Ben
Re: #pragma once behavior
On Fri, Sep 06, 2024 at 00:03:23 -0500, Jeremy Rifkin wrote: > Hello, > > I'm looking at #pragma once behavior among the major C/C++ compilers as > part of a proposal paper for standardizing #pragma once. (This is > apparently a very controversial topic) > > To put my question up-front: Would GCC ever be open to altering its > #pragma once behavior to bring it more in-line with behavior from other > compilers and possibly more in-line with what users expect? > > To elaborate more: > > Design decisions for #pragma once essentially boil down to a file-based > definitions vs a content-based definition of "same file". > > A file-based definition is easier to reason about and more in-line with > what users expect, however, distinct copies of headers can't be handled > and multiple mount points are problematic. > > A content-based definition works for distinct copies, multiple mount > points, and is completely sufficient 99% of the time, however, it could > potentially break in hard-to-debug ways in a few notable cases (more > information later). > > Currently the three major C/C++ compilers treat #pragma once very differently: > - GCC uses file mtime + file contents > - Clang uses inodes > - MSVC uses file path > > None of the major compilers have documented their #pragma once semantics. > > In practice all three of these approaches work pretty well most of the > time (which is why people feel comfortable using #pragma once). However, > they can each break in their own ways. > > As mentioned earlier, clang and MSVC's file-based definitions of "same > file" break for multiple mount points and multiple copies of the same > header. MSVC's approach breaks for symbolic links and hard links. > > GCC's hybrid approach can break in surprising ways. I have three > examples to share: > > Example 1: > > Consider a scenario such as: > > usr/ > include/ > library_a/ > library_main.hpp > foo.hpp > library_b/ > library_main.hpp > foo.hpp > src/ > main.cpp > > main.cpp: > #include "library_a/library_main.hpp" > #include "library_b/library_main.hpp" > > And both library_main.hpp's have: > #pragma once > #include "foo.hpp" Could a "uses the relative search path" fact be used to mix into the file's identity? This way the `once` key would see "this content looked for things in directory `library_a`" and would see that `library_b/library_main.hpp`, despite the same content (and mtime) is actually a different context and actually perform the inclusions? Of course, this fails if `#include "../common/foo.hpp"` is used in each location as that *would* then want to elide the second inclusion. I don't know how this problem is avoided without actually reading the contents again. But the "I read this file" can remember what relative paths were searched (since the contents are the same at least). > Example 2: > > namespace v1 { > #include "library_v1.hpp" > } > namespace v2 { > #include "library_v2.hpp" > } > > Where both library headers include their own copy of a shared header > using #pragma once. Again, the context of the inclusion matters, so "is wrapped in a scope" can modify the "onceness" (`extern "C"` is probably the more common instance). > Example 3: > > usr/ > include/ > library/ > library.hpp > vendored-dependency.hpp > src/ > main.cpp > vendored-dependency.hpp > > main.cpp: > #include "vendored-dependency.hpp" > #include > > library.hpp: > #pragma once > #include "vendored-dependency.hpp" This is basically the same as Example 1 as far as context goes. Note that context cannot include `#define` state because `#once` is defined to be the first thing in the file and a file that is intended to be included multiple times (e.g., Boost.PP shenanigans) in different states cannot, in good faith, use `#once`. Hrm…though if we are doing `otherdir/samecontent`, the different preprocessor state *might* change that "what relative files did we look for?" state… Nothing is easy :( . --Ben
Re: On the subject of module consumer diagnostics.
On Fri, Sep 06, 2024 at 09:30:26 -0400, David Malcolm wrote: > On Fri, 2024-09-06 at 08:44 -0400, Ben Boeckel via Gcc wrote: > > Does this have (additional) implications for caching tools and > > modules? > > They cache diagnostic output, but if these other paths showing up or > > disappearing affects the output, the cache key should incorporate > > that > > as well. > > What kinds of caching tools are you thinking of? `ccache`, `sccache`, etc. These tools try to detect if the compilation would be the same and place the object in its output location and report the cached output on stdout/stderr as performed in the original compile so that it acts "just like the compiler"'s execution. > I'm curious about caching of diagnostics, and how the diagnostics are > represented in the cache. I know `sccache` just stores it as a text blob; `ccache` is probably the same, but I haven't been in its code myself to know. --Ben
Re: How to debug/improve excessive compiler memory usage and compile times
On Tue, Oct 01, 2024 at 18:06:35 +0200, Richard Biener via Gcc wrote: > Analyze where the compile time is spent and where memory is spent. > Identify unfitting data structures and algorithms causing the issue. > Replace with better ones. That’s what I do for these kind of issues > in the middle end. To this end, are there any plans or progress to implementing `-ftime-trace` profiling output? (It's a clang thing, but there is tooling that would be able to use GCC's reports if it made them.) https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/ This is also available to do build-wide perf visualization (when using `ninja`) with this tool that reads `-ftime-trace` files when they're available): https://github.com/nico/ninjatracing (FWIW, I have a dream to have CI builds be able to save out this information so that longer-term perf trends can be investigated. If this comes up with project changes to do less silly template magic or improvements to compilers to be faster, all the better.) Thanks, --Ben
Re: Help for git send-email setting
On Mon, Jan 13, 2025 at 01:27:17 +, Hao Liu via Gcc wrote: > I'm new to GCC community, and try to contribute some patches. > I am having trouble setting git send-email with Outlook on Linux. Does > anyone have any successful experiences to share? I don't believe Outlook is well-suited for sending patches (Linux or not). There may be workarounds: https://answers.microsoft.com/en-us/outlook_com/forum/all/configure-and-use-git-send-email-to-work-with/48b7e77c-6727-4541-a78a-f4f8309970a0 https://stackoverflow.com/questions/9814276/git-patch-file-attached-to-outlook-email-gets-modified-by-it There was mention that there is now `b4` configuration available in this message: https://gcc.gnu.org/pipermail/gcc/2024-September/244776.html though I do not see any updates to the wiki mentioning it. It may make things a lot easier to juggle as well. --Ben
Re: Classes Implicitly Declared
Ugh, sorry. the lack of Subject/References broke threading and I missed this continuation. On Wed, Feb 12, 2025 at 22:46:23 +, Frederick Virchanza Gotham via Gcc wrote: > I think it might be a possibility given how compiler vendors (and also > vendors of tools like CMake) aren't really getting very far with > modules. They've had a few years already. "Not very far" is probably the opinion of someone that last seriously checked in on the status in 2021 or so. Things have definitely changed since then. Yes, it has taken time. Longer than I'd like (as author of the CMake support, author of P1689 for dependency scanning, and SG15 Tooling member). But writing a spec to have toolchains to make what build systems need to get what users need to test toolchains is not a fast iteration cycle. Rinse and repeat for `import std`. It will likely be another run through this cycle for header units too. But we are finally at a point where named modules can be experimented with. The timeline has been (roughly): - 2019 Feb: pushing to get dependency scanning possible (I have patches to CMake and GCC to proof-of-concept at the Kona meeting) - 2019 Mar: CMake uses P1689R2 for its Fortran scanning/dependency computations. The implementation is updated for further revision updates as time goes on. - 2020-2022: hammering out P1689 details. As of P1689R5 it is in a place where everyone is happy. Alas, my allocations did not align with faster progress here. - 2022 Jun: experimental support for named modules lands in CMake 3.25. Community and toolchain implementers experiment with my GCC patches and report issues. - 2023 Sep: my P1689R5 patches land in GCC - 2023 Oct: CMake 3.28 releases with named imports as a non-experimental feature - 2022 Nov: VS 17.4 releases with P1689 support - 2023 Feb: clang-scan-deps learns to write P1689R5 - 2024 Apr: Support for clang/libc++ and MSVC/STL `import std;` lands in CMake `master` - 2024 Jul: Experimental support for `import std;` in CMake 3.30. - 2024 Nov: Support for GCC's `import std;` lands We're now at a point where *projects* can experiment with modules meaningfully (via CMake at least). I hope this drives finding issues both with CMake (of which I'm aware of a number) and toolchains (of which I'm sure there is some awareness of their existence, but prioritizing them may be hard without user feedback). I've also been working to help drive adoption in other build systems: Bazel: experimental community-driven support (though Google needed some convincing to allow the community to experiment with it) Meson: interest, but no progress AFAIK Tup: *crickets* xmake: done Progress in others that I have not contacted directly about it: autotools: didn't have a maintainer when I did the first round, but zackw has since picked it up…I should reach out MSBuild: done (though its developers were present for P1689 discussions) So as long as you're using CMake or xmake, go forth and test. Please file issues too. Thanks, --Ben