[PATCH RESEND 1/1] p1689r5: initial support

2022-10-04 Thread Ben Boeckel
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

Signed-off-by: Ben Boeckel 
---
 gcc/ChangeLog   |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++-
 gcc/c-family/c.opt  |  12 ++
 gcc/cp/ChangeLog|   5 +
 gcc/cp/module.cc|   3 +-
 gcc/doc/invoke.texi |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc  |   4 +-
 gcc/genmatch.cc |   2 +-
 gcc/input.cc|   4 +-
 libcpp/ChangeLog|  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc  |  14 ++-
 libcpp/mkdeps.cc| 235 ++--
 16 files changed, 368 insertions(+), 26 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6dded16c0e3..2d61de6adde 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2022-09-20  Ben Boeckel  
+
+   * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
+   -fdep-output= flags.
+   * genmatch.cc (main): Add new preprocessor parameter used for C++
+   module tracking.
+   * input.cc (test_lexer): Add new preprocessor parameter used for C++
+   module tracking.
+
 2022-09-19  Torbjörn SVENSSON  
 
* targhooks.cc (default_zero_call_used_regs): Improve sorry
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index ba3d76dd6cb..569dcd96e8c 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2022-09-20  Ben Boeckel  
+
+   * c-opts.cc (c_common_handle_option): Add fdeps_file variable and
+   -fdeps-format=, -fdep-file=, and -fdep-output= parsing.
+   * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.
+
 2022-09-15  Richard Biener  
 
* c-common.h (build_void_list_node): Remove.
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index babaa2fc157..617d0e93696 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Enhanced dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  */
 static const char *iprefix;
 
@@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, 
HOST_WIDE_INT value,
   deps_file = arg;
   break;
 
+case OPT_fdep_format_:
+  if (!strcmp (arg, "p1689r5"))
+   cpp_opts->deps.format = DEPS_FMT_P1689R5;
+  else
+   error ("%<-fdep-format=%> unknown format %s", arg);
+  break;
+
+case OPT_fdep_file_:
+  deps_seen = true;
+  fdeps_file = arg;
+  break;
+
+case OPT_fdep_output_:
+  deps_seen = true;
+  defer_opt (code, arg);
+  b

[PATCH RESEND 0/1] RFC: P1689R5 support

2022-10-04 Thread Ben Boeckel
This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel and need to be ordered to ensure that `import
some_module;` can be satisfied in time by making sure that the TU with
`export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

Testing is currently happening in CMake's CI using a prior revision of
this patch (the differences are basically the changelog, some style, and
`trtbd` instead of `p1689r5` as the format name).

For testing within GCC, I'll work on the following:

- scanning non-module source
- scanning module-importing source (`import X;`)
- scanning module-exporting source (`export module X;`)
- scanning module implementation unit (`module X;`)
- flag combinations?

Are there existing tools for handling JSON output for testing purposes?
Basically, something that I can add to the test suite that doesn't care
about whitespace, but checks the structure (with sensible replacements
for absolute paths where relevant)?

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags (though using my old `trtbd` spelling for the
format name).

Thanks,

--Ben

Ben Boeckel (1):
  p1689r5: initial support

 gcc/ChangeLog   |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++-
 gcc/c-family/c.opt  |  12 ++
 gcc/cp/ChangeLog|   5 +
 gcc/cp/module.cc|   3 +-
 gcc/doc/invoke.texi |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc  |   4 +-
 gcc/genmatch.cc |   2 +-
 gcc/input.cc|   4 +-
 libcpp/ChangeLog|  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc  |  14 ++-
 libcpp/mkdeps.cc| 235 ++--
 16 files changed, 368 insertions(+), 26 deletions(-)


base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f
-- 
2.37.3



[PATCH 1/1] libcpp: allow UCS_LIMIT codepoints in UTF-8 strings

2023-06-21 Thread Ben Boeckel
libcpp/

* charset.cc: Allow `UCS_LIMIT` in UTF-8 strings.

Reported-by: Damien Guibouret 
Fixes: c1dbaa6656a (libcpp: reject codepoints above 0x10, 2023-06-06)
Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d4f573e365f..54ebab2b8a4 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1891,7 +1891,7 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
 invalid because they cannot be represented in UTF-16.
 
 Reject such values.*/
-  if (cp >= UCS_LIMIT)
+  if (cp > UCS_LIMIT)
return false;
 }
   /* No problems encountered.  */
-- 
2.40.1



[modules] Preprocessing requires compiled header unit modules

2022-02-24 Thread Ben Boeckel via Gcc
Hi,

I'm looking to update my patch which implements P1689[1] for GCC. I
originally wrote this back in February 2019 at Kona for a proof of
concept[2], but never really got the time to delve back in until now.
I'd like to get header units implemented prior to submitting it as it
will be an important part in the long run (as I don't think having a
partial implementation in the wild would be beneficial).

Now that modules have been merged, my patch works just fine for named
modules. However, for header unit modules, it runs into a problem that
imported header units are required to be compiled and available in the
mapper while scanning for dependencies.

Example code:

```c++ # use-header.cpp
module;

import "header-unit.hpp";

int main(int argc, char* argv[]) {
return good;
}
```

```c++ # header-unit.hpp
constexpr int good = 0;
```

When trying to preprocess this:

```console
% g++ -std=c++20 -fmodules-ts -E use-header.cpp -o use-header.cpp.i
In module imported at use-header.cpp:3:1:
./header-unit.hpp: error: failed to read compiled module: No such file or 
directory
./header-unit.hpp: note: compiled module file is 
‘gcm.cache/,/header-unit.hpp.gcm’
./header-unit.hpp: note: imports must be built before being imported
./header-unit.hpp: fatal error: returning to the gate for a mechanical issue
compilation terminated
```

There used to be no need to do this back prior to the modules landing in
`master`, but I can see this being an oversight in the meantime. I am
not familiar enough with the code to know how to untangle its current
state (attempting to bisect the change throws me back in to the depths
of the history without modules so "does it work" is unanswerable there).

I tried this on commits:

4efde6781bba8d64b9dcff07e7efe71d35aa6f6a c++: Modules Is Landing
f4ed267fa5b82d6dafbc8afc82baf45bfcae549c master as of the 23rd

Thanks,

--Ben

[1] https://wg21.link/p1689r4
[2] https://github.com/mathstuf/cxx-modules-sandbox/ (see the `docker`
branch for the instructions on putting things together)


Re: [modules] Preprocessing requires compiled header unit modules

2022-04-21 Thread Ben Boeckel via Gcc
On Thu, Apr 21, 2022 at 06:05:52 +0200, Boris Kolpackov wrote:
> I don't think it is. A header unit (unlike a named module) may export
> macros which could affect further dependencies. Consider:
> 
> import "header-unit.hpp"; // May or may not export macro FOO.
> 
> #ifdef FOO
> import "header-unit2.hpp"
> #endif

I agree that the header needs to be *found*, but scanning cannot require
a pre-existing BMI for that header. A new mode likely needs to be laid
down to get the information necessary (instead of just piggy-backing on
`-E` behavior to get what I want).

--Ben


Re: [modules] Preprocessing requires compiled header unit modules

2022-04-21 Thread Ben Boeckel via Gcc
On Thu, Apr 21, 2022 at 18:59:56 +0100, Iain Sandoe wrote:
> Hi Ben,
> 
> > On 21 Apr 2022, at 13:05, Ben Boeckel via Gcc  wrote:
> > 
> > On Thu, Apr 21, 2022 at 06:05:52 +0200, Boris Kolpackov wrote:
> >> I don't think it is. A header unit (unlike a named module) may export
> >> macros which could affect further dependencies. Consider:
> >> 
> >> import "header-unit.hpp"; // May or may not export macro FOO.
> 
> 1.  If you know how this was built, then you could do an -E -fdirectives-only 
> build (both
>GCC and clang support this now) to obtain the macros.

My understanding is that how it gets used determines how it should be
made for Clang (because the consumer's `-D`, `-W`, etc. flags matter). I
do not yet know how I am to support this in CMake.

> 2.  I suppose we could invent a tool (or FE mode) to dump the macros exported 
> by a HU ***

Fun considerations:

  - are `-D` flags exported? `-U`?
  - how about this if `value` is the same as or different from the
at-start expansion:

```c++
#undef SOME_MACRO
#define SOME_MACRO value
```

  - how about `#undef FOO`?

> >> #ifdef FOO
> >> import "header-unit2.hpp"
> >> #endif
> > 
> > I agree that the header needs to be *found*, but scanning cannot require
> > a pre-existing BMI for that header. A new mode likely needs to be laid
> > down to get the information necessary (instead of just piggy-backing on
> > `-E` behavior to get what I want).
> 
> perhaps that means (2)?

Can't it just read the header as if it wasn't imported? AFAIU, that's
what GCC did in Jan 2019. I understand that CPP state is probably not
easy, but something to consider.

> *** it’s kinda frustrating that this is hard infomation to get as a 
> developer, so
> perhaps we can anticipate users wanting such output.

I think cacheing and distributed build tools are the most likely
consumers of such information.

--Ben


Re: [modules] Preprocessing requires compiled header unit modules

2022-04-22 Thread Ben Boeckel via Gcc
On Fri, Apr 22, 2022 at 16:08:00 +0200, Boris Kolpackov wrote:
> Ben Boeckel  writes:
> > I agree that the header needs to be *found*, but scanning cannot require
> > a pre-existing BMI for that header.
> 
> Well, if scanning cannot require a pre-existing BMI but a pre-existing
> BMI is required to get accurate dependency information, then something
> has to give.

If we need to know and have dependencies prepared before we can figure
out the dependencies for a TU, modules are unsolvable (without an active
build executor). If C++ implementations are really going to require
that, then not being able to perform `cl.exe *.cpp` anymore should be
the least of people's worries on the recent megathread because the
following tools are all unsuitable for C++ with header units without
major overhauls (alphabetical):

  - autoconf/automake
  - cmake
  - gn
  - gyp
  - make (not GNU make, though even that requires some active
involvement via the socket communications)
  - meson
  - ninja

I'm sure there are many more in use out there.

> In this example, if you treat import of header-unit.hpp as
> include, you will get incorrect dependency information.
> 
> So to make this work correctly we will need to re-create the
> macro isolation semantics of import for include.

I'm aware. This is what I alluded to with "CPP state is probably not
easy".

> Even if we manage to do this, there are some implications I
> am not sure we will like: the isolated macros will contain
> inclusion guards, which means we will keep re-scanning the
> same files potentially many many time. Here is an example,
> assume each header-unitN.hpp includes or imports :

Note that scanning each module TU only happens once. Header units might
just get *read* in the course of scanning other units.

And headers are read multiple times already over the lifetime of the
build, so we're not making things worse here.

I wonder if we had had a modularized stdlib which landed with modules if
header units would have been a thing in the first place…

--Ben


Re: [modules] Preprocessing requires compiled header unit modules

2022-04-25 Thread Ben Boeckel via Gcc
On Mon, Apr 25, 2022 at 11:42:14 +0200, Boris Kolpackov wrote:
> 1. Firstly, this only applies to header units, not named modules.

Correct.

> 2. I am not sure what you mean by "active build executor" (but it
>does sound ominous, I will grant you that ;-)).

One that does more than schedule builds, but is "actively" participating
in the build itself. Ninja's `dyndep` is about as close one can be to
the edge IMO. `make` is potentially active because of its behavior
around `include` of output files, but this is a niche feature.

> 3. I agree some build systems may require "major overhauls" to
>support header units via the module mapper. I would like this
>not to be the case, but so far nobody has implemented an
>alternative (that I am aware of) that is correct and scalable
>and I personally have doubts such a thing is achievable.

One the compilers can scan headers, CMake should be able to do it. MSVC
is the closest right now (though the name it wants for any given header
usage site is currently an issue).

> Ben Boeckel  writes:
> > Note that scanning each module TU only happens once. Header units might
> > just get *read* in the course of scanning other units.
> > 
> > And headers are read multiple times already over the lifetime of the
> > build, so we're not making things worse here.
> 
> I am not sure I follow. Say we have 10 TUs each include or import
> 10 headers each of which includes . If we use include,
> then when scanning each of these 10 TUs we have to scan 
> once (since all the subsequent includes are suppressed by include
> guards). So total of 10x1=10 scans of  for the entire
> build.
> 
> Now if instead of include we use import (which, during the scan, is
> treated as include with macro isolation), we are looking at 10 scans
> of  for each TU (because the include guards are ignored).
> So total of 10x10=100 scans of  for the build.
> 
> What am I missing?

Ew, indeed. I suppose a sufficiently smart compiler could just encode
headers as a "diff" to the preprocessor and symbol state and apply it
upon revisiting, but that seems like a high bar (re-reading if it
detects a difference in preprocessor state that "matters").

--Ben


Re: [RFC] Support for nonzero attribute

2022-06-06 Thread Ben Boeckel via Gcc
On Sun, Jun 05, 2022 at 20:09:04 +, Miika via Gcc wrote:
> Based on Jakub's and Yair's comments I created a new attribute "inrange".
> Inrage takes three arguments, pos min and max.
> Pos being the argument position in the function, and min and max defines the
> range of valid integer. Both min and max are inclusive and work with enums.
> Warnings are enabled with the new flag: "-Winrange".

Is this something that could be applied to variables or types (I've not
much experience with GCC attribute internal mechanisms, so maybe not)?

For example:

```
// This variable must be in range.
int percentage __attribute__((inrange(0, 100)));
// Any variable declared through this typedef must be in range.
typedef int __attribute__((inrange(0, 100))) percentage_t;
```

--Ben


Re: Proposal: allow to extend C++ template argument deduction via plugins

2022-07-15 Thread Ben Boeckel via Gcc
On Thu, Jul 14, 2022 at 18:46:47 +0200, Dan Klishch via Gcc wrote:
> As far as I understand the currently available plugin extension points, it is
> not possible to modify template argument deduction algorithm (except the
> theoretical possibility to completely override parsing step). However, such
> opportunity might be beneficial for projects like libpqxx, for example, when
> database schema and query text are available at compile-time, return types of
> the query might be inferred by the plugin.
> 
> I propose to add something like PLUGIN_FUNCTION_CALL plugin_event which will
> allow to modify function calls conditionally. Will a patch adding such
> functionality be welcomed?

Note that I'm not a GCC developer, so my opinion isn't worth much on the
acceptability-to-GCC front.

Wouldn't this make it a C++ dialect? How would non-GCC compilers be able
to be compatible with such an API? That is, if the schema and query
being available changes the API (nevermind the schema itself changing),
I don't see how this works in general.

--Ben


Re: Importable STL Header Units?

2022-07-25 Thread Ben Boeckel via Gcc
On Mon, Jul 25, 2022 at 19:48:28 +1000, Zopolis0 via Gcc wrote:
> Currently, when importing the standard library, one has to
> separately compile each unit they want to use, which is a hindrance to the
> build process and a gap in the implementation.
> 
> Is there any particular reason why gcc doesn't provide importable header
> units for the standard library? Is there a timeline for this? Given that
> the standard library can already be compiled into header units, is there
> any particular reason why those aren't shipped with gcc?

Some questions that come to mind if GCC were to provide this:

- What flags should be used for these pre-compiled units?
- How do you expect build systems to be able to find them? Will there be
  some kind of naming convention? It'd certainly be nice if other
  implementations would cooperate…

(FD: CMake developer working on module support myself) It is my belief
that any module-supporting build system has to consider the possibility
of compiling any given external module because there's no guarantee that
the flag set is compatible (say, `-ffast-math` or other "fun" flags with
ABI implications) and you'll need your own compilation of the module
anyways. Having pre-compiled units is nice, but merely an optimization
for any robust build system; you certainly can't just use any available
pre-compiled module blindly (unless you're relying on the linker to
detect mismatches or something).

The C++ committee's SG15 (Tooling) is looking into the problem. These
papers are likely of interest to you (still under discussion):

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2577r2.pdf
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2581r0.pdf

More papers will certainly appear:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/

--Ben


Re: Wanted: original ConceptGCC downloads / branch, concepts-lite branch

2022-08-17 Thread Ben Boeckel via Gcc
On Wed, Aug 17, 2022 at 12:42:42 +0100, Aaron Gray via Gcc wrote:
> I am looking for the original ConceptGCC source code, the
> https://www.generic-programming.org/software/ConceptGCC/download.html has
> all broken links and the SVN is gone.
> 
> Is this available on GCC git or SVN ?

There is this repo that may be of interest:

https://github.com/asutton/gcc

No idea of the state of it though or how useful it is for C++20
Concepts.

--Ben


[PATCH 1/1] p1689r5: initial support

2022-09-27 Thread Ben Boeckel via Gcc
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

Signed-off-by: Ben Boeckel 
---
 gcc/ChangeLog   |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++-
 gcc/c-family/c.opt  |  12 ++
 gcc/cp/ChangeLog|   5 +
 gcc/cp/module.cc|   3 +-
 gcc/doc/invoke.texi |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc  |   4 +-
 gcc/genmatch.cc |   2 +-
 gcc/input.cc|   4 +-
 libcpp/ChangeLog|  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc  |  14 ++-
 libcpp/mkdeps.cc| 235 ++--
 16 files changed, 368 insertions(+), 26 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6dded16c0e3..2d61de6adde 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2022-09-20  Ben Boeckel  
+
+   * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
+   -fdep-output= flags.
+   * genmatch.cc (main): Add new preprocessor parameter used for C++
+   module tracking.
+   * input.cc (test_lexer): Add new preprocessor parameter used for C++
+   module tracking.
+
 2022-09-19  Torbjörn SVENSSON  
 
* targhooks.cc (default_zero_call_used_regs): Improve sorry
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index ba3d76dd6cb..569dcd96e8c 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2022-09-20  Ben Boeckel  
+
+   * c-opts.cc (c_common_handle_option): Add fdeps_file variable and
+   -fdeps-format=, -fdep-file=, and -fdep-output= parsing.
+   * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.
+
 2022-09-15  Richard Biener  
 
* c-common.h (build_void_list_node): Remove.
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index babaa2fc157..617d0e93696 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Enhanced dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  */
 static const char *iprefix;
 
@@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, 
HOST_WIDE_INT value,
   deps_file = arg;
   break;
 
+case OPT_fdep_format_:
+  if (!strcmp (arg, "p1689r5"))
+   cpp_opts->deps.format = DEPS_FMT_P1689R5;
+  else
+   error ("%<-fdep-format=%> unknown format %s", arg);
+  break;
+
+case OPT_fdep_file_:
+  deps_seen = true;
+  fdeps_file = arg;
+  break;
+
+case OPT_fdep_output_:
+  deps_seen = true;
+  defer_opt (code, arg);
+  b

[PATCH 0/1] RFC: P1689R5 support

2022-09-27 Thread Ben Boeckel via Gcc
This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel and need to be ordered to ensure that `import
some_module;` can be satisfied in time by making sure that the TU with
`export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

Testing is currently happening in CMake's CI using a prior revision of
this patch (the differences are basically the changelog, some style, and
`trtbd` instead of `p1689r5` as the format name).

For testing within GCC, I'll work on the following:

- scanning non-module source
- scanning module-importing source (`import X;`)
- scanning module-exporting source (`export module X;`)
- scanning module implementation unit (`module X;`)
- flag combinations?

Are there existing tools for handling JSON output for testing purposes?
Basically, something that I can add to the test suite that doesn't care
about whitespace, but checks the structure (with sensible replacements
for absolute paths where relevant)?

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags (though using my old `trtbd` spelling for the
format name).

Thanks,

--Ben

Ben Boeckel (1):
  p1689r5: initial support

 gcc/ChangeLog   |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++-
 gcc/c-family/c.opt  |  12 ++
 gcc/cp/ChangeLog|   5 +
 gcc/cp/module.cc|   3 +-
 gcc/doc/invoke.texi |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc  |   4 +-
 gcc/genmatch.cc |   2 +-
 gcc/input.cc|   4 +-
 libcpp/ChangeLog|  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc  |  14 ++-
 libcpp/mkdeps.cc| 235 ++--
 16 files changed, 368 insertions(+), 26 deletions(-)


base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f
-- 
2.37.3



Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-11 Thread Ben Boeckel via Gcc
On Tue, Oct 04, 2022 at 21:12:03 +0200, Harald Anlauf wrote:
> Am 04.10.22 um 17:12 schrieb Ben Boeckel:
> > This patch implements support for [P1689R5][] to communicate to a build
> > system the C++20 module dependencies to build systems so that they may
> > build `.gcm` files in the proper order.
> 
> Is there a reason that you are touching so many frontends?

Just those that needed the update for `cpp_finish`. It does align with
those that will (eventually) need this support anyways (AFAIK).

> > diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
> > index 364bd0d2a85..0b9df9c02cd 100644
> > --- a/gcc/fortran/cpp.cc
> > +++ b/gcc/fortran/cpp.cc
> > @@ -712,7 +712,7 @@ gfc_cpp_done (void)
> >   FILE *f = fopen (gfc_cpp_option.deps_filename, "w");
> >   if (f)
> > {
> > - cpp_finish (cpp_in, f);
> > + cpp_finish (cpp_in, f, NULL);
> >   fclose (f);
> > }
> >   else
> > @@ -721,7 +721,7 @@ gfc_cpp_done (void)
> >  xstrerror (errno));
> > }
> > else
> > -   cpp_finish (cpp_in, stdout);
> > +   cpp_finish (cpp_in, stdout, NULL);
> >   }
> >
> > cpp_undef_all (cpp_in);
> 
> Couldn't you simply default the third argument of cpp_finish() to NULL?

I could do that. Wasn't sure how much that would be acceptable given
that it is a "silent" change, but it would reduce the number of files
touched here.

Thanks,

--Ben


Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-11 Thread Ben Boeckel via Gcc
On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
> On 10/4/22 11:12, Ben Boeckel wrote:
> > This patch implements support for [P1689R5][] to communicate to a build
> > system the C++20 module dependencies to build systems so that they may
> > build `.gcm` files in the proper order.
> 
> Thanks!
> 
> > Support is communicated through the following three new flags:
> > 
> > - `-fdeps-format=` specifies the format for the output. Currently named
> >`p1689r5`.
> > 
> > - `-fdeps-file=` specifies the path to the file to write the format to.
> 
> Do you expect users to want to emit Makefile (-MM) and P1689 
> dependencies from the same compilation?

Yes, the build system wants to know what files affect scanning as well
(e.g., `#include ` is still possible and if it changes, this
operation should be performed again. The `-M` flags do this quite nicely
already :) .

> > - `-fdep-output=` specifies the `.o` that will be written for the TU
> >that is scanned. This is required so that the build system can
> >correlate the dependency output with the actual compilation that will
> >occur.
> 
> The dependency machinery already needs to be able to figure out the name 
> of the output file, can't we reuse that instead of specifying it on the 
> command line?

This is how it determines the output of the compilation. Because it is
piggy-backing on the `-E` flag, the `-o` flag specifies the output of
the preprocessed source (purely a side-effect right now).

> > diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> > index 2db1e9cbdfb..90787230a9e 100644
> > --- a/libcpp/include/cpplib.h
> > +++ b/libcpp/include/cpplib.h
> > @@ -298,6 +298,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
> >   /* Style of header dependencies to generate.  */
> >   enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
> >   
> > +/* Format of header dependencies to generate.  */
> > +enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };
> 
> Why not add this to cpp_deps_style?

It is orthogonal. The `-M` flags and `-fdeps-*` flags are similar, but
`-fdeps-*` dependencies are fundamentally different than `-M`
dependencies. The comment does need updated though.

`-M` is about discovered dependencies: those that you find out while
doing work. `-fdep-*` is about ordering dependencies: extracting
information from file content in order to even order future work around.
It stems from the loss of embarassing parallelism when building C++20
code that uses `import`. It's isomorphic to the Fortran module
compilation ordering problem.

> > @@ -387,7 +410,7 @@ make_write_vec (const mkdeps::vec &vec, 
> > FILE *fp,
> >  .PHONY targets for all the dependencies too.  */
> >   
> >   static void
> > -make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> > +make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax, int 
> > extra)
> 
> Instead of adding the "extra" parameter...

Hmm. Not sure why I had named this so poorly. Maybe this comes from my
initial stab at this functionality in 2019 (the format has been hammered
out in ISO C++'s SG15 since then).

> >   {
> > const mkdeps *d = pfile->deps;
> >   
> > @@ -398,7 +421,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > if (d->deps.size ())
> >   {
> > column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> > -  if (CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> > +  if (extra && CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> > column = make_write_name (d->cmi_name, fp, column, colmax);
> > fputs (":", fp);
> > column++;
> > @@ -412,7 +435,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > if (!CPP_OPTION (pfile, deps.modules))
> >   return;
> 
> ...how about checking CPP_OPTION for p1689r5 mode here?

I'll take a look at this.

> >   
> > -  if (d->modules.size ())
> > +  if (extra && d->modules.size ())
> >   {
> > column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> > if (d->cmi_name)
> > @@ -423,7 +446,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > fputs ("\n", fp);
> >   }
> >   
> > -  if (d->module_name)
> > +  if (extra && d->module_name)
> >   {
> > if (d->cmi_name)
> > {
> > @@ -455,7 +478,7 @@ make_write (c

Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-18 Thread Ben Boeckel via Gcc
On Tue, Oct 11, 2022 at 07:42:43 -0400, Ben Boeckel wrote:
> On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
> > Can we share utf8 parsing code with decode_utf8_char in pretty-print.cc?
> 
> I can look at factoring that out. I'll have to decode its logic to see
> how much overlap there is.

There is some mismatch. First, that is in `gcc` and this is in `libcpp`.
Second, `pretty-print.cc`'s implementation:

- fails on an empty string;
- accepts extended-length (5+-byte) encodings which are invalid Unicode;
  and
- decodes codepoint-by-codepoint instead of just validating the entire
  string.

--Ben


Re: [PATCH RESEND 0/1] RFC: P1689R5 support

2022-10-18 Thread Ben Boeckel via Gcc
On Thu, Oct 13, 2022 at 13:08:46 -0400, David Malcolm wrote:
> On Mon, 2022-10-10 at 16:21 -0400, Jason Merrill wrote:
> > David Malcolm would probably know best about JSON wrangling.
> 
> Unfortunately our JSON output doesn't make any guarantees about the
> ordering of keys within an object, so the precise textual output
> changes from run to run.  I've coped with that in my test cases by
> limiting myself to simple regexes of fragments of the JSON output.
> 
> Martin Liska [CCed] went much further in
> 4e275dccfc2467b3fe39012a3dd2a80bac257dd0 by adding a run-gcov-pytest
> DejaGnu directive, allowing for test cases for gcov to be written in
> Python, which can thus test much more interesting assertions about the
> generated JSON.

Ok, if Python is acceptable, I'll use its stdlib to do "fancy" things.
Part of this is because I want to assert that unnecessary fields don't
exist and that sounds…unlikely to be possible in any maintainable way
(assuming it is possible) with regexen. `jq` could help immensely, but
that is probably a bridge too far :) .

Thanks,

--Ben


Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-20 Thread Ben Boeckel via Gcc
On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote:
> Oops, I was thinking this was in gcc as well.  In libcpp there's 
> _cpp_valid_utf8 (which calls one_utf8_to_cppchar).

This routine has a lot more logic (including UCN decoding) and the
`one_utf8_to_cppchar` also supports out-of-bounds codepoints above
`0x10`.

--Ben


[PATCH v2 0/1] RFC: P1689R5 support

2022-10-27 Thread Ben Boeckel via Gcc
Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel and need to be ordered to ensure that `import
some_module;` can be satisfied in time by making sure that the TU with
`export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags.

Thanks,

--Ben

---
v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  ndeeded
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/ChangeLog |   5 +
 gcc/c-family/ChangeLog|   6 +
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/ChangeLog  |   5 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/ChangeLog   |   7 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |  11 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/ChangeLog  |  23 ++
 libcpp/charset.cc |  22 +-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 43 files changed, 823 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644

[PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-27 Thread Ben Boeckel via Gcc
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

Signed-off-by: Ben Boeckel 
---
 libcpp/ChangeLog  |  6 ++
 libcpp/charset.cc | 18 ++
 libcpp/internal.h |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 4d707277531..4e2c7900ae2 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  
+
+   * include/charset.cc: Add `_cpp_valid_utf8_str` which determines
+   whether a C string is valid UTF-8 or not.
+   * include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
+
 2022-10-27  Ben Boeckel  
 
* include/charset.cc: Reject encodings of codepoints above 0x10.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index e9da6674b5f..0524ab6beba 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar(&in, &len, &cp))
+   {
+ return false;
+   }
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.37.3



[PATCH v2 3/3] p1689r5: initial support

2022-10-27 Thread Ben Boeckel via Gcc
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

Signed-off-by: Ben Boeckel 

---
 gcc/ChangeLog |   5 +
 gcc/c-family/ChangeLog|   6 +
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/ChangeLog  |   5 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/ChangeLog   |   7 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |  11 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/ChangeLog  |  11 +
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/mkdeps.cc  | 149 +++-
 41 files changed, 789 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 cr

[PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF

2022-10-27 Thread Ben Boeckel via Gcc
Unicode does not support such values because they are unrepresentable in
UTF-16.

Signed-off-by: Ben Boeckel 
---
 libcpp/ChangeLog  | 6 ++
 libcpp/charset.cc | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 18d5bcceaf0..4d707277531 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  
+
+   * include/charset.cc: Reject encodings of codepoints above 0x10.
+   UTF-16 does not support such codepoints and therefore all Unicode
+   rejects such values.
+
 2022-10-19  Lewis Hyatt  
 
* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..e9da6674b5f 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t 
*inbytesleftp,
   if (c <= 0x3FF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10 || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, 
size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10 || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.37.3



Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-28 Thread Ben Boeckel via Gcc
On Fri, Oct 28, 2022 at 08:59:16 -0400, David Malcolm wrote:
> On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> > This simplifies the interface for other UTF-8 validity detections
> > when a
> > simple "yes" or "no" answer is sufficient.
> > 
> > Signed-off-by: Ben Boeckel 
> > ---
> >  libcpp/ChangeLog  |  6 ++
> >  libcpp/charset.cc | 18 ++
> >  libcpp/internal.h |  2 ++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> > index 4d707277531..4e2c7900ae2 100644
> > --- a/libcpp/ChangeLog
> > +++ b/libcpp/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2022-10-27  Ben Boeckel  
> > +
> > +   * include/charset.cc: Add `_cpp_valid_utf8_str` which
> > determines
> > +       whether a C string is valid UTF-8 or not.
> > +   * include/internal.h: Add prototype for
> > `_cpp_valid_utf8_str`.
> > +
> >  2022-10-27  Ben Boeckel  
> >  
> > * include/charset.cc: Reject encodings of codepoints above
> > 0x10.
> 
> The patch looks good to me, with the same potential caveat that you
> might need to move the ChangeLog entry from the patch "body" to the
> leading blurb, to satisfy:
>   ./contrib/gcc-changelog/git_check_commit.py

Ah, I had missed that. Now fixed locally for patches 1 and 2; will be in
v3 pending some time for further reviews.

THanks,

--Ben


Re: [PATCH v2 3/3] p1689r5: initial support

2022-10-28 Thread Ben Boeckel via Gcc
On Thu, Oct 27, 2022 at 19:16:44 -0400, Ben Boeckel wrote:
> diff --git a/gcc/testsuite/g++.dg/modules/modules.exp 
> b/gcc/testsuite/g++.dg/modules/modules.exp
> index afb323d0efd..7fe8825144f 100644
> --- a/gcc/testsuite/g++.dg/modules/modules.exp
> +++ b/gcc/testsuite/g++.dg/modules/modules.exp
> @@ -28,6 +28,7 @@
>  # { dg-module-do [link|run] [xfail] [options] } # link [and run]
>  
>  load_lib g++-dg.exp
> +load_lib modules.exp
>  
>  # If a testcase doesn't have special options, use these.
>  global DEFAULT_CXXFLAGS
> @@ -237,6 +238,13 @@ proc cleanup_module_files { files } {
>  }
>  }
>  
> +# delete the specified set of dep files
> +proc cleanup_dep_files { files } {
> +foreach file $files {
> + file_on_host delete $file
> +}
> +}
> +
>  global testdir
>  set testdir $srcdir/$subdir
>  proc srcdir {} {
> @@ -310,6 +318,7 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>   set std_list [module-init $src]
>   foreach std $std_list {
>   set mod_files {}
> + set dep_files {}
>   global module_do
>   set module_do {"compile" "P"}
>   set asm_list {}
> @@ -346,6 +355,8 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>   set mod_files [find $DEFAULT_REPO *.gcm]
>   }
>   cleanup_module_files $mod_files
> +
> + cleanup_dep_files $dep_files
>   }
>  }
>  }

These `cleanup_dep_files` hunks are leftovers from my attempts at
getting the P1689 and flags tests working; they'll be gone in v3.

--Ben


Re: [PATCH v2 3/3] p1689r5: initial support

2022-11-01 Thread Ben Boeckel via Gcc
On Tue, Nov 01, 2022 at 08:57:37 -0600, Tom Tromey wrote:
> >>>>> "Ben" == Ben Boeckel via Gcc-patches  writes:
> 
> Ben> - `-fdeps-file=` specifies the path to the file to write the format to.
> 
> I don't know how this output is intended to be used, but one mistake
> made with the other dependency-tracking options was that the output file
> isn't created atomically.  As a consequence, Makefiles normally have to
> work around this to be robust.  If that's a possible issue here then it
> would be best to handle it in this patch.

I don't think there'll be any race here because it's the "output" of the
rule as far as the build graph is concerned. It's also JSON, so anything
reading it "early" will get a partial object and easily detect
"something went wrong". And for clarity, the `-o` flag used in CMake
with this is just a side effect of the `-E` mechanism used and is
completely ignored in the CMake usage of this.

--Ben


[PATCH v3 0/3] RFC: P1689R5 support

2022-11-08 Thread Ben Boeckel via Gcc
Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel and need to be ordered to ensure that `import
some_module;` can be satisfied in time by making sure that the TU with
`export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags.

Thanks,

--Ben

---
v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  ndeeded
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |  28 ++-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 38 files changed, 773 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules

[PATCH v3 1/3] libcpp: reject codepoints above 0x10FFFF

2022-11-08 Thread Ben Boeckel via Gcc
Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..324b5b19136 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -158,6 +158,10 @@ struct _cpp_strbuf
encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
FC 80 80 80 9F 80.  Only the first is valid.
 
+   Additionally, Unicode declares that all codepoints above 0010 are
+   invalid because they cannot be represented in UTF-16. As such, all 5- and
+   6-byte encodings are invalid.
+
An implementation note: the transformation from UTF-16 to UTF-8, or
vice versa, is easiest done by using UTF-32 as an intermediary.  */
 
@@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t 
*inbytesleftp,
   if (c <= 0x3FF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10 || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, 
size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10 || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.38.1



[PATCH v3 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-11-08 Thread Ben Boeckel via Gcc
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

libcpp/

* charset.cc: Add `_cpp_valid_utf8_str` which determines whether
a C string is valid UTF-8 or not.
* internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 20 
 libcpp/internal.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 324b5b19136..e130bc01f48 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/*  Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns
+`false` if any contained byte sequence encodes an invalid Unicode codepoint
+or is not a valid UTF-8 sequence. Returns `true` otherwise. */
+
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar(&in, &len, &cp))
+   return false;
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.38.1



[PATCH v3 3/3] p1689r5: initial support

2022-11-08 Thread Ben Boeckel via Gcc
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
-fdep-output= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++

Re: How can Autoconf help with the transition to stricter compilation defaults?

2022-11-15 Thread Ben Boeckel via Gcc
On Tue, Nov 15, 2022 at 15:09:19 -0800, Paul Eggert wrote:
> This may be a hack, but it's a *good* hack. It's likely to fix 
> real-world bugs that would be caused if Clang becomes overly pedantic by 
> default here. And the probability of introducing real-world bugs is 
> essentially zero.

FWIW, CMake uses the same signature for detecting whether a function
exists or not:


https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/CheckFunctionExists.c

It's also been like this (without the `void`) for 20 years; the `void`
argument was added 6 years ago:


https://gitlab.kitware.com/cmake/cmake/-/commits/master/Modules/CheckFunctionExists.c

--Ben


[PATCH v4 1/3] libcpp: reject codepoints above 0x10FFFF

2022-12-10 Thread Ben Boeckel via Gcc
Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..324b5b19136 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -158,6 +158,10 @@ struct _cpp_strbuf
encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
FC 80 80 80 9F 80.  Only the first is valid.
 
+   Additionally, Unicode declares that all codepoints above 0010 are
+   invalid because they cannot be represented in UTF-16. As such, all 5- and
+   6-byte encodings are invalid.
+
An implementation note: the transformation from UTF-16 to UTF-8, or
vice versa, is easiest done by using UTF-32 as an intermediary.  */
 
@@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t 
*inbytesleftp,
   if (c <= 0x3FF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10 || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, 
size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10 || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.38.1



[PATCH v4 0/3] RFC: P1689R5 support

2022-12-10 Thread Ben Boeckel via Gcc
Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel and need to be ordered to ensure that `import
some_module;` can be satisfied in time by making sure that the TU with
`export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags.

Thanks,

--Ben

---
v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  ndeeded
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |  28 ++-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 38 files changed, 773 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.

[PATCH v4 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-12-10 Thread Ben Boeckel via Gcc
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

libcpp/

* charset.cc: Add `_cpp_valid_utf8_str` which determines whether
a C string is valid UTF-8 or not.
* internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 20 
 libcpp/internal.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 324b5b19136..422cb52595c 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/*  Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns
+`false` if any contained byte sequence encodes an invalid Unicode codepoint
+or is not a valid UTF-8 sequence. Returns `true` otherwise. */
+
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen (name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar (&in, &len, &cp))
+   return false;
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.38.1



[PATCH v4 3/3] p1689r5: initial support

2022-12-10 Thread Ben Boeckel via Gcc
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
-fdep-output= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++

Re: -minstd: Require a minimum std version, without being specific

2022-12-21 Thread Ben Boeckel via Gcc
On Wed, Dec 21, 2022 at 19:33:48 +0100, Alejandro Colomar via Gcc wrote:
> I've long had this wish: an option similar to -std, but which would not 
> specify 
> the standard.  Rather, mark a requirement that the standard be at least a 
> version.
> 
> This would be especially useful for libraries, which might for example 
> require 
> C99 or C11 to work.  They would be able to specify -minstd=c11 in their pc(5) 
> file (for use with pkgconf(1)).  That way, a program using such library, 
> would 
> be free to use -std to specify the C version that the project should be 
> compiled 
> with; maybe gnu17, maybe even gnu2x.  But if the program tries to compile 
> under, 
> say gnu89, the compiler would report an error.

(FD: CMake developer, ISO C++ SG15 committee member)

I'd like to see us move away from "flag soup" and instead towards more
structured information here. This request corresponds to CMake's
`C_STANDARD` target property[1] which CMake then puts together to mean
something.

Note that there are real hazards with just putting the flags like this
into `.pc` files directly. If you have a C library and say "-minstd=c99"
and I'm C++, what is supposed to happen with this flag for a C++
compilre? Does it translate to C++11 (which is the first C++ standard to
"fully contain" C99)? What if there is no answer (e.g., `-minstd=c23`)?

FWIW, my idea is to broaden the concept CMake has of "usage
requirements" to not be so CMake-centered and to encompass things like
"you need an rpath to X to use my libraries" or even "here's an entry
for `PYTHONPATH` to use my Python code". ISO C++'s SG15 has started
discussion of such things with an eye towards standardization here:

https://github.com/isocpp/pkg-fmt

--Ben

[1]https://cmake.org/cmake/help/latest/prop_tgt/C_STANDARD.html


[PATCH v5 0/5] P1689R5 support

2023-01-25 Thread Ben Boeckel via Gcc
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel and need to be ordered to ensure that
`import some_module;` can be satisfied in time by making sure that any
TU with `export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang as taken an alternate approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  ndeeded
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (5):
  libcpp: reject codepoints above 0x10
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   4 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  23 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |  28 ++-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 40 files changed, 789 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 10064

[PATCH v5 2/5] libcpp: add a function to determine UTF-8 validity of a C string

2023-01-25 Thread Ben Boeckel via Gcc
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

libcpp/

* charset.cc: Add `_cpp_valid_utf8_str` which determines whether
a C string is valid UTF-8 or not.
* internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 20 
 libcpp/internal.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index f7ae12ea5a2..616be9d02ee 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/*  Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns
+`false` if any contained byte sequence encodes an invalid Unicode codepoint
+or is not a valid UTF-8 sequence. Returns `true` otherwise. */
+
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen (name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar (&in, &len, &cp))
+   return false;
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index 9724676a8cd..48520901b2d 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.39.0



[PATCH v5 1/5] libcpp: reject codepoints above 0x10FFFF

2023-01-25 Thread Ben Boeckel via Gcc
Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 3c47d4f868b..f7ae12ea5a2 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -158,6 +158,10 @@ struct _cpp_strbuf
encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
FC 80 80 80 9F 80.  Only the first is valid.
 
+   Additionally, Unicode declares that all codepoints above 0010 are
+   invalid because they cannot be represented in UTF-16. As such, all 5- and
+   6-byte encodings are invalid.
+
An implementation note: the transformation from UTF-16 to UTF-8, or
vice versa, is easiest done by using UTF-32 as an intermediary.  */
 
@@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t 
*inbytesleftp,
   if (c <= 0x3FF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10 || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, 
size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10 || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.39.0



[PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-01-25 Thread Ben Boeckel via Gcc
They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ebd30f63d81..dbd1b721616 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18966,6 +18966,8 @@ module_state::do_import (cpp_reader *reader, bool 
outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
-- 
2.39.0



[PATCH v5 3/5] p1689r5: initial support

2023-01-25 Thread Ben Boeckel via Gcc
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
-fdep-output= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++

[PATCH v5 5/5] c++modules: report module mapper files as a dependency

2023-01-25 Thread Ben Boeckel via Gcc
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc |  4 
 gcc/cp/mapper-client.h  |  1 +
 gcc/cp/module.cc| 18 +-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..0ce5679d659 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string 
&name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,8 @@ module_client::open_module_client (location_t loc, const 
char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index dbd1b721616..37066bf072b 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec 
*partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14031,7 +14031,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14039,7 +14039,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19503,7 +19503,7 @@ maybe_translate_include (cpp_reader *reader, line_maps 
*lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19619,7 +19619,7 @@ module_begin_main_file (cpp_reader *reader, line_maps 
*lmaps,
 static void
 name_pending_imports (cpp_reader *reader)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   if (!vec_safe_length (pending_imports))
 /* Not doing anything.  */
@@ -20089,7 +20089,7 @@ init_modules (cpp_reader *reader)
 
   if (!flag_module_lazy)
 /* Get the mapper now, if we're not being lazy.  */
-get_mapper (cpp_main_loc (reader));
+get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   if (!flag_preprocess_only)
 {
@@ -20299,7 +20299,7 @@ late_finish_module (cpp_reader *reader,  
module_processing_cookie *cookie,
 
   if (!errorcount)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
   mapper->ModuleCompiled (state->get_flatname ());
 }
   else if (cookie->cmi_name)
-- 
2.39.0



Re: [PATCH v5 0/5] P1689R5 support

2023-02-02 Thread Ben Boeckel via Gcc
On Wed, Jan 25, 2023 at 16:06:31 -0500, Ben Boeckel wrote:
> This patch series adds initial support for ISO C++'s [P1689R5][], a
> format for describing C++ module requirements and provisions based on
> the source code. This is required because compiling C++ with modules is
> not embarrassingly parallel and need to be ordered to ensure that
> `import some_module;` can be satisfied in time by making sure that any
> TU with `export import some_module;` is compiled first.
> 
> [P1689R5]: https://isocpp.org/files/papers/P1689R5.html
> 
> I've also added patches to include imported module CMI files and the
> module mapper file as dependencies of the compilation. I briefly looked
> into adding dependencies on response files as well, but that appeared to
> need some code contortions to have a `class mkdeps` available before
> parsing the command line or to keep the information around until one was
> made.
> 
> I'd like feedback on the approach taken here with respect to the
> user-visible flags. I'll also note that header units are not supported
> at this time because the current `-E` behavior with respect to `import
> ;` is to search for an appropriate `.gcm` file which is not
> something such a "scan" can support. A new mode will likely need to be
> created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
> where headers are looked up "normally" and processed only as much as
> scanning requires.
> 
> FWIW, Clang as taken an alternate approach with its `clang-scan-deps`
> tool rather than using the compiler directly.

Ping? It'd be nice to have this supported in at least GCC 14 (since it
missed 13).

Thanks,

--Ben


Re: [PATCH v5 0/5] P1689R5 support

2023-02-02 Thread Ben Boeckel via Gcc
On Thu, Feb 02, 2023 at 21:24:12 +0100, Harald Anlauf wrote:
> Am 25.01.23 um 22:06 schrieb Ben Boeckel via Gcc-patches:
> > Hi,
> >
> > This patch series adds initial support for ISO C++'s [P1689R5][], a
> > format for describing C++ module requirements and provisions based on
> > the source code. This is required because compiling C++ with modules is
> > not embarrassingly parallel and need to be ordered to ensure that
> > `import some_module;` can be satisfied in time by making sure that any
> > TU with `export import some_module;` is compiled first.
> >
> > [P1689R5]: https://isocpp.org/files/papers/P1689R5.html
> 
> while that paper mentions Fortran, the patch in its present version
> does not seem to implement anything related to Fortran and does not
> touch the gfortran frontend.  Or am I missing anything?  Otherwise,
> could you give an example how it would be used with Fortran?

Correct. Still trying to put the walls back together after modules
KoolAid Man'd their way into the build graph structure :) . Being able
to drop our Fortran parser (well, we'd have to drop support for Fortran
compilers that exist today…so maybe in 2075 or something) and rely on
compilers to tell us the information would be amazing though :) .

FWIW, the initial revision of the patchset did touch the gfortran
frontend, but the new parameter is now defaulted and therefore the
callsite doesn't need an update anymore. I still thought it worthwhile
to keep the Fortran side aware of what is going on in the space.

The link to Fortran comes up because the build graph problem is
isomorphic (Fortran supports exporting multiple modules from a single
TU, but it's not relevant at the graph level; it's the zero -> any case
that is hard), CMake "solved" it already, and C++ is going to have a
*lot* more "I want to consume $other_project's modules using my favorite
compiler/flags" than seems to happen in Fortran. If you're interested,
this is the paper showing how we do it:

https://mathstuf.fedorapeople.org/fortran-modules/fortran-modules.html

> Thus I'd say that it is OK from the gfortran side.

Eventually we'll like to get gfortran supporting this type of scanning,
but…as above.

Thanks,

--Ben


Re: [PATCH v5 0/5] P1689R5 support

2023-02-03 Thread Ben Boeckel via Gcc
On Fri, Feb 03, 2023 at 09:10:21 +, Jonathan Wakely wrote:
> On Fri, 3 Feb 2023 at 08:58, Jonathan Wakely wrote:
> > On Fri, 3 Feb 2023, 04:09 Andrew Pinski via Gcc,  wrote:
> >> On Wed, Jan 25, 2023 at 1:07 PM Ben Boeckel via Fortran
> >>  wrote:
> >> > This patch series adds initial support for ISO C++'s [P1689R5][], a
> >> > format for describing C++ module requirements and provisions based on
> >> > the source code. This is required because compiling C++ with modules is
> >> > not embarrassingly parallel and need to be ordered to ensure that
> >> > `import some_module;` can be satisfied in time by making sure that any
> >> > TU with `export import some_module;` is compiled first.
> >>
> >> I like how folks are complaining that GCC outputs POSIX makefile
> >> syntax from GCC's dependency files which are supposed to be in POSIX
> >> Makefile syntax.
> >> It seems like rather the build tools are people like to use are not
> >> understanding POSIX makefile syntax any more rather.
> >> Also I am not a fan of json, it is too verbose for no use. Maybe it is
> >> time to go back to standardizing a new POSIX makefile syntax rather
> >> than changing C++ here.

I'm not complaining that dependency files are in POSIX (or even
POSIX-to-be) syntax. The information requires a bit more structure than
some variable assignments and I don't expect anything trying to read
them to start trying to understand `VAR_$(DEREF)=` and the behaviors of
`:=` versus `=` assignment to get this reliably.

> > That would take a decade or more. It's too late for POSIX 202x and
> > the pace that POSIX agrees on makefile features is incredibly slow.
> 
> Also, name+=value is *not* POSIX make syntax today, that's an
> extension. That's why the tools don't always support it.
> So I don't think it's true that GCC's dependency files are in POSIX syntax.
> 
> POSIX 202x does add support for it, but it will take some time for it
> to be supported everywhere.

Additionally, while the *syntax* might be supported, encoding all of
P1689 in it would require additional work (e.g., key/value variable
assignments or something). Batch scanning would also be…interesting.
Also note that the imported modules' location cannot be known before
scanning in general, so all you get are "logical names" that you need a
collator to link up with other scan results anyways. Tools such as
`make` and `ninja` cannot know, in general, how to do this linking
between arbitrary targets (e.g., there may be a debug and release build
of the same module in the graph and knowing which to use requires
higher-level info about the entire build graph; modules may also be
considered "private" and not accessible everywhere and therefore should
also not be hooked up across different target boundaries).

While the `CXX_MODULES +=` approach can work for simple cases (a
pseudo-implicit build), it is quite insufficient for the general case.

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-05-12 Thread Ben Boeckel via Gcc
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> I notice that the actual flags are all -fdep-*, though some of them are 
> -fdeps-* here, and the internal variables all seem to be fdeps_*.  I 
> lean toward harmonizing on "deps", I think.

Done.

> I don't love the three separate options, but I suppose it's fine.  I'd 
> prefer "target" instead of "output".

Done.

> It should be possible to omit both -file and -target and get reasonable 
> defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options.

`file` can be omitted (the `output_stream` will be used then). I *think*
I see that adding:

%{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}

would at least do for `-fdeps-file` defaults? I don't know if there's a
reasonable default for `-fdeps-target=` though given that this command
line has no information about the object file that will be used (`-o` is
used for preprocessor output since we're leaning on `-E` here).

--Ben


Re: [PATCH v5 1/5] libcpp: reject codepoints above 0x10FFFF

2023-05-12 Thread Ben Boeckel via Gcc
On Mon, Feb 13, 2023 at 10:53:17 -0500, Jason Merrill wrote:
> On 1/25/23 13:06, Ben Boeckel wrote:
> > Unicode does not support such values because they are unrepresentable in
> > UTF-16.
> > 
> > libcpp/
> > 
> > * charset.cc: Reject encodings of codepoints above 0x10.
> > UTF-16 does not support such codepoints and therefore all
> > Unicode rejects such values.
> 
> It seems that this causes a bunch of testsuite failures from tests that 
> expect this limit to be checked elsewhere with a different diagnostic, 
> so I think the easiest thing is to fold this into _cpp_valid_utf8_str 
> instead, i.e.:

Since then, `cpp_valid_utf8_p` has appeared and takes care of the
over-long encodings. The new patchset just checks for codepoints beyond
0x10 and rejects them in this function (and the test suite matches
`master` results for me then).

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-05-12 Thread Ben Boeckel via Gcc
On Mon, Feb 13, 2023 at 13:33:50 -0500, Jason Merrill wrote:
> Both this and the mapper dependency patch seem to cause most of the 
> modules testcases to crash; please remember to run the regression tests 
> (https://gcc.gnu.org/contribute.html#testing)

Fixed for v6. `cpp_get_deps` can return `NULL` which `deps_add_dep`
assumes to not be true; fixed by checking before calling.

--Ben


Re: LSP based on GCC

2023-05-17 Thread Ben Boeckel via Gcc
On Wed, May 17, 2023 at 15:48:09 -0400, Paul Smith wrote:
> More frustratingly, Clang has made some poor decisions around
> "compatibility": they tried to leverage the GNU ecosystem by emulating
> GCC features and arguments but sometimes break things.  The most

Alas, the cost of trying to make a compiler that can injest in-the-wild
code. It's the reason "every" compiler claims various GCC things: too
many projects ended up with `#error "Unknown compiler"` in their
detections and fixing them when you're just trying to get off the ground
is annoying. As far as I know, GCC is locked into never providing a
single uniquely identifiable trait because other compilers would end up
having to emulate it too once it gets used by projects. CMake basically
just ends up with "it's GCC" if `__GNUC__` is defined and none of the
other, more specific, preprocessor markers are present.

We're kind of getting this again with the variety of different frontends
available on top of Clang these days (Apple's Xcode build, upstream
itself, Intel's frontend, IBM's LLVM-based frontend, the XL-alike Clang
build, Fujitsu has one, ARM's, and who knows how many others are out
there). Sometimes they've…forgotten to make something distinctive so
that it can be detected reliably.

> egregious example I'm aware of is that they look for GCC-named
> precompiled headers (.gch), even though the Clang PCH format is
> completely different.  So if Clang (and the LSP servers built on it)
> find a .gch header file they will try to read it, fail, and give an
> error.  I filed a bug about this in 2019 but it's been ignored.
> 
> This means you need to modify your LSP server arguments to omit any PCH
> compiler command line arguments; for environments based on auto-
> generated definitions like compile_commands.json this is frustrating.

FWIW, this is only going to get worse with C++ modules.

--Ben


Re: LSP based on GCC

2023-05-18 Thread Ben Boeckel via Gcc
On Thu, May 18, 2023 at 09:25:04 -0400, Paul Smith wrote:
> On Wed, 2023-05-17 at 18:38 -0400, Ben Boeckel wrote:
> > FWIW, this is only going to get worse with C++ modules.
> 
> There's no reason it should.  Of course the right answer is to tell
> people to fix their build systems and if they want to use a different
> compiler AND use PCH, they use the appropriate suffix for that
> compiler.
> 
> But even if you don't want to do that the fix in this case is trivial.
> I even sent a patch (although since I don't know the clang code there's
> no doubt that it was not done "the right way" and needed to be
> massaged), they just never cared about it.
> 
> The GCC PCH files use a special 4-byte prefix in every file; all you
> have to do in clang is, if you find a .gch file open the file and read
> the first 4 bytes and if it's a real GCC PCH file you ignore it and if
> it's actually a Clang PCH with a malformed name you complain bitterly
> and dump core er, I mean, you read it silently as if it had the
> right name.

PCH files can "be ignored" in some sense because they can be
recalculated from `#include` files pretty easily. Module files, however,
cannot.

> One would hope that, if the GCC module files have a similar compiler-
> specific format (I'm not too familiar with modules) they also use a
> similar magic number at the beginning of the file.

GCC module files are use ELF containers, so there's plenty of metadata
to know it's not-for-Clang. But Clang will need to make its own version
of these module files to know what, if anything, is provided by it by
sources that import it to make any kind of useful suggestions.

> But anyway this is losing the thread of Eli's hopeful request.

Agreed. A GCC-based LSP will help immensely with GCC-using projects
(whether it be Emacs or Vim on the other end of the LSP pipe ;) ).

--Ben


[PATCH v6 0/4] P1689R5 support

2023-06-06 Thread Ben Boeckel via Gcc
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel and need to be ordered to ensure that
`import some_module;` can be satisfied in time by making sure that any
TU with `export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang as taken an alternate approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v5 -> v6:

- rebase onto `master` (585c660f041 (reload1: Change return type of
  predicate function from int to bool, 2023-06-06))
- fix crash related to reporting imported CMI files as dependencies
- rework utf-8 validity to patch the new `cpp_valid_utf8_p` function
  instead of the core utf-8 decoding routine to reject invalid
  codepoints (preserves higher-level error detection of invalid utf-8)
- harmonize of `fdeps` spelling in flags, variables, comments, etc.
- rename `-fdeps-output=` to `-fdeps-target=`

v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  ndeeded
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (4):
  libcpp: reject codepoints above 0x10
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   4 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  24 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |   7 +
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 

[PATCH v6 3/4] c++modules: report imported CMI files as dependencies

2023-06-06 Thread Ben Boeckel via Gcc
They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index c80f139eb82..e88ce0a1818 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18966,6 +18966,9 @@ module_state::do_import (cpp_reader *reader, bool 
outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  if (cpp_get_deps (reader))
+   deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
-- 
2.40.1



[PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF

2023-06-06 Thread Ben Boeckel via Gcc
Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d7f323b2cd5..3b34d804cf1 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
   int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp);
   if (err)
return false;
+
+  /* Additionally, Unicode declares that all codepoints above 0010 are
+invalid because they cannot be represented in UTF-16.
+
+Reject such values.*/
+  if (cp >= 0x10)
+   return false;
 }
   /* No problems encountered.  */
   return true;
-- 
2.40.1



[PATCH v6 2/4] p1689r5: initial support

2023-06-06 Thread Ben Boeckel via Gcc
This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdeps-target=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdeps-file=, and
-fdeps-target= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdeps-file=, and -fdeps-target= parsing.
* c.opt: Add -fdeps-format=, -fdeps-file=, and -fdeps-target=
flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 

[PATCH v6 4/4] c++modules: report module mapper files as a dependency

2023-06-06 Thread Ben Boeckel via Gcc
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc |  4 
 gcc/cp/mapper-client.h  |  1 +
 gcc/cp/module.cc| 18 +-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..0ce5679d659 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string 
&name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,8 @@ module_client::open_module_client (location_t loc, const 
char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index e88ce0a1818..9dbb53d2aaf 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec 
*partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14031,7 +14031,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14039,7 +14039,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19504,7 +19504,7 @@ maybe_translate_include (cpp_reader *reader, line_maps 
*lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19620,7 +19620,7 @@ module_begin_main_file (cpp_reader *reader, line_maps 
*lmaps,
 static void
 name_pending_imports (cpp_reader *reader)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   if (!vec_safe_length (pending_imports))
 /* Not doing anything.  */
@@ -20090,7 +20090,7 @@ init_modules (cpp_reader *reader)
 
   if (!flag_module_lazy)
 /* Get the mapper now, if we're not being lazy.  */
-get_mapper (cpp_main_loc (reader));
+get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   if (!flag_preprocess_only)
 {
@@ -20300,7 +20300,7 @@ late_finish_module (cpp_reader *reader,  
module_processing_cookie *cookie,
 
   if (!errorcount)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
   mapper->ModuleCompiled (state->get_flatname ());
 }
   else if (cookie->cmi_name)
-- 
2.40.1



Re: Targetting p0847 for GCC14 (explicit object parameter)

2023-06-08 Thread Ben Boeckel via Gcc
On Thu, Jun 08, 2023 at 04:06:24 +, waffl3x via Gcc wrote:
> I would like to boldly suggest implementing P0847 should be targeted at
> GCC14. In my anecdotal experiences, this feature is very important to
> people, and very important to myself, I believe it should be a priority.
> 
> I am not suggesting this without offering to contribute, however
> because of my inexperience with compiler hacking I am concerned I would
> hinder efforts. With that said, if no one is interested in starting
> work on it, but there is consensus that the feature is important
> enough, then I will do my best to take up that job.

Note that one way to help with a "feature that is very important" to
oneself without compiler experience is to help write test cases (even
better if it includes the necessary `dejagnu` markup). Corner cases are
typically the trickiest to handle and filling it out ahead of time is
likely very helpful to whoever does end up working on it. Indeed,
because you're (likely) different from the feature implementer, you can
help fill in blind spots that may be missed otherwise.

It would also be useful to be somewhat systematic about it as well and
take notes on what, specifically, is being tested in each case as well
as where known gaps in the test cases exist.

--Ben


Re: [PATCH v6 0/4] P1689R5 support

2023-06-16 Thread Ben Boeckel via Gcc
On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote:
> This patch series causes ICEs on arm-linux-gnueabihf.  Would you
> please investigate?  Please let me know if you need any in reproducing
> these.

Finally back at it. I tried on aarch64, but wasn't able to reproduce the
errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is
there hardware I can access to try this out on the same target triple?

Alternatively, a backtrace may be able to help pinpoint it enough if you
have the cycles.

Thanks,

--Ben


Re: [PATCH v6 0/4] P1689R5 support

2023-06-16 Thread Ben Boeckel via Gcc
On Fri, Jun 16, 2023 at 15:48:59 -0400, Ben Boeckel wrote:
> On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote:
> > This patch series causes ICEs on arm-linux-gnueabihf.  Would you
> > please investigate?  Please let me know if you need any in reproducing
> > these.
> 
> Finally back at it. I tried on aarch64, but wasn't able to reproduce the
> errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is
> there hardware I can access to try this out on the same target triple?

Trying inside of an i386 container also came up with nothing…I'll try
qemu.

--Ben


Re: [PATCH v6 0/4] P1689R5 support

2023-06-17 Thread Ben Boeckel via Gcc
On Fri, Jun 16, 2023 at 23:55:53 -0400, Jason Merrill wrote:
> I see the same thing with patch 4 on x86_64-pc-linux-gnu, e.g.
> 
> FAIL: g++.dg/modules/ben-1_a.C -std=c++17 (test for excess errors)
> Excess errors:
> /home/jason/gt/gcc/testsuite/g++.dg/modules/ben-1_a.C:9:1: internal
> compiler error: Segmentation fault
> 0x19e2f3c crash_signal
> /home/jason/gt/gcc/toplev.cc:314
> 0x340f3f8 mkdeps::vec::size() const
> /home/jason/gt/libcpp/mkdeps.cc:57
> 0x340dc1f apply_vpath
> /home/jason/gt/libcpp/mkdeps.cc:194
> 0x340e08e deps_add_dep(mkdeps*, char const*)
> /home/jason/gt/libcpp/mkdeps.cc:318
> 0xea7b51 module_client::open_module_client(unsigned int, char const*,
> mkdeps*, void (*)(char const*), char const*)
> /home/jason/gt/gcc/cp/mapper-client.cc:291
> 0xef2ba8 make_mapper
> /home/jason/gt/gcc/cp/module.cc:14042
> 0xf0896c get_mapper(unsigned int, mkdeps*)
> /home/jason/gt/gcc/cp/module.cc:3977
> 0xf032ac name_pending_imports
> /home/jason/gt/gcc/cp/module.cc:19623
> 0xf03a7d preprocessed_module(cpp_reader*)
> /home/jason/gt/gcc/cp/module.cc:19817
> 0xe85104 module_token_cdtor(cpp_reader*, unsigned long)
> /home/jason/gt/gcc/cp/lex.cc:548
> 0xf467b2 cp_lexer_new_main
> /home/jason/gt/gcc/cp/parser.cc:756
> 0xfc1e3a c_parse_file()
> /home/jason/gt/gcc/cp/parser.cc:49725
> 0x11c5bf5 c_common_parse_file()
> /home/jason/gt/gcc/c-family/c-opts.cc:1268

Thanks. I missed a `nullptr` check before calling `deps_add_dep`. I
think I got misled by `make check` returning a zero exit code even if
there are failures.

Thanks,

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-20 Thread Ben Boeckel via Gcc
On Mon, Jun 19, 2023 at 17:33:58 -0400, Jason Merrill wrote:
> On 5/12/23 10:24, Ben Boeckel wrote:
> > `file` can be omitted (the `output_stream` will be used then). I *think*
> > I see that adding:
> > 
> >  %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}
> 
> %{!fdeps-file: but yes.
> 
> > would at least do for `-fdeps-file` defaults? I don't know if there's a
> > reasonable default for `-fdeps-target=` though given that this command
> > line has no information about the object file that will be used (`-o` is
> > used for preprocessor output since we're leaning on `-E` here).
> 
> I would think it could default to %b.o?

I suppose that could work, yes.

> I had quite a few more comments on the v5 patch that you didn't respond 
> to here or address in the v6 patch; did your mail client hide them from you?

Oof. Sorry, I saw large chunks of quoting and apparently assumed the
rest was fine (I usually do aggressive trimming when doing that style of
review). I see them now. Will go through and include in v7.

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-20 Thread Ben Boeckel via Gcc
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> On 1/25/23 13:06, Ben Boeckel wrote:
> > - header-unit information fields
> > 
> > Header units (including the standard library headers) are 100%
> > unsupported right now because the `-E` mechanism wants to import their
> > BMIs. A new mode (i.e., something more workable than existing `-E`
> > behavior) that mocks up header units as if they were imported purely
> > from their path and content would be required.
> 
> I notice that the cpp dependency generation tries (in open_file_failed) 
> to continue after encountering a missing file, is that not sufficient 
> for header units?  Or adjustable to be sufficient?

No. Header units can introduce macros which can be used to modify the
set of modules that are imported. Included headers are "discovered"
dependencies and don't modify the build graph (just add more files that
trigger a rebuild) and can be collected during compilation. Module
dependencies are needed to get the build correct in the first place in
order to:

- order module compilations in the build graph so that imported modules
  are ready before anything using them; and
- computing the set of flags needed for telling the compiler where
  imported modules' CMI files should be located.

> > - non-utf8 paths
> > 
> > The current standard says that paths that are not unambiguously
> > represented using UTF-8 are not supported (because these cases are rare
> > and the extra complication is not worth it at this time). Future
> > versions of the format might have ways of encoding non-UTF-8 paths. For
> > now, this patch just doesn't support non-UTF-8 paths (ignoring the
> > "unambiguously represetable in UTF-8" case).
> 
> typo "representable"

Fixed.

> > diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> > index c68a2a27469..1c14ce3fe8e 100644
> > --- a/gcc/c-family/c-opts.cc
> > +++ b/gcc/c-family/c-opts.cc
> > @@ -77,6 +77,9 @@ static bool verbose;
> >   /* Dependency output file.  */
> >   static const char *deps_file;
> >   
> > +/* Enhanced dependency output file.  */
> 
> Maybe "structured", as in the docs?  It isn't really a direct 
> enhancement of the makefile dependencies.

Agreed. I'll also add a link to p1689r5 as a comment for what
"structured" means where it is parsed out.

> > +  if (cpp_opts->deps.format != DEPS_FMT_NONE)
> > +{
> > +  if (!fdeps_file)
> > +   fdeps_stream = out_stream;
> > +  else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0')
> > +   fdeps_stream = stdout;
> 
> You probably want to check that deps_stream and fdeps_stream don't end 
> up as the same stream.

Hmm. But `stdout` is probably fine to use for both though. Basically:

if (fdeps_stream == out_stream && fdeps_stream != stdout)
  make_diagnostic_noise ();

> > @@ -1374,6 +1410,8 @@ handle_deferred_opts (void)
> >   
> > if (opt->code == OPT_MT || opt->code == OPT_MQ)
> >   deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
> > +   else if (opt->code == OPT_fdep_output_)
> > + deps_add_output (deps, opt->arg, true);
> 
> How about fdeps_add_target?

Renamed.

> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index ef371ca8c26..630781fdf8a 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -256,6 +256,18 @@ MT
> >   C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target 
> > after %qs)
> >   -MT   Add a target that does not require quoting.
> >   
> > +fdep-format=
> > +C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after 
> > %qs)
> > +Format for output dependency information.  Supported (\"p1689r5\").
> 
> I think we want "structured" here, as well.

Fixed.

> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 06d77983e30..b61c3ebd3ec 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -2791,6 +2791,21 @@ is @option{-fpermitted-flt-eval-methods=c11}.  The 
> > default when in a GNU
> >   dialect (@option{-std=gnu11} or similar) is
> >   @option{-fpermitted-flt-eval-methods=ts-18661-3}.
> >   
> > +@item -fdep-file=@var{file}
> > +@opindex fdep-file
> > +Where to write structured dependency information.
> > +
> > +@item -fdep-format=@var{format}
> > +@opindex fdep-format
> > +The format to use for structured dependency information. @samp{p1689r5} is 
> > the
> > +only supported format righ

Re: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF

2023-06-20 Thread Ben Boeckel via Gcc
On Tue, Jun 20, 2023 at 21:16:40 +0200, Damien Guibouret wrote:
> I think the comparison should be ">" instead of ">=" as 0x10 seems a 
> valid value (Unicode says value above 0x10 is invalid).
> Other tests around same value in this file are using ">".

Ah, good catch. I'll make a separate patch submission for that.

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-06-22 Thread Ben Boeckel via Gcc
On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> On 1/25/23 16:06, Ben Boeckel wrote:
> > They affect the build, so report them via `-MF` mechanisms.
> 
> Why isn't this covered by the existing code in preprocessed_module?

It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name
in v7 once I finish up testing). This logic cannot be used for p1689
output because it assumes the location and names of CMI files (`.c++m`)
that will be necessary (it is related to the `CXX_IMPORTS +=` GNU
make/libcody extensions that will, e.g., cause `ninja` to choke if it is
read from `-MF` output as it uses "fancier" Makefile syntax than tools
that are not actually `make` are going to be willing to support). This
codepath is the *actual* filename being read at compile time and is
relevant at all times; it may duplicate what `preprocessed_module` sets
up.

I'm also realizing that this is why I need to pass `-fdeps-format=p1689`
when compiling…there may need to be another, more idiomatic, way to
disable this additional syntax usage in `-MF` output.

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-06-25 Thread Ben Boeckel via Gcc
On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
> On 6/22/23 22:45, Ben Boeckel wrote:
> > On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> >> On 1/25/23 16:06, Ben Boeckel wrote:
> >>> They affect the build, so report them via `-MF` mechanisms.
> >>
> >> Why isn't this covered by the existing code in preprocessed_module?
> > 
> > It appears as though it is neutered in patch 3 where
> > `write_make_modules_deps` is used in `make_write` (or will use that name
> 
> Why do you want to record the transitive modules? I would expect just noting 
> the 
> ones with imports directly in the TU would suffice (i.e check the 'outermost' 
> arg)

FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.

--Ben


Re: [PATCH v5 5/5] c++modules: report module mapper files as a dependency

2023-06-25 Thread Ben Boeckel via Gcc
On Fri, Jun 23, 2023 at 10:44:11 -0400, Jason Merrill wrote:
> On 1/25/23 16:06, Ben Boeckel wrote:
> > It affects the build, and if used as a static file, can reliably be
> > tracked using the `-MF` mechanism.
> 
> Hmm, this seems a bit like making all .o depend on the Makefile; it 

Technically this is true: the command line for the TU lives in said
Makefile; if I updated it, a new TU would be really nice. This is a
long-standing limitation of `make` though. FWIW, `ninja` fixes it by
tracking the command line used and CMake's Makefiles generator handles
it by storing TU flags in an included file and depending on that file
from the TU output.

> shouldn't be necessary to rebuild all TUs that use modules when we add 
> another module to the mapper file.

If I change it from:

```
mod.a   mod.a.cmi
```

to:

```
mod.a   mod.a.replace.cmi
```

I'd expect a recompile. As with anything, this depends on the
granularity of the mapper files. A global mapper file is very similar to
a global response file and given that we don't have line-change
granularity dependency tracking…

> What is your expected use case for 
> this dependency?

CMake, at least, uses a per-TU mapper file, so any build system using a
similar strategy handling the above case would only affect TUs that
actually list `mod.a`.

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-25 Thread Ben Boeckel via Gcc
On Fri, Jun 23, 2023 at 14:31:17 -0400, Jason Merrill wrote:
> On 6/20/23 15:46, Ben Boeckel wrote:
> > On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> >> On 1/25/23 13:06, Ben Boeckel wrote:
> 
> >>> Header units (including the standard library headers) are 100%
> >>> unsupported right now because the `-E` mechanism wants to import their
> >>> BMIs. A new mode (i.e., something more workable than existing `-E`
> >>> behavior) that mocks up header units as if they were imported purely
> >>> from their path and content would be required.
> >> >> I notice that the cpp dependency generation tries (in open_file_failed)
> >> to continue after encountering a missing file, is that not sufficient 
> >> for header units?  Or adjustable to be sufficient?
> > 
> > No. Header units can introduce macros which can be used to modify the
> > set of modules that are imported. Included headers are "discovered"
> > dependencies and don't modify the build graph (just add more files that
> > trigger a rebuild) and can be collected during compilation. Module
> > dependencies are needed to get the build correct in the first place in
> > order to:
> > 
> > - order module compilations in the build graph so that imported modules
> >   are ready before anything using them; and
> > - computing the set of flags needed for telling the compiler where
> >   imported modules' CMI files should be located.
> 
> So if the header unit CMI isn't available during dependency generation, 
> would it be better to just #include the header?

It's not so simple: the preprocessor state needs to isolate out
`LOCAL_ONLY` from this case:

```
#define LOCAL_ONLY 1
import ; // The preprocessing of this should *not* see
// `LOCAL_ONLY`.
```

> > Hmm. But `stdout` is probably fine to use for both though. Basically:
> > 
> >  if (fdeps_stream == out_stream && fdeps_stream != stdout)
> >make_diagnostic_noise ();
> 
> (fdeps_stream == deps_stream, but sure, that's reasonable.

Done.

> >> So, I take it this is the common use case you have in mind, generating
> >> Make dependencies for the p1689 file?  When are you thinking the Make
> >> dependencies for the .o are generated?  At build time?
> > 
> > Yes. If an included file changes, the scanning should be performed
> > again. The compilation will have its own `-MF` as well (which should
> > point to the same files plus the CMI files it ends up reading).
> > 
> >> I'm a bit surprised you're using .json rather than an extension that
> >> indicates what the information is.
> > 
> > I can change that; the filename doesn't *really* matter (e.g., CMake
> > uses `.ddi` for "dynamic dependency information").
> 
> That works.

Done.

> >>> `-M` is about discovered dependencies: those that you find out while
> >>> doing work. `-fdep-*` is about ordering dependencies: extracting
> >>> information from file content in order to even order future work around.
> >>
> >> I'm not sure I see the distinction; Makefiles also express ordering
> >> dependencies.  In both cases, you want to find out from the files what
> >> order you will want to process them in when building the project.
> > 
> > Makefiles can express ordering dependencies, but not the `-M` snippets;
> > these are for files that, if changed, should trigger a rebuild. This is > 
> > fundamentally different than module dependencies which instead indicate
> > which *compiles* (or CMI generation if using a 2-phase setup) need to
> > complete before compilation (or CMI generation) of the scanned TU can be
> > performed. Generally generated headers will be ordered manually in the
> > build system description. However, maintaining that same level for
> > in-source dependency information on a per-source level is a *far* higher
> > burden.
> 
> The main difference I see is that the CMI might not exist yet.  As you 
> say, we don't want to require people to write all the dependencies by 
> hand, but that just means we need to be able to generate the 
> dependencies automatically.  In the Make-only model I'm thinking of, one 
> would collect dependencies on an initial failing build, and then start 
> over from the beginning again with the dependencies we discovered.  It's 
> the same two-phase scan and build, but one that uses the same compile 
> commands for both phases.

It's a potentially unbounded set of phases:

- 2 phases per tool that is built that gen

[PATCH v7 1/4] spec: add a spec function to join arguments

2023-07-02 Thread Ben Boeckel via Gcc
When passing `-o` flags to other options, the typical `-o foo` spelling
leaves a leading whitespace when replacing elsewhere. This ends up
creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't
parse properly. When attempting to make a spec function to just remove
the leading whitespace, the argument splitting ends up masking the
whitespace. However, the intended extension *also* ends up being its own
argument. To perform the desired behavior, the arguments need to be
concatenated together.

gcc/:

* gcc.cc (join_spec_func): Add a spec function to join all
arguments.

Signed-off-by: Ben Boeckel 
---
 gcc/gcc.cc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index fdfac0b4fe4..44433b80d61 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char 
**);
 static const char *debug_level_greater_than_spec_func (int, const char **);
 static const char *dwarf_version_greater_than_spec_func (int, const char **);
 static const char *find_fortran_preinclude_file (int, const char **);
+static const char *join_spec_func (int, const char **);
 static char *convert_white_space (char *);
 static char *quote_spec (char *);
 static char *quote_spec_arg (char *);
@@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] 
=
   { "debug-level-gt",  debug_level_greater_than_spec_func },
   { "dwarf-version-gt",dwarf_version_greater_than_spec_func },
   { "fortran-preinclude-file", find_fortran_preinclude_file},
+  { "join",join_spec_func},
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -10975,6 +10977,19 @@ find_fortran_preinclude_file (int argc, const char 
**argv)
   return result;
 }
 
+/* The function takes any number of arguments and joins them together.  */
+
+static const char *
+join_spec_func (int argc, const char **argv)
+{
+  char *result = NULL;
+
+  for (int i = 0; i < argc; ++i)
+result = reconcat(result, result ? result : "", argv[i], NULL);
+
+  return result;
+}
+
 /* If any character in ORIG fits QUOTE_P (_, P), reallocate the string
so as to precede every one of them with a backslash.  Return the
original string or the reallocated one.  */
-- 
2.40.1



[PATCH v7 0/4] P1689R5 support

2023-07-02 Thread Ben Boeckel via Gcc
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel and need to be ordered to ensure that
`import some_module;` can be satisfied in time by making sure that any
TU with `export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang as taken an alternate approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v6 -> v7:

- rebase onto `master` (80ae426a195 (d: Fix core.volatile.volatileLoad
  discarded if result is unused, 2023-07-02))
- add test cases for patches 3 and 4 (new dependency reporting in `-MF`)
- add a Python script to test aspects of generated dependency files
- a new `join` spec function to support `-fdeps-*` defaults based on the
  `-o` flag (needed to strip the leading space that appears otherwise)
- note that JSON writing support should be factored out for use by
  `libcpp` and `gcc` (libiberty?)
- use `.ddi` for the extension of `-fdeps-*` output files by default
- support defaults for `-fdeps-file=` and `-fdeps-target=` when only
  `-fdeps-format=` is provided (with tests)
- error if `-MF` and `-fdeps-file=` are both the same (non-`stdout`)
  file as their formats are incompatible
- expand the documentation on how the `-fdeps-*` flags should be used

v5 -> v6:

- rebase onto `master` (585c660f041 (reload1: Change return type of
  predicate function from int to bool, 2023-06-06))
- fix crash related to reporting imported CMI files as dependencies
- rework utf-8 validity to patch the new `cpp_valid_utf8_p` function
  instead of the core utf-8 decoding routine to reject invalid
  codepoints (preserves higher-level error detection of invalid utf-8)
- harmonize of `fdeps` spelling in flags, variables, comments, etc.
- rename `-fdeps-output=` to `-fdeps-target=`

v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  ndeeded
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (4):
  spec: add a spec function to join arguments
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   5 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  24 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|  19 +-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modu

[PATCH v7 3/4] c++modules: report imported CMI files as dependencies

2023-07-02 Thread Ben Boeckel via Gcc
They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

gcc/testsuite/

* g++.dg/modules/depreport-1_a.C: New test.
* g++.dg/modules/depreport-1_b.C: New test.
* g++.dg/modules/test-depfile.py: New tool for validating depfile
information.
* lib/modules.exp: Support for validating depfile contents.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc |   3 +
 gcc/testsuite/g++.dg/modules/depreport-1_a.C |  10 +
 gcc/testsuite/g++.dg/modules/depreport-1_b.C |  12 ++
 gcc/testsuite/g++.dg/modules/test-depfile.py | 187 +++
 gcc/testsuite/lib/modules.exp|  29 +++
 5 files changed, 241 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/test-depfile.py

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9df60d695b1..f3acc4e02fe 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18968,6 +18968,9 @@ module_state::do_import (cpp_reader *reader, bool 
outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  if (cpp_get_deps (reader))
+   deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_a.C 
b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
new file mode 100644
index 000..241701728a2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
@@ -0,0 +1,10 @@
+// { dg-additional-options -fmodules-ts }
+
+export module Foo;
+// { dg-module-cmi Foo }
+
+export class Base
+{
+public:
+  int m;
+};
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_b.C 
b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
new file mode 100644
index 000..b6e317c6703
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
@@ -0,0 +1,12 @@
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -MD }
+// { dg-additional-options "-MF depreport-1.d" }
+
+import Foo;
+
+void foo ()
+{
+  Base b;
+}
+
+// { dg-final { run-check-module-dep-expect-input "depreport-1.d" 
"gcm.cache/Foo.gcm" } }
diff --git a/gcc/testsuite/g++.dg/modules/test-depfile.py 
b/gcc/testsuite/g++.dg/modules/test-depfile.py
new file mode 100644
index 000..ea4edb61434
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/test-depfile.py
@@ -0,0 +1,187 @@
+import json
+
+
+# Parameters.
+ALL_ERRORS = False
+
+
+def _report_error(msg):
+'''Report an error.'''
+full_msg = 'ERROR: ' + msg
+if ALL_ERRORS:
+print(full_msg)
+else:
+raise RuntimeError(full_msg)
+
+
+class Token(object):
+pass
+
+
+class Output(Token):
+def __init__(self, path):
+self.path = path
+
+
+class Input(Token):
+def __init__(self, path):
+self.path = path
+
+
+class Colon(Token):
+pass
+
+
+class Append(Token):
+pass
+
+
+class Variable(Token):
+def __init__(self, name):
+self.name = name
+
+
+class Word(Token):
+def __init__(self, name):
+self.name = name
+
+
+def validate_depfile(depfile, expect_input=None):
+'''Validate a depfile contains some information
+
+Returns `False` if the information is not found.
+'''
+with open(depfile, 'r') as fin:
+depfile_content = fin.read()
+
+real_lines = []
+join_line = False
+for line in depfile_content.split('\n'):
+# Join the line if needed.
+if join_line:
+line = real_lines.pop() + line
+
+# Detect line continuations.
+join_line = line.endswith('\\')
+# Strip line continuation characters.
+if join_line:
+line = line[:-1]
+
+# Add to the real line set.
+real_lines.append(line)
+
+# Perform tokenization.
+tokenized_lines = []
+for line in real_lines:
+tokenized = []
+join_word = False
+for word in line.split(' '):
+if join_word:
+word = tokenized.pop() + ' ' + word
+
+# Detect word joins.
+join_word = word.endswith('\\')
+# Strip escape character.
+if join_word:
+word = word[:-1]
+
+# Detect `:` at the end of a word.
+if word.endswith(':'):
+tokenized.append(word[:-1])
+word = word[-1]
+
+# Add word to the tokenized set.
+tokenized.append(word)
+
+tokenized_line

[PATCH v7 2/4] p1689r5: initial support

2023-07-02 Thread Ben Boeckel via Gcc
les.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|   4 +-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  17 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi  |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  15 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi  |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi  |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi  |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi  |  14 ++
 .../g++.dg/modules/p1689-file-default.C   |  16 ++
 .../g++.dg/modules/p1689-file-default.exp.ddi |  27 +++
 .../g++.dg/modules/p1689-target-default.C |  16 ++
 .../modules/p1689-target-default.exp.ddi  |  27 +++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |   9 +-
 libcpp/init.cc|  13 +-
 libcpp/mkdeps.cc  | 153 +++-
 43 files changed, 859 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MF-share.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py
 create mode 100644 gcc/testsuite/lib/modules.exp

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index af19140e382..9d794b2f4de 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Structured dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  *

[PATCH v7 4/4] c++modules: report module mapper files as a dependency

2023-07-02 Thread Ben Boeckel via Gcc
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

gcc/testsuite/:

* g++.dg/modules/depreport-2.modmap: New test.
* g++.dg/modules/depreport-2_a.C: New test.
* g++.dg/modules/depreport-2_b.C: New test.
* g++.dg/modules/test-depfile.py: Support `:|` syntax output
when generating modules.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc   |  5 +
 gcc/cp/mapper-client.h|  1 +
 gcc/cp/module.cc  | 18 -
 .../g++.dg/modules/depreport-2.modmap |  2 ++
 gcc/testsuite/g++.dg/modules/depreport-2_a.C  | 15 ++
 gcc/testsuite/g++.dg/modules/depreport-2_b.C  | 14 +
 gcc/testsuite/g++.dg/modules/test-depfile.py  | 20 +++
 7 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2.modmap
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_b.C

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..92727195246 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string 
&name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,9 @@ module_client::open_module_client (location_t loc, const 
char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   if (deps)
+ deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f3acc4e02fe..77c9edcbc04 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec 
*partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14033,7 +14033,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14041,7 +14041,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19506,7 +19506,7 @@ maybe_translate_include (cpp_reader *reader, line_maps 
*lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19622,7 +19622,7 @@ module_begin_main_file (cpp_reader *reader, line_maps 
*lmaps,
 static void
 nam

Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Ben Boeckel via Gcc
On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote:
> On 6/25/23 12:36, Ben Boeckel wrote:
> > On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
> >> On 6/22/23 22:45, Ben Boeckel wrote:
> >>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> >>>> On 1/25/23 16:06, Ben Boeckel wrote:
> >>>>> They affect the build, so report them via `-MF` mechanisms.
> >>>>
> >>>> Why isn't this covered by the existing code in preprocessed_module?
> >>>
> >>> It appears as though it is neutered in patch 3 where
> >>> `write_make_modules_deps` is used in `make_write` (or will use that name
> >>
> >> Why do you want to record the transitive modules? I would expect just 
> >> noting the
> >> ones with imports directly in the TU would suffice (i.e check the 
> >> 'outermost' arg)
> > 
> > FWIW, only GCC has "fat" modules. MSVC and Clang both require the
> > transitive closure to be passed. The idea there is to minimize the size
> > of individual module files.
> > 
> > If GCC only reads the "fat" modules, then only those should be recorded.
> > If it reads other modules, they should be recorded as well.

For clarification, given:

* a.cppm
```
export module a;
```

* b.cppm
```
export module b;
import a;
```

* use.cppm
```
import b;
```

in a "fat" module setup, `use.cppm` only needs to be told about
`b.cmi` because it contains everything that an importer needs to know
about the `a` module (reachable types, re-exported bits, whatever). With
the "thin" modules, `a.cmi` must be specified when compiling `use.cppm`
to satisfy anything that may be required transitively (e.g., a return
type of some `b` symbol is from `a`). MSVC and Clang (17+) use this
model. I don't know MSVC's rationale, but Clang's is to make CMIs
relocatable by not embedding paths to dependent modules in CMIs. This
should help caching and network transfer sizes in distributed builds.
LLVM/Clang discussion:


https://discourse.llvm.org/t/c-20-modules-should-the-bmis-contain-paths-to-their-dependent-bmis/70422
https://github.com/llvm/llvm-project/issues/62707

Maybe I'm missing how this *actually* works in GCC as I've really only
interacted with it through the command line, but I've not needed to
mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
read through some embedded information in `b.cmi` or does `b.cmi`
include enough information to not need to read it at all? If the former,
distributed builds are going to have a problem knowing what files to
send just from the command line (I'll call this "implicit thin"). If the
latter, that is the "fat" CMI that I'm thinking of.

> But wouldn't the transitive modules be dependencies of the direct 
> imports, so (re)building the direct imports would first require building 
> the transitive modules anyway?  Expressing the transitive closure of 
> dependencies for each importer seems redundant when it can be easily 
> derived from the direct dependencies of each module.

I'm not concerned whether it is transitive or not, really. If a file is
read, it should be reported here regardless of the reason. Note that
caching mechanisms may skip actually *doing* the reading, but the
dependencies should still be reported from the cached results as-if the
real work had been performed.

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-19 Thread Ben Boeckel via Gcc
On Wed, Jul 19, 2023 at 17:11:08 -0400, Nathan Sidwell wrote:
> GCC is neither of these descriptions.  a CMI does not contain the transitive 
> closure of its imports.  It contains an import table.  That table lists the 
> transitive closure of its imports (it needs that closure to do remapping), 
> and 
> that table contains the CMI pathnames of the direct imports.  Those pathnames 
> are absolute, if the mapper provded an absolute pathm or relative to the CMI 
> repo.
> 
> The rationale here is that if you're building a CMI, Foo, which imports a 
> bunch 
> of modules, those imported CMIs will have the same (relative) location in 
> this 
> compilation and in compilations importing Foo (why would you move them?) Note 
> this is NOT inhibiting relocatable builds, because of the CMI repo.

But it is inhibiting distributed builds because the distributing tool
would need to know:

- what CMIs are actually imported (here, "read the module mapper file"
  (in CMake's case, this is only the modules that are needed; a single
  massive mapper file for an entire project would have extra entries) or
  "act as a proxy for the socket/program specified" for other
  approaches);
- read the CMIs as it sends to the remote side to gather any other CMIs
  that may be needed (recursively);

Contrast this with the MSVC and Clang (17+) mechanism where the command
line contains everything that is needed and a single bolus can be sent.

And relocatable is probably fine. How does it interact with reproducible
builds? Or are GCC CMIs not really something anyone should consider for
installation (even as a "here, maybe this can help consumers"
mechanism)?

> On 7/18/23 20:01, Ben Boeckel wrote:
> > Maybe I'm missing how this *actually* works in GCC as I've really only
> > interacted with it through the command line, but I've not needed to
> > mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
> > read through some embedded information in `b.cmi` or does `b.cmi`
> > include enough information to not need to read it at all? If the former,
> > distributed builds are going to have a problem knowing what files to
> > send just from the command line (I'll call this "implicit thin"). If the
> > latter, that is the "fat" CMI that I'm thinking of.
> 
> please don't use perjorative terms like 'fat' and 'thin'.

Sorry, I was internally analogizing to "thinLTO".

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-21 Thread Ben Boeckel via Gcc
On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:
> On 7/19/23 20:47, Ben Boeckel wrote:
> > But it is inhibiting distributed builds because the distributing tool
> > would need to know:
> > 
> > - what CMIs are actually imported (here, "read the module mapper file"
> >(in CMake's case, this is only the modules that are needed; a single
> >massive mapper file for an entire project would have extra entries) or
> >"act as a proxy for the socket/program specified" for other
> >approaches);
> 
> This information is in the machine (& human) README section of the CMI.

Ok. That leaves it up to distributing build tools to figure out at
least.

> > - read the CMIs as it sends to the remote side to gather any other CMIs
> >that may be needed (recursively);
> > 
> > Contrast this with the MSVC and Clang (17+) mechanism where the command
> > line contains everything that is needed and a single bolus can be sent.
> 
> um, the build system needs to create that command line? Where does the build 
> system get that information?  IIUC it'll need to read some file(s) to do that.

It's chained through the P1689 information in the collator as needed. No
extra files need to be read (at least with CMake's approach); certainly
not CMI files.

> > And relocatable is probably fine. How does it interact with reproducible
> > builds? Or are GCC CMIs not really something anyone should consider for
> > installation (even as a "here, maybe this can help consumers"
> > mechanism)?
> 
> Module CMIs should be considered a cacheable artifact.  They are neither 
> object 
> files nor source files.

Sure, cachable sounds fine. What about the installation?

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-23 Thread Ben Boeckel via Gcc
On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:
> It occurs to me that the model I am envisioning is similar to CMake's object 
> libraries.  Object libraries are a convenient name for a bunch of object 
> files. 
> IIUC they're linked by naming the individual object files (or I think the 
> could 
> be implemented as a static lib linked with --whole-archive path/to/libfoo.a 
> -no-whole-archive.  But for this conversation consider them a bunch of 
> separate 
> object files with a convenient group name.

Yes, `--whole-archive` would work great if it had any kind of
portability across CMake's platform set.

> Consider also that object libraries could themselves contain object libraries 
> (I 
> don't know of they can, but it seems like a useful concept).  Then one could 
> create an object library from a collection of object files and object 
> libraries 
> (recursively).  CMake would handle the transitive gtaph.

I think this detail is relevant, but you can use
`$` as an `INTERFACE` sources and it would act
like that, but it is an explicit thing. Instead, `OBJECT` libraries
*only* provide their objects to targets that *directly* link them. If
not, given this:

A (OBJECT library)
B (library of some kind; links PUBLIC to A)
C (links to B)

If `A` has things like linker flags (or, more likely, libraries) as part
of its usage requirements, C will get them on is link line. However, if
OBJECT files are transitive in the same way, the linker (on most
platforms at least) chokes because it now has duplicates of all of A's
symbols: those from the B library and those from A's objects on the link
line.

> Now, allow an object library to itself have some kind of tangible, on-disk 
> representation.  *BUT* not like a static library -- it doesn't include the 
> object files.
> 
> 
> Now that immediately maps onto modules.
> 
> CMI: Object library
> Direct imports: Direct object libraries of an object library
> 
> This is why I don't understand the need explicitly indicate the indirect 
> imports 
> of a CMI.  CMake knows them, because it knows the graph.

Sure, *CMake* knows them, but the *build tool* needs to be told
(typically `make` or `ninja`) because it is what is actually executing
the build graph. The way this is communicated is via `-MF` files and
that's what I'm providing in this patch. Note that `ninja` does not
allow rules to specify such dependencies for other rules than the one it
is reading the file for.

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-29 Thread Ben Boeckel via Gcc
On Thu, Jul 27, 2023 at 18:13:48 -0700, Jason Merrill wrote:
> On 7/23/23 20:26, Ben Boeckel wrote:
> > Sure, *CMake* knows them, but the *build tool* needs to be told
> > (typically `make` or `ninja`) because it is what is actually executing
> > the build graph. The way this is communicated is via `-MF` files and
> > that's what I'm providing in this patch. Note that `ninja` does not
> > allow rules to specify such dependencies for other rules than the one it
> > is reading the file for.
> 
> But since the direct imports need to be rebuilt themselves if the 
> transitive imports change, the build graph should be the same whether or 
> not the transitive imports are repeated?  Either way, if a transitive 
> import changes you need to rebuild the direct import and then the importer.

I suppose I have seen enough bad build systems that don't do everything
correctly that I'm interested in creating "pits of success" rather than
"well, you didn't get thing X 100% correct, so you're screwed here too".

The case that I think is most likely here is that someone has a
"superbuild" with 3 projects A, B, and C where C uses B and B uses A. At
the top-level the superbuild exposes just "make projectA
projectB projectC"-granularity (rather than a combined build graph; they
may use different build systems) and then users go into some projectC
directly and forget to update projectB after updating projectA (known to
all use the same compiler/flags so that CMI sharing is possible). The
build it still broken, but ideally they get notified in some useful way
when rebuilding the TU rather than…whatever ends up catching the problem
incidentally.

> I guess it shouldn't hurt to have the transitive imports in the -MF 
> file, as long as they aren't also in the p1689 file, so I'm not 
> particularly opposed to this change, but I don't see how it makes a 
> practical difference.

Correct. The P1689 shouldn't even know about transitive imports (well,
maybe from header units?) as it just records "I saw an `import`
statement" and should never look up CMI files (indeed, we would need
another scanning step to know what CMI files to create for the P1689
scan if they were necessary…).

--Ben


isl 0.26 C++17 issues when bootstrapping with GCC

2023-08-29 Thread Ben Boeckel via Gcc
Hi,

I tried adding isl 0.26 to a 13.2 GCC build using Iain's macOS aarch64
patches:

https://github.com/iains/gcc-13-branch

It seems that the bootstrap's `CXX='g++ -std=c++11'` confuses isl's
build where C++17 is expected to work by disabling C++17 behind its
back. Should GCC not add this flag for its dependencies but only its own
build perhaps?

isl bug report thread (with isl configure logs and the like):

https://groups.google.com/g/isl-development/c/ShnQcW_35ZQ

Thanks,

--Ben


Re: isl 0.26 C++17 issues when bootstrapping with GCC

2023-08-29 Thread Ben Boeckel via Gcc
On Tue, Aug 29, 2023 at 18:57:37 +0200, Richard Biener wrote:
> I suppose for bootstrapping we could disable ISL during stage1 since
> it enables an optional feature only.  Other than that GCC only
> requires a C++11 compiler for building, so ISL breaks that constraint
> with requiring C++17.

Note that it doesn't *require* it per sé; the tests that try it are
compiled if C++17 support was detected at all. The headers seem to just
have optional header-only `std::any`-using APIs if C++17 is around.
`isl` supporting a flag to disable the tests would also work, but that
doesn't fix 0.26. It also doesn't mean it won't start requiring C++17 at
some point in the future.

In light of that, I feel that skipping it for bootstrap is probably the
right solution here. Alas, my skill with autotools is closer to the
caveman-with-club level rather than that of a surgeon.

--Ben


[PATCH v8 0/4] P1689R5 support

2023-09-01 Thread Ben Boeckel via Gcc
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel and need to be ordered to ensure that
`import some_module;` can be satisfied in time by making sure that any
TU with `export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang as taken an alternate approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v7 -> v8:

- rename `DEPS_FMT_` enum variants to `FDEPS_FMT_` to match the
  associated flag
- memory leak fix in the `join` specfunc implementation (also better
  comments), both from Jason
- formatting fix in `mkdeps.cc` for `write_make_modules_deps` assignment
- comments on new functions for P1689R5 implementation

v6 -> v7:

- rebase onto `master` (80ae426a195 (d: Fix core.volatile.volatileLoad
  discarded if result is unused, 2023-07-02))
- add test cases for patches 3 and 4 (new dependency reporting in `-MF`)
- add a Python script to test aspects of generated dependency files
- a new `join` spec function to support `-fdeps-*` defaults based on the
  `-o` flag (needed to strip the leading space that appears otherwise)
- note that JSON writing support should be factored out for use by
  `libcpp` and `gcc` (libiberty?)
- use `.ddi` for the extension of `-fdeps-*` output files by default
- support defaults for `-fdeps-file=` and `-fdeps-target=` when only
  `-fdeps-format=` is provided (with tests)
- error if `-MF` and `-fdeps-file=` are both the same (non-`stdout`)
  file as their formats are incompatible
- expand the documentation on how the `-fdeps-*` flags should be used

v5 -> v6:

- rebase onto `master` (585c660f041 (reload1: Change return type of
  predicate function from int to bool, 2023-06-06))
- fix crash related to reporting imported CMI files as dependencies
- rework utf-8 validity to patch the new `cpp_valid_utf8_p` function
  instead of the core utf-8 decoding routine to reject invalid
  codepoints (preserves higher-level error detection of invalid utf-8)
- harmonize of `fdeps` spelling in flags, variables, comments, etc.
- rename `-fdeps-output=` to `-fdeps-target=`

v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  ndeeded
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (4):
  spec: add a spec function to join arguments
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   5 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  24 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|  27 ++-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflag

[PATCH v8 1/4] spec: add a spec function to join arguments

2023-09-01 Thread Ben Boeckel via Gcc
When passing `-o` flags to other options, the typical `-o foo` spelling
leaves a leading whitespace when replacing elsewhere. This ends up
creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't
parse properly. When attempting to make a spec function to just remove
the leading whitespace, the argument splitting ends up masking the
whitespace. However, the intended extension *also* ends up being its own
argument. To perform the desired behavior, the arguments need to be
concatenated together.

gcc/:

* gcc.cc (join_spec_func): Add a spec function to join all
arguments.

Signed-off-by: Ben Boeckel 
---
 gcc/gcc.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index fdfac0b4fe4..4c4e81dee50 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char 
**);
 static const char *debug_level_greater_than_spec_func (int, const char **);
 static const char *dwarf_version_greater_than_spec_func (int, const char **);
 static const char *find_fortran_preinclude_file (int, const char **);
+static const char *join_spec_func (int, const char **);
 static char *convert_white_space (char *);
 static char *quote_spec (char *);
 static char *quote_spec_arg (char *);
@@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] 
=
   { "debug-level-gt",  debug_level_greater_than_spec_func },
   { "dwarf-version-gt",dwarf_version_greater_than_spec_func },
   { "fortran-preinclude-file", find_fortran_preinclude_file},
+  { "join",join_spec_func},
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -10975,6 +10977,27 @@ find_fortran_preinclude_file (int argc, const char 
**argv)
   return result;
 }
 
+/* The function takes any number of arguments and joins them together.
+
+   This seems to be necessary to build "-fjoined=foo.b" from "-fseparate foo.a"
+   with a %{fseparate*:-fjoined=%.b$*} rule without adding undesired spaces:
+   when doing $* replacement we first replace $* with the rest of the switch
+   (in this case ""), and then add any arguments as arguments after the result,
+   resulting in "-fjoined= foo.b".  Using this function with e.g.
+   %{fseparate*:-fjoined=%:join(%.b$*)} gets multiple words as separate argv
+   elements instead of separated by spaces, and we paste them together.  */
+
+static const char *
+join_spec_func (int argc, const char **argv)
+{
+  if (argc == 1)
+return argv[0];
+  for (int i = 0; i < argc; ++i)
+obstack_grow (&obstack, argv[i], strlen (argv[i]));
+  obstack_1grow (&obstack, '\0');
+  return XOBFINISH (&obstack, const char *);
+}
+
 /* If any character in ORIG fits QUOTE_P (_, P), reallocate the string
so as to precede every one of them with a backslash.  Return the
original string or the reallocated one.  */
-- 
2.41.0



[PATCH v8 3/4] c++modules: report imported CMI files as dependencies

2023-09-01 Thread Ben Boeckel via Gcc
They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

gcc/testsuite/

* g++.dg/modules/depreport-1_a.C: New test.
* g++.dg/modules/depreport-1_b.C: New test.
* g++.dg/modules/test-depfile.py: New tool for validating depfile
information.
* lib/modules.exp: Support for validating depfile contents.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc |   3 +
 gcc/testsuite/g++.dg/modules/depreport-1_a.C |  10 +
 gcc/testsuite/g++.dg/modules/depreport-1_b.C |  12 ++
 gcc/testsuite/g++.dg/modules/test-depfile.py | 187 +++
 gcc/testsuite/lib/modules.exp|  29 +++
 5 files changed, 241 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/test-depfile.py

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9df60d695b1..f3acc4e02fe 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18968,6 +18968,9 @@ module_state::do_import (cpp_reader *reader, bool 
outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  if (cpp_get_deps (reader))
+   deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_a.C 
b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
new file mode 100644
index 000..241701728a2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
@@ -0,0 +1,10 @@
+// { dg-additional-options -fmodules-ts }
+
+export module Foo;
+// { dg-module-cmi Foo }
+
+export class Base
+{
+public:
+  int m;
+};
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_b.C 
b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
new file mode 100644
index 000..b6e317c6703
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
@@ -0,0 +1,12 @@
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -MD }
+// { dg-additional-options "-MF depreport-1.d" }
+
+import Foo;
+
+void foo ()
+{
+  Base b;
+}
+
+// { dg-final { run-check-module-dep-expect-input "depreport-1.d" 
"gcm.cache/Foo.gcm" } }
diff --git a/gcc/testsuite/g++.dg/modules/test-depfile.py 
b/gcc/testsuite/g++.dg/modules/test-depfile.py
new file mode 100644
index 000..ea4edb61434
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/test-depfile.py
@@ -0,0 +1,187 @@
+import json
+
+
+# Parameters.
+ALL_ERRORS = False
+
+
+def _report_error(msg):
+'''Report an error.'''
+full_msg = 'ERROR: ' + msg
+if ALL_ERRORS:
+print(full_msg)
+else:
+raise RuntimeError(full_msg)
+
+
+class Token(object):
+pass
+
+
+class Output(Token):
+def __init__(self, path):
+self.path = path
+
+
+class Input(Token):
+def __init__(self, path):
+self.path = path
+
+
+class Colon(Token):
+pass
+
+
+class Append(Token):
+pass
+
+
+class Variable(Token):
+def __init__(self, name):
+self.name = name
+
+
+class Word(Token):
+def __init__(self, name):
+self.name = name
+
+
+def validate_depfile(depfile, expect_input=None):
+'''Validate a depfile contains some information
+
+Returns `False` if the information is not found.
+'''
+with open(depfile, 'r') as fin:
+depfile_content = fin.read()
+
+real_lines = []
+join_line = False
+for line in depfile_content.split('\n'):
+# Join the line if needed.
+if join_line:
+line = real_lines.pop() + line
+
+# Detect line continuations.
+join_line = line.endswith('\\')
+# Strip line continuation characters.
+if join_line:
+line = line[:-1]
+
+# Add to the real line set.
+real_lines.append(line)
+
+# Perform tokenization.
+tokenized_lines = []
+for line in real_lines:
+tokenized = []
+join_word = False
+for word in line.split(' '):
+if join_word:
+word = tokenized.pop() + ' ' + word
+
+# Detect word joins.
+join_word = word.endswith('\\')
+# Strip escape character.
+if join_word:
+word = word[:-1]
+
+# Detect `:` at the end of a word.
+if word.endswith(':'):
+tokenized.append(word[:-1])
+word = word[-1]
+
+# Add word to the tokenized set.
+tokenized.append(word)
+
+tokenized_line

[PATCH v8 4/4] c++modules: report module mapper files as a dependency

2023-09-01 Thread Ben Boeckel via Gcc
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

gcc/testsuite/:

* g++.dg/modules/depreport-2.modmap: New test.
* g++.dg/modules/depreport-2_a.C: New test.
* g++.dg/modules/depreport-2_b.C: New test.
* g++.dg/modules/test-depfile.py: Support `:|` syntax output
when generating modules.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc   |  5 +
 gcc/cp/mapper-client.h|  1 +
 gcc/cp/module.cc  | 18 -
 .../g++.dg/modules/depreport-2.modmap |  2 ++
 gcc/testsuite/g++.dg/modules/depreport-2_a.C  | 15 ++
 gcc/testsuite/g++.dg/modules/depreport-2_b.C  | 14 +
 gcc/testsuite/g++.dg/modules/test-depfile.py  | 20 +++
 7 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2.modmap
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_b.C

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..92727195246 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string 
&name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,9 @@ module_client::open_module_client (location_t loc, const 
char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   if (deps)
+ deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f3acc4e02fe..77c9edcbc04 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec 
*partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14033,7 +14033,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14041,7 +14041,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19506,7 +19506,7 @@ maybe_translate_include (cpp_reader *reader, line_maps 
*lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19622,7 +19622,7 @@ module_begin_main_file (cpp_reader *reader, line_maps 
*lmaps,
 static void
 nam

[PATCH v8 2/4] p1689r5: initial support

2023-09-01 Thread Ben Boeckel via Gcc
les.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|   4 +-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  17 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi  |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  15 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi  |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi  |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi  |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi  |  14 ++
 .../g++.dg/modules/p1689-file-default.C   |  16 ++
 .../g++.dg/modules/p1689-file-default.exp.ddi |  27 +++
 .../g++.dg/modules/p1689-target-default.C |  16 ++
 .../modules/p1689-target-default.exp.ddi  |  27 +++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |   9 +-
 libcpp/init.cc|  13 +-
 libcpp/mkdeps.cc  | 163 -
 43 files changed, 869 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MF-share.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py
 create mode 100644 gcc/testsuite/lib/modules.exp

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 4961af63de8..1c1f8c84f88 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Structured dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  *

Re: [PATCH v8 0/4] P1689R5 support

2023-09-20 Thread Ben Boeckel via Gcc
On Tue, Sep 19, 2023 at 17:33:34 -0400, Jason Merrill wrote:
> Pushed, thanks!

Thanks!

Is there a process I can use to backport this to GCC 13?

--Ben


Re: Scaling -fmacro-prefix-map= to thousands entries

2023-10-05 Thread Ben Boeckel via Gcc
On Wed, Oct 04, 2023 at 22:19:32 +0100, Sergei Trofimovich via Gcc wrote:
> The prototype that creates equivalent of the following commands does
> work for smaller packages:
> 
> 
> -fmacro-prefix-map=/nix/store/y8wfrgk7br5rfz4221lfb9v8w3n0cnyd-glibc-2.37-8-dev=/nix/store/-glibc-2.37-8-dev
> 
> -fmacro-prefix-map=/nix/store/8n240jfdmsb3lnc2qa2vb9dwk638j1lp-gmp-with-cxx-6.3.0-dev=/nix/store/-gmp-with-cxx-6.3.0-dev
> 
> -fmacro-prefix-map=/nix/store/phjcmy025rd1ankw5y1b21xsdii83cyk-nlohmann_json-3.11.2=/nix/store/-nlohmann_json-3.11.2
> ...
> 
> The above works for small amount of options (like, 100). But around 1000
> options we start hitting linux limits on the single environment variable
> or real-world packages like `qemu` with a ton of input depends.

Are you trying to pass this through via `CFLAGS` and friends?

> The command-line limitations are in various places:
> - `gcc` limitation of lifting all command line options into a single
>   environment variable: https://gcc.gnu.org/PR111527
> - `linux` limitation of constraining single environ variable to a value
>   way below than full available environment space:
>   https://lkml.org/lkml/2023/9/24/381
> 
> `linux` fix would buy us 50x more budged (A Lot) but it will not help
> much other operating systems like `Darwin` where absolute environment
> limit is a lot lower than `linux`.
> 
> I already implemented [1.] in https://github.com/NixOS/nixpkgs/pull/255192
> (also attached `mangle-NIX_STORE-in-__FILE__.patch` 3.5K patch against
> `master` as a proof of concept).
> 
> What would be the best way to scale up `-fmacro-prefix-map=` up to NixOS
> needs for `gcc`? I would like to implement something sensible I could
> upstream.

How about `CFLAGS=@macro_prefix_map.args` and writing that file in the
same codepath where you generate the flags today. It'll work with just
about every compiler and tools like `ccache` will understand that it is
an input that affects the build and properly take the file's contents
into account.

--Ben


Re: Scaling -fmacro-prefix-map= to thousands entries

2023-10-05 Thread Ben Boeckel via Gcc
On Thu, Oct 05, 2023 at 12:59:21 +0100, Sergei Trofimovich via Gcc wrote:
> Sounds good. Do you have any preference over specific syntax? My
> suggestions would be:
> 
> 1. `-fmacro-prefix-map=file-name`: if `file-name` there is not in `key=val`
>format then treat it as file
> 2. `-fmacro-prefix-map=@file-name`: use @ as a signal to use file
> 3. `fmacro-prefix-map-file=file-name`: use a new option

Whatever is picked, please let `ccache` and `sccache` know so that they
can include this argument's contents in their hashes. `distcc` and other
distributed build tools need to know to send this file to wherever
compilation is happening as well (though they may actually not care as
they may send around preprocessed source?).

Thanks,

--Ben


Re: Odd Python errors in the G++ testsuite

2023-10-09 Thread Ben Boeckel via Gcc
On Mon, Oct 09, 2023 at 20:12:01 +0100, Maciej W. Rozycki wrote:
>  I'm seeing these tracebacks for several cases across the G++ testsuite:
> 
> Executing on host: python3 -c "import sys; assert sys.version_info >= (3, 6)" 
>(timeout = 300)
> spawn -ignore SIGHUP python3 -c import sys; assert sys.version_info >= (3, 6)

What version of Python3 do you have? The test suite might not actually
properly handle not having 3.7 (i.e., skip the tests that require it).

> rules/0/primary-output is ok: p1689-1.o

I wrote these tests.

> rules/0/provides/0/logical-name is ok: foo
> rules/0/provides/0/is-interface is ok: True
> Traceback (most recent call last):
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 218, in 
> is_ok = validate_p1689(actual, expect)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 182, in 
> validate_p1689
> return compare_json([], actual_json, expect_json)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 145, in 
> compare_json
> is_ok = _compare_object(path, actual, expect)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 66, in 
> _compare_object
> sub_error = compare_json(path + [key], actual[key], expect[key])
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 151, in 
> compare_json
> is_ok = _compare_array(path, actual, expect)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 87, in 
> _compare_array
> sub_error = compare_json(path + [str(idx)], a, e)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 145, in 
> compare_json
> is_ok = _compare_object(path, actual, expect)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 66, in 
> _compare_object
> sub_error = compare_json(path + [key], actual[key], expect[key])
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 149, in 
> compare_json
> actual = set(actual)
> TypeError: unhashable type: 'dict'

I'm not sure how this ends up with a dictionary in it… Can you
`print(actual)` before this?

> and also these intermittent failures for other cases:
> 
> Executing on host: python3 -c "import sys; assert sys.version_info >= (3, 6)" 
>(timeout = 300)
> spawn -ignore SIGHUP python3 -c import sys; assert sys.version_info >= (3, 6)
> rules/0/primary-output is ok: p1689-2.o
> rules/0/provides/0/logical-name is ok: foo:part1
> rules/0/provides/0/is-interface is ok: True
> ERROR: length mismatch at rules/0/requires: actual: "1" expect: "0"
> version is ok: 0
> revision is ok: 0
> FAIL: ERROR: length mismatch at rules/0/requires: actual: "1" expect: "0"
> 
> This does seem to me like something not working as intended.  As a Python 
> non-expert I have troubles concluding what is going on here and whether 
> these tracebacks are indeed supposed to be there, or whether it is a sign 
> of a problem.  And these failures I don't even know where they come from.  
> 
>  Does anyone know?  Is there a way to run the offending commands by hand?  
> The relevant invocation lines do not appear in the test log file for one 
> to copy and paste, which I think is not the right way of doing things in 
> our environment.
> 
>  These issues seem independent from the test host environment as I can see 
> them on both a `powerpc64le-linux-gnu' and an `x86_64-linux-gnu' machine 
> in `riscv64-linux-gnu' target testing.

Do they all have pre-3.7 Python3 versions?

--Ben


Re: Odd Python errors in the G++ testsuite

2023-10-09 Thread Ben Boeckel via Gcc
On Mon, Oct 09, 2023 at 19:46:37 -0400, Paul Koning wrote:
> 
> 
> > On Oct 9, 2023, at 7:42 PM, Ben Boeckel via Gcc  wrote:
> > 
> > On Mon, Oct 09, 2023 at 20:12:01 +0100, Maciej W. Rozycki wrote:
> >> I'm seeing these tracebacks for several cases across the G++ testsuite:
> >> 
> >> Executing on host: python3 -c "import sys; assert sys.version_info >= (3, 
> >> 6)"(timeout = 300)
> >> spawn -ignore SIGHUP python3 -c import sys; assert sys.version_info >= (3, 
> >> 6)
> > 
> > What version of Python3 do you have? The test suite might not actually
> > properly handle not having 3.7 (i.e., skip the tests that require it).
> 
> But the rule that you can't put a dict in a set is as old as set support (2.x 
> for some small x).

Yes. I just wonder how a dictionary got in there in the first place. I'm
not sure if some *other* 3.7-related change makes that work.

--Ben


Re: Odd Python errors in the G++ testsuite

2023-10-09 Thread Ben Boeckel via Gcc
On Mon, Oct 09, 2023 at 20:12:01 +0100, Maciej W. Rozycki wrote:
> Hi,
> 
>  I'm seeing these tracebacks for several cases across the G++ testsuite:
> 
> Executing on host: python3 -c "import sys; assert sys.version_info >= (3, 6)" 
>(timeout = 300)
> spawn -ignore SIGHUP python3 -c import sys; assert sys.version_info >= (3, 6)
> rules/0/primary-output is ok: p1689-1.o
> rules/0/provides/0/logical-name is ok: foo
> rules/0/provides/0/is-interface is ok: True
> Traceback (most recent call last):
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 218, in 
> is_ok = validate_p1689(actual, expect)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 182, in 
> validate_p1689
> return compare_json([], actual_json, expect_json)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 145, in 
> compare_json
> is_ok = _compare_object(path, actual, expect)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 66, in 
> _compare_object
> sub_error = compare_json(path + [key], actual[key], expect[key])
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 151, in 
> compare_json
> is_ok = _compare_array(path, actual, expect)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 87, in 
> _compare_array
> sub_error = compare_json(path + [str(idx)], a, e)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 145, in 
> compare_json
> is_ok = _compare_object(path, actual, expect)
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 66, in 
> _compare_object
> sub_error = compare_json(path + [key], actual[key], expect[key])
>   File ".../gcc/testsuite/g++.dg/modules/test-p1689.py", line 149, in 
> compare_json
> actual = set(actual)
> TypeError: unhashable type: 'dict'

So looking at the source…the 3.6 check is not from me. Not sure what's
up there; it's probably not immediately related to the backtrace.

But this backtrace means that we have a list of objects that do not
expect a given ordering that is of JSON objects. I'm not sure why this
never showed up before as all of the existing uses of it are indeed of
objects. Can you try removing `"__P1689_unordered__"` from the
`p1689-1.exp.ddi` file's `requires` array? The
`p1689-file-default.exp.ddi` and `p1689-target-default.exp.ddi` files
need the same treatment.

--Ben


Re: Updated Sourceware infrastructure plans

2024-05-04 Thread Ben Boeckel via Gcc
On Wed, May 01, 2024 at 23:26:18 +0200, Mark Wielaard wrote:
> On Wed, May 01, 2024 at 04:04:37PM -0400, Jason Merrill wrote:
> > Do you (or others) have any thoughts about GitLab FOSS?
> 
> The gitlab "community edition" still feels not very much "community".
> We could run our own instance, but it will still be "open core" with
> features missing to try to draw you towards the proprietary hosted
> saas version. Also it seems to have way too much overhead. The focus
> is clearly corporate developers where managers want assurances the
> mandatory "pipelines" are executed and "workflows" followed exactly.

I'll offer my experience here. We (at Kitware) have been using GitLab
FOSS for around 8 years. We can't use the other editions because of the
per-account pricing and having open registration (since pretty much
everything there is FOSS code). GitLab is receptive to patches sent
their way and have considered moving things to the FOSS edition to help
large FOSS organizations (freedesktop.org, GNOME, KDE, probably others
too). There's also been discussion of implementing features such as
commit message review in order to court Linux developers given
forge-like discussion happening there. FWIW, Fedora is also looking at
forges as well:

https://discussion.fedoraproject.org/t/2024-git-forge-evaluation/111795

That said, there are definitely gaps to fill. We have our tooling here:

https://gitlab.kitware.com/utils/rust-ghostflow (core actions)
https://gitlab.kitware.com/utils/ghostflow-director (service deployment)

We use it to implement things including:

  - Basic content checks (scripts are executable, no binaries, file size
limits, formatting, etc.) either on a commit-by-commit basis or by
looking at the MR (patch series, PR, whatever the forge calls it) as
a whole. Docs for currently-implemented checks are here:

https://gitlab.kitware.com/utils/rust-ghostflow/-/blob/master/ghostflow-cli/doc/checks.md
  - Reformatting upon request; if the formatter(s) in use supports
writing the content as intended, there is code to rewrite each
individual patch to conform. This avoids wasting time on either side
for things that can be done automatically (of course, you're also at
the mercy of what the formatter wants…I find it worth it on balance).
  - More advanced merging including gathering trailers for the merge
commit message from comments and other metadata including
`Reviewed-by` and `Tested-by` (also from CI). Also supported is
merging into multiple branches at once (e.g., backports to older
branches with a single MR).
  - Merge train support (we call it the "stage"); this feature is
otherwise locked behind for-pay editions of GitLab.

Right now, GitLab and Github are supported, but other forges can be
supported as well. In addition to the service (which is triggered by
webhook delivery), there's a command line tool for local usage (though
it only implements checking and reformatting at the moment mainly due to
a lack of available time to work on it).

There are other things that are probably of interest to supply chain or
other things such as:

  - every push is stored in a ghostflow-director-side unique ref
(`refs/mr/ID/heads/N` where `N` is an incrementing integer) to avoid
forge-side garbage collection (especially problematic on Github;
I've not noticed GitLab collecting so eagerly)
  - all webhooks are delivered via filesystem and can be archived
(`webhook-listen` is the program that listens and delivers them:
https://gitlab.kitware.com/utils/webhook-listen); events which
trigger failures are stored with some context about what happened;
those that are ignored are stored with a reason for the ignore (see
this crate for the "event loop" of `ghostflow-director` itself:
https://gitlab.kitware.com/utils/rust-json-job-dispatch)
  - the forge is the source of truth; if a ref is force-pushed,
`ghostflow` will accept the state on the forge as gospel instead;
the only non-logging/historical tracking state off-forge includes:
- the config file
- formatter installation (formatting is designed to only use trusted
  binaries; nothing from the repo itself other than which to use)

On the first two points, we had some data loss on our instance once and
using the webhook history and stored refs, I was able to restore code
pushed to projects and "replay" comments that happened since the last
backup (I copied the content and @mentioned the original author).

> At the moment though the only thing people seem to agree on is that
> any system will be based on git. So the plan for now is to first setup
> a larger git(olite) system so that every contributor (also those who
> don't currently have commit access) can easily "post" their git
> repo. This can then hopefully integrate with the systems we already
> have setup (triggering builder CI, flag/match with patchwork/emails,
> etc.) or any future "pull request" l

Re: Updated Sourceware infrastructure plans

2024-05-06 Thread Ben Boeckel via Gcc
On Sun, May 05, 2024 at 08:22:12 +0300, Benson Muite wrote:
> On 04/05/2024 22.56, Ben Boeckel via Overseers wrote:
> > As a fellow FOSS maintainer I definitely appreciate the benefit of being
> > email-based (`mutt` is far better at wrangling notifications from
> > umpteen places than…well basically any website is at even their own),
> > but as a *contributor* it is utterly opaque. It's not always clear if my
> > patch has been seen, if it is waiting on maintainer time, or for me to
> > do something. After one review, what is the courtesy time before pushing
> > a new patchset to avoid a review "crossing in the night" as I push more
> > patches? Did I get everyone that commented on the patch the first time
> > in the Cc list properly? Is a discussion considered resolved (FWIW,
> > Github is annoying with its conversation resolution behavior IMO;
> > GitLab's explicit closing is much better). Has it been merged? To the
> > right place? And that's for patches I author; figuring out the status of
> > patches I'm interested in but not the author of is even harder. A forge
> > surfaces a lot of this information pretty well and, to me, GitLab at
> > least offers usable enough email messages (e.g., discussions on threads
> > will thread in email too) that the public tracking of such things is far
> > more useful on the whole.
> 
> This is an area that also needs standardization of important
> functionality.  Some method of archiving the content is also helpful -
> email does this well but typically does not offer  dashboard. Sourcehut
> makes reading threads using the web interface very easy.

The other thing that email makes difficult to do: jump in on an existing
discussion without having been subscribed previously. I mean, I know
how to tell `mutt` to set an `In-Reply-To` header and munge a proper
reply by hand once I find a `Message-Id` (though a fully proper
`References` header is usually way too much work to be worth it), but
this is not something I expect others to be able to easily perform.

> Web interfaces are difficult to automate, but friendlier for occasional
> use and encouraging new contributions.  Tools separate from the version
> control system such as Gerrit, Phabricator, Rhode Code and Review Board
> also enable discussion management and overview.

Note that forges tend to have very rich APIs. It's certainly not as easy
as clicking around manually for one-off tasks or setting up a shell
pipeline to process some emails, but building automation isn't
impossible.

--Ben


Re: Updated Sourceware infrastructure plans

2024-05-10 Thread Ben Boeckel via Gcc
On Tue, May 07, 2024 at 16:17:24 +, Joseph Myers via Gcc wrote:
> I'd say we have two kinds of patch submission (= two kinds of pull request 
> in a pull request workflow) to consider in the toolchain, and it's 
> important that a PR-based system supports both of them well (and supports 
> a submission changing from one kind to the other, and preferably 
> dependencies between multiple PRs where appropriate).

The way I'd handle this in `ghostflow` is with a description trailer
like `Squash-merge: true` (already implemented trailers include
`Backport`, `Fast-forward`, `Backport-ff`, and `Topic-rename` as
description trailers, so this is a natural extension there).
Alternatively a label can be used, but that is not directly editable by
MR authors that are not also members of the project. There's also a
checkbox either at MR creation and/or merge time to select between them,
but I don't find that easily discoverable (I believe only those with
merge rights can see the button state in general).

> * Simple submissions that are intended to end up as a single commit on the 
> mainline (squash merge).  The overall set of changes to be applied to the 
> mainline is subject to review, and the commit message also is subject to 
> review (review of commit messages isn't always something that PR-based 
> systems seem to handle that well).  But for the most part there isn't a 
> need to rebase these - fixes as a result of review can go as subsequent 
> commits on the source branch (making it easy to review either the 
> individual fixes, or the whole updated set of changes), and merging from 
> upstream into that branch is also OK.  (If there *is* a rebase, the 
> PR-based system should still preserve the history of and comments on 
> previous versions, avoid GCing them and avoid getting confused.)
> 
> * Complicated submissions of patch series, that are intended to end up as 
> a sequence of commits on the mainline (non-squash merge preserving the 
> sequence of commits).  In this case, fixes (or updating from upstream) 
> *do* involve rebases to show what the full new sequence of commits should 
> be (and all individual commits and their commit messages should be subject 
> to review, not just the overall set of changes to be applied).  Again, 
> rebases need handling by the system in a history-preserving way.

There's been a long-standing issue to use `range-diff` in GitLab. I
really don't know why it isn't higher priority, but I suppose having
groups like Sourceware and/or kernel.org interested could move it up a
priority list for them.

https://gitlab.com/gitlab-org/gitlab/-/issues/24096

FWIW, there's also a "comment on commit messages" issue:

https://gitlab.com/gitlab-org/gitlab/-/issues/19691

That said, I've had little issues with rebases losing commits or
discussion on GitLab whereas I've definitely seen things get lost on
Github. I'm unfamiliar with other forges to know there (other than that
Gerrit-likes that track patches are generally workable with rebases).

> GitHub (as an example - obviously not appropriate itself for the 
> toolchain) does much better on simple submissions (either with squash 
> merges, or with merges showing the full history if you don't care about a 
> clean bisectable history), apart from review of commit messages, than it 
> does on complicated submissions or dependencies between PRs (I think 
> systems sometimes used for PR dependencies on GitHub may actually be 
> third-party add-ons).

The way I've tended to handle this is to have one "main MR" that is the
"whole story" with component MRs split out for separate review. Once the
separate MRs are reviewed and merged (with cross references), the main
MR is rebased to incorporate the merged code and simplify its diff. This
helps to review smaller bits while also having the full story available
for viewing.

> Pull request systems have obvious advantages over mailing lists for 
> tracking open submissions - but it's still very easy for an active project 
> to end up with thousands of open PRs, among which it's very hard to find 
> anything.

In CMake, the mechanism used to keep the queue manageable is to have a
`triage:expired` label for closed-for-inactivity (or other reasons) so
that closed-but-only-neglected MRs can be distinguished from
closed-because-not-going-to-be-merged MRs. The "active patch queue"
tends to stay under 20, but sometimes swells to 30 in busy times (as of
this writing, it is at 10 open MRs).

--Ben


Re: [PATCH v1] Remove 'restrict' from 'nptr' in strtol(3)-like functions

2024-07-05 Thread Ben Boeckel via Gcc
On Fri, Jul 05, 2024 at 22:15:44 +0200, Emanuele Torre via Gcc wrote:
> That is 6.7.3.1p3:
> 
> 
> 
> In what follows, a pointer expression E is said to be based on object P
> if (at some sequence point in the execution of B prior to the
> evaluation of E) modifying P to point to a copy of the array object
> into which it formerly pointed would change the value of E.153) Note
> that "based" is defined only for expressions with pointer types.
> 
> Footnote 153) In other words, E depends on the value of P itself rather
> than on the value of an object referenced indirectly through P. For
> example, if identifier p has type (int **restrict), then the pointer
> expressions p and p+1 are based on the restricted pointer object
> designated by p, but the pointer expressions *p and p[1] are not.
> 
> 
> 
> Which would be the same paragraph of the same section on N3047, but
> footnote number 168.

Obviously, we need better types on our standardese pointers. Jonathan
used a pointer of type `footnote _N3220*` while Alejandro was expecting
a pointer of type `footnote _N3047*`. Of course, given that they end up
referring to the same thing, this may have interesting implications when
they are used with `restrict`.

--Ben


Re: Spurious -Wsequence-point with auto

2024-08-16 Thread Ben Boeckel via Gcc
On Fri, Aug 09, 2024 at 11:48:44 +0200, Alejandro Colomar via Gcc wrote:
> On Fri, Aug 09, 2024 at 10:31:55AM GMT, Martin Uecker wrote:
> > BTW: Double evaluation is not only a problem for semantics, but also because
> > it can cause long compile times, cf. the 45 MB macro expansion example
> > in Linux...
> 
> Huh, what's that one?  I didn't know it.  :)

https://lwn.net/Articles/983965/

--Ben


Re: On the subject of module consumer diagnostics.

2024-09-06 Thread Ben Boeckel via Gcc
On Tue, Sep 03, 2024 at 16:53:43 +0100, Iain Sandoe wrote:
> I think that might be a misunderstanding on the part of the author;
> AFAIU both GCC and MSVC _do_ require access to the sources at BMI
> consume-time to give decent diagnostics.   I think that there might be
> confusion because the compilation would suceed on those toolchains
> without the sources - but with poorer diagnostic quality?

Does this have (additional) implications for caching tools and modules?
They cache diagnostic output, but if these other paths showing up or
disappearing affects the output, the cache key should incorporate that
as well. Should there be a way for such tools to get this information
somehow? Ideally the paths would only matter if reported diagnostics
*would* look at the files, not just "there's a BMI that mentions a
source path X" kind of inspection.

--Ben


Re: #pragma once behavior

2024-09-06 Thread Ben Boeckel via Gcc
On Fri, Sep 06, 2024 at 00:03:23 -0500, Jeremy Rifkin wrote:
> Hello,
> 
> I'm looking at #pragma once behavior among the major C/C++ compilers as
> part of a proposal paper for standardizing #pragma once. (This is
> apparently a very controversial topic)
> 
> To put my question up-front: Would GCC ever be open to altering its
> #pragma once behavior to bring it more in-line with behavior from other
> compilers and possibly more in-line with what users expect?
> 
> To elaborate more:
> 
> Design decisions for #pragma once essentially boil down to a file-based
> definitions vs a content-based definition of "same file".
> 
> A file-based definition is easier to reason about and more in-line with
> what users expect, however, distinct copies of headers can't be handled
> and multiple mount points are problematic.
> 
> A content-based definition works for distinct copies, multiple mount
> points, and is completely sufficient 99% of the time, however, it could
> potentially break in hard-to-debug ways in a few notable cases (more
> information later).
> 
> Currently the three major C/C++ compilers treat #pragma once very differently:
> - GCC uses file mtime + file contents
> - Clang uses inodes
> - MSVC uses file path
> 
> None of the major compilers have documented their #pragma once semantics.
> 
> In practice all three of these approaches work pretty well most of the
> time (which is why people feel comfortable using #pragma once). However,
> they can each break in their own ways.
> 
> As mentioned earlier, clang and MSVC's file-based definitions of "same
> file" break for multiple mount points and multiple copies of the same
> header. MSVC's approach breaks for symbolic links and hard links.
> 
> GCC's hybrid approach can break in surprising ways. I have three
> examples to share:
> 
> Example 1:
> 
> Consider a scenario such as:
> 
> usr/
>   include/
> library_a/
>   library_main.hpp
>   foo.hpp
> library_b/
>   library_main.hpp
>   foo.hpp
> src/
>   main.cpp
> 
> main.cpp:
> #include "library_a/library_main.hpp"
> #include "library_b/library_main.hpp"
> 
> And both library_main.hpp's have:
> #pragma once
> #include "foo.hpp"

Could a "uses the relative search path" fact be used to mix into the
file's identity? This way the `once` key would see "this content looked
for things in directory `library_a`" and would see that
`library_b/library_main.hpp`, despite the same content (and mtime) is
actually a different context and actually perform the inclusions?

Of course, this fails if `#include "../common/foo.hpp"` is used in each
location as that *would* then want to elide the second inclusion. I
don't know how this problem is avoided without actually reading the
contents again. But the "I read this file" can remember what relative
paths were searched (since the contents are the same at least).

> Example 2:
> 
> namespace v1 {
> #include "library_v1.hpp"
> }
> namespace v2 {
> #include "library_v2.hpp"
> }
> 
> Where both library headers include their own copy of a shared header
> using #pragma once.

Again, the context of the inclusion matters, so "is wrapped in a scope"
can modify the "onceness" (`extern "C"` is probably the more common
instance).

> Example 3:
> 
> usr/
>   include/
> library/
>   library.hpp
>   vendored-dependency.hpp
> src/
>   main.cpp
>   vendored-dependency.hpp
> 
> main.cpp:
> #include "vendored-dependency.hpp"
> #include 
> 
> library.hpp:
> #pragma once
> #include "vendored-dependency.hpp"

This is basically the same as Example 1 as far as context goes.

Note that context cannot include `#define` state because `#once` is
defined to be the first thing in the file and a file that is intended to
be included multiple times (e.g., Boost.PP shenanigans) in different
states cannot, in good faith, use `#once`.

Hrm…though if we are doing `otherdir/samecontent`, the different
preprocessor state *might* change that "what relative files did we look
for?" state… Nothing is easy :( .

--Ben


Re: On the subject of module consumer diagnostics.

2024-09-06 Thread Ben Boeckel via Gcc
On Fri, Sep 06, 2024 at 09:30:26 -0400, David Malcolm wrote:
> On Fri, 2024-09-06 at 08:44 -0400, Ben Boeckel via Gcc wrote:
> > Does this have (additional) implications for caching tools and
> > modules?
> > They cache diagnostic output, but if these other paths showing up or
> > disappearing affects the output, the cache key should incorporate
> > that
> > as well.
> 
> What kinds of caching tools are you thinking of?

`ccache`, `sccache`, etc. These tools try to detect if the compilation
would be the same and place the object in its output location and report
the cached output on stdout/stderr as performed in the original compile
so that it acts "just like the compiler"'s execution.

> I'm curious about caching of diagnostics, and how the diagnostics are
> represented in the cache.

I know `sccache` just stores it as a text blob; `ccache` is probably the
same, but I haven't been in its code myself to know.

--Ben


Re: How to debug/improve excessive compiler memory usage and compile times

2024-10-01 Thread Ben Boeckel via Gcc
On Tue, Oct 01, 2024 at 18:06:35 +0200, Richard Biener via Gcc wrote:
> Analyze where the compile time is spent and where memory is spent.
> Identify unfitting data structures and algorithms causing the issue.
> Replace with better ones.  That’s what I do for these kind of issues
> in the middle end.

To this end, are there any plans or progress to implementing
`-ftime-trace` profiling output? (It's a clang thing, but there is
tooling that would be able to use GCC's reports if it made them.)


https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/

This is also available to do build-wide perf visualization (when using
`ninja`) with this tool that reads `-ftime-trace` files when they're
available):

https://github.com/nico/ninjatracing

(FWIW, I have a dream to have CI builds be able to save out this
information so that longer-term perf trends can be investigated. If this
comes up with project changes to do less silly template magic or
improvements to compilers to be faster, all the better.)

Thanks,

--Ben


Re: Help for git send-email setting

2025-01-13 Thread Ben Boeckel via Gcc
On Mon, Jan 13, 2025 at 01:27:17 +, Hao Liu via Gcc wrote:
> I'm new to GCC community, and try to contribute some patches.
> I am having trouble setting git send-email with Outlook on Linux. Does
> anyone have any successful experiences to share?

I don't believe Outlook is well-suited for sending patches (Linux or
not). There may be workarounds:


https://answers.microsoft.com/en-us/outlook_com/forum/all/configure-and-use-git-send-email-to-work-with/48b7e77c-6727-4541-a78a-f4f8309970a0

https://stackoverflow.com/questions/9814276/git-patch-file-attached-to-outlook-email-gets-modified-by-it

There was mention that there is now `b4` configuration available
in this message:

https://gcc.gnu.org/pipermail/gcc/2024-September/244776.html

though I do not see any updates to the wiki mentioning it. It may make
things a lot easier to juggle as well.

--Ben


Re: Classes Implicitly Declared

2025-02-16 Thread Ben Boeckel via Gcc
Ugh, sorry. the lack of Subject/References broke threading and I missed
this continuation.

On Wed, Feb 12, 2025 at 22:46:23 +, Frederick Virchanza Gotham via Gcc 
wrote:
> I think it might be a possibility given how compiler vendors (and also
> vendors of tools like CMake) aren't really getting very far with
> modules. They've had a few years already.

"Not very far" is probably the opinion of someone that last seriously
checked in on the status in 2021 or so. Things have definitely changed
since then.

Yes, it has taken time. Longer than I'd like (as author of the CMake
support, author of P1689 for dependency scanning, and SG15 Tooling
member). But writing a spec to have toolchains to make what build
systems need to get what users need to test toolchains is not a fast
iteration cycle. Rinse and repeat for `import std`. It will likely be
another run through this cycle for header units too.

But we are finally at a point where named modules can be experimented
with. The timeline has been (roughly):

- 2019 Feb: pushing to get dependency scanning possible (I have patches
  to CMake and GCC to proof-of-concept at the Kona meeting)
- 2019 Mar: CMake uses P1689R2 for its Fortran scanning/dependency
  computations. The implementation is updated for further revision
  updates as time goes on.
- 2020-2022: hammering out P1689 details. As of P1689R5 it is in a place
  where everyone is happy. Alas, my allocations did not align with
  faster progress here.
- 2022 Jun: experimental support for named modules lands in CMake 3.25.
  Community and toolchain implementers experiment with my GCC patches
  and report issues.
- 2023 Sep: my P1689R5 patches land in GCC
- 2023 Oct: CMake 3.28 releases with named imports as a non-experimental
  feature
- 2022 Nov: VS 17.4 releases with P1689 support
- 2023 Feb: clang-scan-deps learns to write P1689R5
- 2024 Apr: Support for clang/libc++ and MSVC/STL `import std;` lands in
  CMake `master`
- 2024 Jul: Experimental support for `import std;` in CMake 3.30.
- 2024 Nov: Support for GCC's `import std;` lands

We're now at a point where *projects* can experiment with modules
meaningfully (via CMake at least). I hope this drives finding issues
both with CMake (of which I'm aware of a number) and toolchains (of
which I'm sure there is some awareness of their existence, but
prioritizing them may be hard without user feedback).

I've also been working to help drive adoption in other build systems:

Bazel: experimental community-driven support (though Google needed some
   convincing to allow the community to experiment with it)
Meson: interest, but no progress AFAIK
Tup: *crickets*
xmake: done

Progress in others that I have not contacted directly about it:

autotools: didn't have a maintainer when I did the first round, but
   zackw has since picked it up…I should reach out
MSBuild: done (though its developers were present for P1689 discussions)

So as long as you're using CMake or xmake, go forth and test. Please
file issues too.

Thanks,

--Ben


  1   2   >