[PATCH 1/1] libcpp: allow UCS_LIMIT codepoints in UTF-8 strings

2023-06-21 Thread Ben Boeckel
libcpp/

* charset.cc: Allow `UCS_LIMIT` in UTF-8 strings.

Reported-by: Damien Guibouret 
Fixes: c1dbaa6656a (libcpp: reject codepoints above 0x10FFFF, 2023-06-06)
Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d4f573e365f..54ebab2b8a4 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1891,7 +1891,7 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
 invalid because they cannot be represented in UTF-16.
 
 Reject such values.*/
-  if (cp >= UCS_LIMIT)
+  if (cp > UCS_LIMIT)
return false;
 }
   /* No problems encountered.  */
-- 
2.40.1
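
The off-by-one fixed above can be illustrated with a minimal sketch. It
assumes `UCS_LIMIT` is 0x10FFFF, the largest Unicode code point; the names
`kUcsLimit`, `rejected_before`, and `rejected_after` are hypothetical and
are not libcpp identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: same value as libcpp's UCS_LIMIT (largest Unicode code point).
constexpr std::uint32_t kUcsLimit = 0x10FFFF;

// Old comparison: `cp >= UCS_LIMIT` also rejected U+10FFFF itself.
constexpr bool rejected_before(std::uint32_t cp) { return cp >= kUcsLimit; }

// New comparison: only values strictly above the limit are rejected.
constexpr bool rejected_after(std::uint32_t cp) { return cp > kUcsLimit; }
```

With the old comparison U+10FFFF is refused even though it is a valid code
point; with the new one only 0x110000 and above are refused.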



[PATCH RESEND 1/1] p1689r5: initial support

2022-10-04 Thread Ben Boeckel
This patch implements support for [P1689R5][] to communicate the C++20
module dependencies of a source file to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path of the file to write the output to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

Signed-off-by: Ben Boeckel 
---
 gcc/ChangeLog   |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++-
 gcc/c-family/c.opt  |  12 ++
 gcc/cp/ChangeLog|   5 +
 gcc/cp/module.cc|   3 +-
 gcc/doc/invoke.texi |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc  |   4 +-
 gcc/genmatch.cc |   2 +-
 gcc/input.cc|   4 +-
 libcpp/ChangeLog|  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc  |  14 ++-
 libcpp/mkdeps.cc| 235 ++--
 16 files changed, 368 insertions(+), 26 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6dded16c0e3..2d61de6adde 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2022-09-20  Ben Boeckel  
+
+   * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
+   -fdep-output= flags.
+   * genmatch.cc (main): Add new preprocessor parameter used for C++
+   module tracking.
+   * input.cc (test_lexer): Add new preprocessor parameter used for C++
+   module tracking.
+
 2022-09-19  Torbjörn SVENSSON  
 
* targhooks.cc (default_zero_call_used_regs): Improve sorry
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index ba3d76dd6cb..569dcd96e8c 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2022-09-20  Ben Boeckel  
+
+   * c-opts.cc (c_common_handle_option): Add fdeps_file variable and
+   -fdeps-format=, -fdep-file=, and -fdep-output= parsing.
+   * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.
+
 2022-09-15  Richard Biener  
 
* c-common.h (build_void_list_node): Remove.
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index babaa2fc157..617d0e93696 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Enhanced dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  */
 static const char *iprefix;
 
@@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
   deps_file = arg;
   break;
 
+case OPT_fdep_format_:
+  if (!strcmp (arg, "p1689r5"))
+   cpp_opts->deps.format = DEPS_FMT_P1689R5;
+  else
+   error ("%<-fdep-format=%> unknown format %s", arg);
+  break;
+
+case OPT_fdep_file_:
+  deps_seen = true;
+  fdeps_file = arg;
+  break;
+
+case OPT_fdep_output_:
+  deps_seen = true;
+  defer_opt (code, arg);
+  b

[PATCH RESEND 0/1] RFC: P1689R5 support

2022-10-04 Thread Ben Boeckel
This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel: compilations need to be ordered so that `import
some_module;` can be satisfied in time, by making sure that the TU with
`export module some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
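
The ordering constraint described above can be sketched as a topological
sort over the scan results. This is only an illustration under stated
assumptions: `ScannedTU` and `compile_order` are hypothetical names, the
struct is a simplified stand-in for what a build system might extract from
the P1689R5 output, and imports that no scanned TU provides are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of one scanned translation unit.
struct ScannedTU {
  std::string source;
  std::vector<std::string> provides;   // modules this TU exports
  std::vector<std::string> requires_;  // modules this TU imports
};

// Returns the sources ordered so that every provider precedes its importers
// (Kahn's algorithm). Returns an empty vector if the imports form a cycle.
std::vector<std::string> compile_order(const std::vector<ScannedTU>& tus) {
  std::map<std::string, std::size_t> provider;  // module name -> TU index
  for (std::size_t i = 0; i < tus.size(); ++i)
    for (const auto& m : tus[i].provides) provider[m] = i;

  std::vector<std::vector<std::size_t>> dependents(tus.size());
  std::vector<std::size_t> pending(tus.size(), 0);
  for (std::size_t i = 0; i < tus.size(); ++i)
    for (const auto& m : tus[i].requires_) {
      auto it = provider.find(m);
      if (it == provider.end() || it->second == i) continue;
      dependents[it->second].push_back(i);
      ++pending[i];
    }

  std::vector<std::size_t> ready;
  for (std::size_t i = 0; i < tus.size(); ++i)
    if (pending[i] == 0) ready.push_back(i);

  std::vector<std::string> order;
  while (!ready.empty()) {
    std::size_t i = ready.back();
    ready.pop_back();
    order.push_back(tus[i].source);
    for (std::size_t d : dependents[i])
      if (--pending[d] == 0) ready.push_back(d);
  }
  if (order.size() != tus.size()) order.clear();  // cycle detected
  return order;
}
```

Given a TU that imports `some_module` and another that exports it, the
exporting TU comes first in the returned order regardless of the order in
which the two were scanned.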

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

Testing is currently happening in CMake's CI using a prior revision of
this patch (the differences are basically the changelog, some style, and
`trtbd` instead of `p1689r5` as the format name).

For testing within GCC, I'll work on the following:

- scanning non-module source
- scanning module-importing source (`import X;`)
- scanning module-exporting source (`export module X;`)
- scanning module implementation unit (`module X;`)
- flag combinations?

Are there existing tools for handling JSON output for testing purposes?
Basically, something that I can add to the test suite that doesn't care
about whitespace, but checks the structure (with sensible replacements
for absolute paths where relevant)?

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags (though using my old `trtbd` spelling for the
format name).

Thanks,

--Ben

Ben Boeckel (1):
  p1689r5: initial support

 gcc/ChangeLog   |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++-
 gcc/c-family/c.opt  |  12 ++
 gcc/cp/ChangeLog|   5 +
 gcc/cp/module.cc|   3 +-
 gcc/doc/invoke.texi |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc  |   4 +-
 gcc/genmatch.cc |   2 +-
 gcc/input.cc|   4 +-
 libcpp/ChangeLog|  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc  |  14 ++-
 libcpp/mkdeps.cc| 235 ++--
 16 files changed, 368 insertions(+), 26 deletions(-)


base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f
-- 
2.37.3



[PATCH wwwdocs 1/1] [RESEND] gcc-14: document P1689R5 scanning output support

2024-03-08 Thread Ben Boeckel
---
 htdocs/gcc-14/changes.html | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 7278f753..b506eeb1 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -112,6 +112,17 @@ a work-in-progress.
   
 
   
+  C++ module scanning for named modules is now available:
+
+  https://wg21.link/P1689R5";>P1689R5, Format for
+  describing dependencies of source files.
+  
+  The -fdeps-format=, -fdeps-file=, and
+  -fdeps=target= flags may be used to generate P1689 output
+  (the p1689r5 format is the only available format today).
+  
+
+  
 
 
 Runtime Library (libstdc++)
-- 
2.42.0



[PATCH v2 1/2] email: fix patch email addresses

2024-03-08 Thread Ben Boeckel
ChangeLog:

* config-ml.in: Update patch email address.
* symlink-tree: Update patch email address.

Signed-off-by: Ben Boeckel 
---
v1 -> v2:
- add Signed-off-by

---
 config-ml.in | 2 +-
 symlink-tree | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/config-ml.in b/config-ml.in
index 68854a4f16c..6aff74410c0 100644
--- a/config-ml.in
+++ b/config-ml.in
@@ -25,7 +25,7 @@
 # the same distribution terms that you use for the rest of that program.
 #
 # Please report bugs to 
-# and send patches to .
+# and send patches to .
 
 # It is advisable to support a few --enable/--disable options to let the
 # user select which libraries s/he really wants.
diff --git a/symlink-tree b/symlink-tree
index a9d50831b88..5cb95ba66aa 100755
--- a/symlink-tree
+++ b/symlink-tree
@@ -24,7 +24,7 @@
 # the same distribution terms that you use for the rest of that program.
 #
 # Please report bugs to 
-# and send patches to .
+# and send patches to .
 
 # Syntax: symlink-tree srcdir "ignore1 ignore2 ..."
 #

base-commit: 6fe63013e3292a45288461b7efa9d52e0ac234dc
-- 
2.44.0



[PATCH v2 2/2] bugzilla: remove `gcc-bugs@` mailing list address

2024-03-08 Thread Ben Boeckel
Bugzilla is preferred today. Use a URL that gives context about
gathering information prior to actually filing a bug at Bugzilla.

ChangeLog:

* config-ml.in: Replace gcc-bugs@ with bug reporting link.
* symlink-tree: Replace gcc-bugs@ with bug reporting link.

fixincludes/ChangeLog:

* README: Replace gcc-bugs@ with bug reporting link.

gcc/testsuite/ChangeLog:

* lib/file-format.exp: Replace gcc-bugs@ with bug reporting link.

libcpp/ChangeLog:

* configure: Regenerate.
* configure.ac: Replace gcc-bugs@ with bug reporting link.

libdecnumber/ChangeLog:

* configure: Regenerate.
* configure.ac: Replace gcc-bugs@ with bug reporting link.

Signed-off-by: Ben Boeckel 
---
v1 -> v2:
- Use `https://gcc.gnu.org/bugs/` instead of a direct Bugzilla link
- Regenerate `configure` scripts

---
 config-ml.in  |  2 +-
 fixincludes/README|  4 ++--
 gcc/testsuite/lib/file-format.exp |  4 ++--
 libcpp/configure  | 22 +++---
 libcpp/configure.ac   |  2 +-
 libdecnumber/configure| 22 +++---
 libdecnumber/configure.ac |  2 +-
 symlink-tree  |  2 +-
 8 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/config-ml.in b/config-ml.in
index 6aff74410c0..dea86f7b3fe 100644
--- a/config-ml.in
+++ b/config-ml.in
@@ -24,7 +24,7 @@
 # configuration script generated by Autoconf, you may include it under
 # the same distribution terms that you use for the rest of that program.
 #
-# Please report bugs to 
+# Please report bugs to <https://gcc.gnu.org/bugs/>
 # and send patches to .
 
 # It is advisable to support a few --enable/--disable options to let the
diff --git a/fixincludes/README b/fixincludes/README
index 98480165d10..28c9cfee194 100644
--- a/fixincludes/README
+++ b/fixincludes/README
@@ -6,8 +6,8 @@ If you are having some problem with a system header that is either
 broken by the manufacturer, or is broken by the fixinclude process,
 then you will need to alter or add information to the include fix
 definitions file, ``inclhack.def''.  Please also send relevant
-information to gcc-b...@gcc.gnu.org, gcc-patches@gcc.gnu.org and,
-please, to me:  bk...@gnu.org.
+information to https://gcc.gnu.org/bugs/, gcc-patches@gcc.gnu.org
+and, please, to me:  bk...@gnu.org.
 
 To make your fix, you will need to do several things:
 
diff --git a/gcc/testsuite/lib/file-format.exp b/gcc/testsuite/lib/file-format.exp
index 0670bda9c81..1a6904a9923 100644
--- a/gcc/testsuite/lib/file-format.exp
+++ b/gcc/testsuite/lib/file-format.exp
@@ -14,8 +14,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Please email any bugs, comments, and/or additions to this file to:
-# gcc-b...@gcc.gnu.org
+# Please report any bugs, comments, and/or additions to this file to:
+# https://gcc.gnu.org/bugs/
 
 # This file defines a proc for determining the file format in use by the
 # target.  This is useful for tests that are only supported by certain file
diff --git a/libcpp/configure b/libcpp/configure
index 8a38c0546e3..3904a66e190 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -2,7 +2,7 @@
 # Guess values for system-dependent variables and create Makefiles.
 # Generated by GNU Autoconf 2.69 for cpplib  .
 #
-# Report bugs to .
+# Report bugs to <https://gcc.gnu.org/bugs/>.
 #
 #
 # Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc.
@@ -267,10 +267,10 @@ fi
 $as_echo "$0: be upgraded to zsh 4.3.4 or later."
   else
 $as_echo "$0: Please tell bug-autoc...@gnu.org and
-$0: gcc-b...@gcc.gnu.org about your system, including any
-$0: error possibly output before this message. Then install
-$0: a modern shell, or manually run the script under such a
-$0: shell if you do have one."
+$0: https://gcc.gnu.org/bugs/ about your system, including
+$0: any error possibly output before this message. Then
+$0: install a modern shell, or manually run the script
+$0: under such a shell if you do have one."
   fi
   exit 1
 fi
@@ -582,7 +582,7 @@ PACKAGE_NAME='cpplib'
 PACKAGE_TARNAME='cpplib'
 PACKAGE_VERSION=' '
 PACKAGE_STRING='cpplib  '
-PACKAGE_BUGREPORT='gcc-b...@gcc.gnu.org'
+PACKAGE_BUGREPORT='https://gcc.gnu.org/bugs/'
 PACKAGE_URL=''
 
 ac_unique_file="ucnid.h"
@@ -1424,7 +1424,7 @@ Some influential environment variables:
 Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
 
-Report bugs to .
+Report bugs to <https://gcc.gnu.org/bugs/>.
 _ACEOF
 ac_status=$?
 fi
@@ -1684,9 +1684,9 @@ $as_echo "$as_me: WARNING: $2: see the Autoconf documentation" >&2;}
 $as_echo "$as_me: WARNING: $2: section \"Present But Cannot Be

Re: [PATCH 1/1] gcc-14: document P1689R5 scanning output support

2024-01-03 Thread Ben Boeckel
On Mon, Nov 20, 2023 at 11:22:56 -0500, Ben Boeckel wrote:
> ---
>  htdocs/gcc-14/changes.html | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 7278f753..b506eeb1 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -112,6 +112,17 @@ a work-in-progress.
>
>  
>
> +  C++ module scanning for named modules is now available:
> +
> +  https://wg21.link/P1689R5";>P1689R5, Format for
> +  describing dependencies of source files.
> +  
> +  The -fdeps-format=, -fdeps-file=, and
> +  -fdeps=target= flags may be used to generate P1689 output
> +  (the p1689r5 format is the only available format today).
> +  
> +
> +  
>  
>  
>  Runtime Library (libstdc++)
> -- 
> 2.42.0

Ping? Is this the right place to submit this patch?

Thanks,

--Ben


[PATCH 1/1] gcc-14: document P1689R5 scanning output support

2023-11-20 Thread Ben Boeckel
---
 htdocs/gcc-14/changes.html | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 7278f753..b506eeb1 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -112,6 +112,17 @@ a work-in-progress.
   
 
   
+  C++ module scanning for named modules is now available:
+
+  https://wg21.link/P1689R5";>P1689R5, Format for
+  describing dependencies of source files.
+  
+  The -fdeps-format=, -fdeps-file=, and
+  -fdeps=target= flags may be used to generate P1689 output
+  (the p1689r5 format is the only available format today).
+  
+
+  
 
 
 Runtime Library (libstdc++)
-- 
2.42.0



[PATCH 1/1] email: fix bug and patch email addresses

2023-11-20 Thread Ben Boeckel
Changelog:

* config-ml.in: Update bug and patch email address.
* symlink-tree: Update bug and patch email address.
---
 config-ml.in | 4 ++--
 symlink-tree | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/config-ml.in b/config-ml.in
index 68854a4f16c..107d31f58f8 100644
--- a/config-ml.in
+++ b/config-ml.in
@@ -24,8 +24,8 @@
 # configuration script generated by Autoconf, you may include it under
 # the same distribution terms that you use for the rest of that program.
 #
-# Please report bugs to 
-# and send patches to .
+# Please report bugs to 
+# and send patches to .
 
 # It is advisable to support a few --enable/--disable options to let the
 # user select which libraries s/he really wants.
diff --git a/symlink-tree b/symlink-tree
index a9d50831b88..47d17e33928 100755
--- a/symlink-tree
+++ b/symlink-tree
@@ -23,8 +23,8 @@
 # configuration script generated by Autoconf, you may include it under
 # the same distribution terms that you use for the rest of that program.
 #
-# Please report bugs to 
-# and send patches to .
+# Please report bugs to 
+# and send patches to .
 
 # Syntax: symlink-tree srcdir "ignore1 ignore2 ..."
 #
-- 
2.42.0



[PATCH 2/2] bugzilla: remove `gcc-bugs@` mailing list address

2023-11-20 Thread Ben Boeckel
Bugzilla is preferred today.

ChangeLog:

* config-ml.in: Replace gcc-bugs@ with Bugzilla link.
* symlink-tree: Replace gcc-bugs@ with Bugzilla link.

fixincludes/ChangeLog:

* README: Replace gcc-bugs@ with Bugzilla link.

gcc/testsuite/ChangeLog:

* lib/file-format.exp: Replace gcc-bugs@ with Bugzilla link.

libcpp/ChangeLog:

* configure: Replace gcc-bugs@ with Bugzilla link.
* configure.ac: Replace gcc-bugs@ with Bugzilla link.

libdecnumber/ChangeLog:

* configure: Replace gcc-bugs@ with Bugzilla link.
* configure.ac: Replace gcc-bugs@ with Bugzilla link.

Signed-off-by: Ben Boeckel 
---
 config-ml.in  |  2 +-
 fixincludes/README|  4 ++--
 gcc/testsuite/lib/file-format.exp |  4 ++--
 libcpp/configure  | 12 ++--
 libcpp/configure.ac   |  2 +-
 libdecnumber/configure| 12 ++--
 libdecnumber/configure.ac |  2 +-
 symlink-tree  |  2 +-
 8 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/config-ml.in b/config-ml.in
index 6aff74410c0..8724cf6370e 100644
--- a/config-ml.in
+++ b/config-ml.in
@@ -24,7 +24,7 @@
 # configuration script generated by Autoconf, you may include it under
 # the same distribution terms that you use for the rest of that program.
 #
-# Please report bugs to 
+# Please report bugs to <https://gcc.gnu.org/bugzilla>
 # and send patches to .
 
 # It is advisable to support a few --enable/--disable options to let the
diff --git a/fixincludes/README b/fixincludes/README
index 98480165d10..74eb8373224 100644
--- a/fixincludes/README
+++ b/fixincludes/README
@@ -6,8 +6,8 @@ If you are having some problem with a system header that is either
 broken by the manufacturer, or is broken by the fixinclude process,
 then you will need to alter or add information to the include fix
 definitions file, ``inclhack.def''.  Please also send relevant
-information to gcc-b...@gcc.gnu.org, gcc-patches@gcc.gnu.org and,
-please, to me:  bk...@gnu.org.
+information to https://gcc.gnu.org/bugzilla, gcc-patches@gcc.gnu.org
+and, please, to me:  bk...@gnu.org.
 
 To make your fix, you will need to do several things:
 
diff --git a/gcc/testsuite/lib/file-format.exp b/gcc/testsuite/lib/file-format.exp
index 9bf89e2814c..3bfdc4f8264 100644
--- a/gcc/testsuite/lib/file-format.exp
+++ b/gcc/testsuite/lib/file-format.exp
@@ -14,8 +14,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# Please email any bugs, comments, and/or additions to this file to:
-# gcc-b...@gcc.gnu.org
+# Please report any bugs, comments, and/or additions to this file to:
+# https://gcc.gnu.org/bugzilla
 
 # This file defines a proc for determining the file format in use by the
 # target.  This is useful for tests that are only supported by certain file
diff --git a/libcpp/configure b/libcpp/configure
index ed98f40a1c1..bdfd83a1973 100755
--- a/libcpp/configure
+++ b/libcpp/configure
@@ -2,7 +2,7 @@
 # Guess values for system-dependent variables and create Makefiles.
 # Generated by GNU Autoconf 2.69 for cpplib  .
 #
-# Report bugs to .
+# Report bugs to <https://gcc.gnu.org/bugzilla>.
 #
 #
 # Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc.
@@ -267,7 +267,7 @@ fi
 $as_echo "$0: be upgraded to zsh 4.3.4 or later."
   else
 $as_echo "$0: Please tell bug-autoc...@gnu.org and
-$0: gcc-b...@gcc.gnu.org about your system, including any
+$0: https://gcc.gnu.org/bugzilla about your system, including any
 $0: error possibly output before this message. Then install
 $0: a modern shell, or manually run the script under such a
 $0: shell if you do have one."
@@ -582,7 +582,7 @@ PACKAGE_NAME='cpplib'
 PACKAGE_TARNAME='cpplib'
 PACKAGE_VERSION=' '
 PACKAGE_STRING='cpplib  '
-PACKAGE_BUGREPORT='gcc-b...@gcc.gnu.org'
+PACKAGE_BUGREPORT='https://gcc.gnu.org/bugzilla'
 PACKAGE_URL=''
 
 ac_unique_file="ucnid.h"
@@ -1410,7 +1410,7 @@ Some influential environment variables:
 Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
 
-Report bugs to .
+Report bugs to <https://gcc.gnu.org/bugzilla>.
 _ACEOF
 ac_status=$?
 fi
@@ -1671,7 +1671,7 @@ $as_echo "$as_me: WARNING: $2: section \"Present But Cannot Be Compiled\"" >
 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5
 $as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;}
 ( $as_echo "## --- ##
-## Report this to gcc-b...@gcc.gnu.org ##
+## Report this to https://gcc.gnu.org/bugzilla ##
 ## --- ##"

Re: [PATCH 2/2] bugzilla: remove `gcc-bugs@` mailing list address

2023-11-22 Thread Ben Boeckel
On Wed, Nov 22, 2023 at 23:15:56 +, Joseph Myers wrote:
> On Mon, 20 Nov 2023, Ben Boeckel wrote:
> 
> > Bugzilla is preferred today.
> > 
> > ChangeLog:
> > 
> > * config-ml.in: Replace gcc-bugs@ with Bugzilla link.
> > * symlink-tree: Replace gcc-bugs@ with Bugzilla link.
> 
> I don't think we should use a URL that redirects (i.e. 
> https://gcc.gnu.org/bugzilla should preferably have a trailing '/'), and 
> arguably we should use https://gcc.gnu.org/bugs/ as the URL; that's the 
> preferred one to point people to for bugs in the compilers themselves, 
> since it gives more instructions on bug reporting (though those 
> instructions may be less relevant for bugs in these files).

I'll update the URL.

> codingconventions.html claims that symlink-tree is "copied from mainline 
> automake".  That is, I think, out-of-date information: automake's 
> contrib/multilib/README says "The master (and probably more up-to-date) 
> copies of the 'config-ml.in' and 'symlink-tree' files are maintained in 
> the GCC development tree".  But it does indicate that 
> codingconventions.html itself should be updated to stop suggesting 
> symlink-tree is maintained elsewhere.

I'll also change that.

> > libcpp/ChangeLog:
> > 
> > * configure: Replace gcc-bugs@ with Bugzilla link.
> > * configure.ac: Replace gcc-bugs@ with Bugzilla link.
> > 
> > libdecnumber/ChangeLog:
> > 
> > * configure: Replace gcc-bugs@ with Bugzilla link.
> > * configure.ac: Replace gcc-bugs@ with Bugzilla link.
> 
> I hope the configure changes are the same as you get with regeneration 
> with the right autoconf version, and so should be described as 
> regeneration in the ChangeLog entries.

Is there a version of autoconf I should use? I have 2.71 lying around
but see that these were generated with 2.69. If you want me to regen
with 2.71, I'll do that as separate prep commits so that this diff is
sensible. Or I can try and dig up a 2.69 in some container to do it.

Thanks,

--Ben


Re: [PATCH v8 0/4] P1689R5 support

2023-09-20 Thread Ben Boeckel
On Tue, Sep 19, 2023 at 17:33:34 -0400, Jason Merrill wrote:
> Pushed, thanks!

Thanks!

Is there a process I can use to backport this to GCC 13?

--Ben


Re: [PATCH wwwdocs 1/1] gcc-14: document P1689R5 scanning output support

2024-04-27 Thread Ben Boeckel
On Sat, Jan 06, 2024 at 14:17:14 +0100, Arsen Arsenović wrote:
> Hi Ben,
> 
> Ben Boeckel  writes:
> 
> > Ping? Is this the right place to submit this patch?
> 
> Yes, this is the correct list, though it is usually recommended to use
> --subject-prefix='PATCH wwwdocs' or such, to catch the right eyes.  See:
> https://gcc.gnu.org/contribute.html#webchanges
> 
> I've added it to my subject, hopefully that works.

No bites yet… Anyone willing to review this patch so that it gets
mentioned on the website?

Thanks,

--Ben


Re: [PATCHv2 wwwdocs 1/1] gcc-14: document P1689R5 scanning output support

2024-05-02 Thread Ben Boeckel
On Tue, Apr 30, 2024 at 10:24:44 +0100, Jonathan Wakely wrote:
> On 20/11/23 11:22 -0500, Ben Boeckel wrote:
> >---
> > htdocs/gcc-14/changes.html | 11 +++
> > 1 file changed, 11 insertions(+)
> >
> >diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> >index 7278f753..b506eeb1 100644
> >--- a/htdocs/gcc-14/changes.html
> >+++ b/htdocs/gcc-14/changes.html
> >@@ -112,6 +112,17 @@ a work-in-progress.
> >   
> > 
> >   
> >+  C++ module scanning for named modules is now available:
> >+
> >+  https://wg21.link/P1689R5";>P1689R5, Format for
> >+  describing dependencies of source files.
> >+  
> >+  The -fdeps-format=, -fdeps-file=, and
> >+  -fdeps=target= flags may be used to generate P1689 output
> 
> This should be -fdeps-target= not -fdeps=target=.

Whoops, yep.

> >+  (the p1689r5 format is the only available format today).
> 
> I wish the option was more descriptive than "p1689r5", which nobody is
> going to remember (but I assume we don't actually need to specify it
> explicitly since it's the only supported format).

All users of the flag should be having it in the build system itself;
hand-coded makefiles can use it, but will need considerable `jq`
gymnastics to turn the output into properly understood make syntax
snippets for their build.

> >+  
> >+
> 
> Do we need a list for this item? It seems a bit weird that the first
> list item is just the paper. How about just a single paragraph?
> 
> C++ module scanning for named modules is now available, based on the
> format described in https://wg21.link/P1689R5";>P1689R5,
> Format for describing dependencies of source files. The
> -fdeps-format=, -fdeps-file=, and
> -fdeps-target= flags may be used to generate dependency
> information. In GCC 14 p1689r5 is the only valid argument
> for -fdeps-format=.

Sounds good. New patch attached.

--Ben
From d973efa9689db7d46211721e7c00feea7e6445a6 Mon Sep 17 00:00:00 2001
From: Ben Boeckel 
Date: Thu, 2 May 2024 14:00:01 -0400
Subject: [PATCH 1/1] gcc-14: document P1689R5 scanning output support

---
 htdocs/gcc-14/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 8dfbf7dc..8998e6c0 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -513,6 +513,12 @@ a work-in-progress.
   GCC supports a new pragma #pragma GCC novector to
   indicate to the vectorizer not to vectorize the loop annotated with the
   pragma.
+  C++ module scanning for named modules is now available, based on the
+format described inhttps://wg21.link/P1689R5";>P1689R5, Format 
for
+describing dependencies of source files. The -fdeps-format=,
+-fdeps-file=, and -fdeps-target= flags may be
+used to generate dependency information. In GCC 14 p1689r5 is
+the only valid argument for -fdeps-format=.
 
 
 Runtime Library (libstdc++)
-- 
2.44.0



[PATCH 0/4] P1689 followup fixes

2024-05-04 Thread Ben Boeckel
Hi,

Here are some minor fixes to documentation, formatting, and styling to the
P1689R5 support through the `-fdeps-*` flags.

Thanks,

--Ben

Ben Boeckel (4):
  libcpp/mkdeps: fix indentation
  libcpp/init: remove unnecessary `struct` keyword
  gcc/c-family/c-opts: fix quoting for `-fdeps-format=` error message
  gcc/c-family/c.opt: clarify `-fdeps-*` flag documentation

 gcc/c-family/c-opts.cc |  2 +-
 gcc/c-family/c.opt |  6 +++---
 libcpp/init.cc |  2 +-
 libcpp/mkdeps.cc   | 11 ++-
 4 files changed, 11 insertions(+), 10 deletions(-)


base-commit: bba118db3f63cb1e3953a014aa3ac2ad89908950
-- 
2.44.0



[PATCH 1/4] libcpp/mkdeps: fix indentation

2024-05-04 Thread Ben Boeckel
Fixes: 024f135a1e9 (p1689r5: initial support, 2023-09-01)

Reported-by: Roland Illig 

libcpp/

* mkdeps.cc (fdeps_add_target): Fix indentation.

Signed-off-by: Ben Boeckel 
---
 libcpp/mkdeps.cc | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/libcpp/mkdeps.cc b/libcpp/mkdeps.cc
index 4cf0cf09178..8762ead4c34 100644
--- a/libcpp/mkdeps.cc
+++ b/libcpp/mkdeps.cc
@@ -307,11 +307,12 @@ fdeps_add_target (struct mkdeps *d, const char *o, bool is_primary)
 {
   o = apply_vpath (d, o);
   if (is_primary)
-  {
-if (d->primary_output)
-  d->fdeps_targets.push (d->primary_output);
-d->primary_output = xstrdup (o);
-  } else
+{
+  if (d->primary_output)
+   d->fdeps_targets.push (d->primary_output);
+  d->primary_output = xstrdup (o);
+}
+  else
 d->fdeps_targets.push (xstrdup (o));
 }
 
-- 
2.44.0
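
For readers following the hunk above, the control flow it reindents can be
sketched in isolation. This is a simplified illustration, not the libcpp
code: `Deps` and `add_target` are hypothetical stand-ins for `struct mkdeps`
and `fdeps_add_target`, and the `apply_vpath` and `xstrdup` steps are left
out.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the two mkdeps fields the hunk touches.
struct Deps {
  std::optional<std::string> primary_output;
  std::vector<std::string> fdeps_targets;
};

// Mirrors the branch structure of the hunk: a new primary output demotes
// any previous primary output to the list of ordinary targets.
void add_target(Deps& d, const std::string& o, bool is_primary) {
  if (is_primary) {
    if (d.primary_output)
      d.fdeps_targets.push_back(*d.primary_output);
    d.primary_output = o;
  } else {
    d.fdeps_targets.push_back(o);
  }
}
```

Registering a second primary output therefore keeps the first one, but only
as an ordinary target.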



[PATCH 4/4] gcc/c-family/c.opt: clarify `-fdeps-*` flag documentation

2024-05-04 Thread Ben Boeckel
Move the only supported value (as of today) to the flag name itself.
Also reword to clarify that the `-fdeps-file=` file will be written to.

Fixes: 024f135a1e9 (p1689r5: initial support, 2023-09-01)

Reported-by: Roland Illig 

gcc/c-family/

* c.opt: Clarify `-fdeps-*` documentation.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c.opt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 56cccf2a67b..fa82eebb518 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -256,13 +256,13 @@ MT
 C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target 
after %qs)
 -MTAdd a target that does not require quoting.
 
-fdeps-format=
+fdeps-format=p1689r5
 C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after %qs)
-Structured format for output dependency information.  Supported (\"p1689r5\").
+Structured format for output dependency information.
 
 fdeps-file=
 C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing output path after 
%qs)
-File for output dependency information.
+File to write dependency information to.
 
 fdeps-target=
 C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing path after %qs)
-- 
2.44.0



[PATCH 3/4] gcc/c-family/c-opts: fix quoting for `-fdeps-format=` error message

2024-05-04 Thread Ben Boeckel
Fixes: 024f135a1e9 (p1689r5: initial support, 2023-09-01)

Reported-by: Roland Illig 

gcc/c-family/

* c-opts.cc (c_common_handle_option): Fix quoting in
`-fdeps-format=` unrecognized parameter error message.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index be3058dca63..4a164ad0c0b 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -370,7 +370,7 @@ c_common_handle_option (size_t scode, const char *arg, 
HOST_WIDE_INT value,
   if (!strcmp (arg, "p1689r5"))
cpp_opts->deps.fdeps_format = FDEPS_FMT_P1689R5;
   else
-   error ("%<-fdeps-format=%> unknown format %<%s%>", arg);
+   error ("%<-fdeps-format=%> unknown format %q", arg);
   break;
 
 case OPT_fdeps_file_:
-- 
2.44.0



[PATCH 2/4] libcpp/init: remove unnecessary `struct` keyword

2024-05-04 Thread Ben Boeckel
The initial P1689 patches were written in 2019, and code moving around
over time ended up introducing a `struct` keyword to the implementation
of `cpp_finish`. Remove it to match the rest of the file and its
declaration in the header.

Fixes: 024f135a1e9 (p1689r5: initial support, 2023-09-01)

Reported-by: Roland Illig 

libcpp/

* init.cc (cpp_finish): Remove unnecessary `struct` keyword.

Signed-off-by: Ben Boeckel 
---
 libcpp/init.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/init.cc b/libcpp/init.cc
index 54fc9236d38..cbd22249b04 100644
--- a/libcpp/init.cc
+++ b/libcpp/init.cc
@@ -862,7 +862,7 @@ read_original_directory (cpp_reader *pfile)
Maybe it should also reset state, such that you could call
cpp_start_read with a new filename to restart processing.  */
 void
-cpp_finish (struct cpp_reader *pfile, FILE *deps_stream, FILE *fdeps_stream)
+cpp_finish (cpp_reader *pfile, FILE *deps_stream, FILE *fdeps_stream)
 {
   /* Warn about unused macros before popping the final buffer.  */
   if (CPP_OPTION (pfile, warn_unused_macros))
-- 
2.44.0



Re: [PATCH 3/4] gcc/c-family/c-opts: fix quoting for `-fdeps-format=` error message

2024-05-08 Thread Ben Boeckel
On Tue, May 07, 2024 at 21:15:09 +, Joseph Myers wrote:
> That can't be right.  The GCC %q is a modifier that needs to have an 
> actual format specifier it modifies (so %qs - which produces the same 
> output as %<%s%> - but not %q by itself).

Yes, I got CI failure results. When I investigated, I noticed that I had
prepared the patches on my laptop, while additional work done
concurrently on my desktop (it builds GCC in a…reasonable time
comparatively) had not been pulled back. That work did have the `%qs`
change, but I've not gotten around to running the test suite again (or
reporting back here). I have another patch revision in the works.

Thanks,

--Ben


[PATCH v6 0/4] P1689R5 support

2023-06-06 Thread Ben Boeckel via Gcc-patches
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel; compilations need to be ordered to ensure that
`import some_module;` can be satisfied in time by making sure that any
TU with `export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
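
To make the ordering constraint concrete, here is a minimal sketch of how a
build tool might turn scanned import information into a compile order. The
`ImportGraph` type and the `compile_order` helper are hypothetical names used
only for illustration; they are not part of GCC or of any build system.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: each translation unit or module name mapped to the
// modules it imports, as a scanner could report them.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first visit: schedule everything `name` imports before `name` itself.
static void visit(const ImportGraph &graph, const std::string &name,
                  std::set<std::string> &done, std::vector<std::string> &order)
{
  if (!done.insert(name).second)
    return; // already scheduled (or currently being visited)
  auto it = graph.find(name);
  if (it != graph.end())
    for (const auto &dep : it->second)
      visit(graph, dep, done, order);
  order.push_back(name);
}

// Return an order in which every module appears after the modules it imports.
std::vector<std::string> compile_order(const ImportGraph &graph)
{
  std::set<std::string> done;
  std::vector<std::string> order;
  for (const auto &entry : graph)
    visit(graph, entry.first, done, order);
  return order;
}
```

Calling `compile_order` on a graph where `app` imports `some_module` places
`some_module` first, which is the guarantee the scan output is meant to let a
build system provide.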

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang has taken an alternate approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v5 -> v6:

- rebase onto `master` (585c660f041 (reload1: Change return type of
  predicate function from int to bool, 2023-06-06))
- fix crash related to reporting imported CMI files as dependencies
- rework utf-8 validity to patch the new `cpp_valid_utf8_p` function
  instead of the core utf-8 decoding routine to reject invalid
  codepoints (preserves higher-level error detection of invalid utf-8)
- harmonize `fdeps` spelling in flags, variables, comments, etc.
- rename `-fdeps-output=` to `-fdeps-target=`

v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (4):
  libcpp: reject codepoints above 0x10FFFF
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   4 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  24 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |   7 +
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 

[PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF

2023-06-06 Thread Ben Boeckel via Gcc-patches
Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10FFFF.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d7f323b2cd5..3b34d804cf1 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
   int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp);
   if (err)
return false;
+
+  /* Additionally, Unicode declares that all codepoints above 0010FFFF are
+invalid because they cannot be represented in UTF-16.
+
+Reject such values.*/
+  if (cp >= 0x10FFFF)
+   return false;
 }
   /* No problems encountered.  */
   return true;
-- 
2.40.1
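
For readers checking the boundary, a small standalone sketch of the range
test may help. It is not the libcpp code; `kMaxCodepoint` and
`codepoint_in_unicode_range` are hypothetical names. The one fact it relies on
is that U+10FFFF is the highest code point Unicode defines, so that value
itself is in range and only larger values are out of range.

```cpp
#include <cstdint>

// Highest code point defined by Unicode (the last one UTF-16 can encode).
constexpr std::uint32_t kMaxCodepoint = 0x10FFFF;

// Hypothetical helper: true when `cp` lies within the Unicode code space.
// This only checks the upper bound; it does not look at surrogates or at
// how the value was encoded.
constexpr bool codepoint_in_unicode_range(std::uint32_t cp)
{
  return cp <= kMaxCodepoint;
}
```

With this helper, `codepoint_in_unicode_range(0x10FFFF)` is true and
`codepoint_in_unicode_range(0x110000)` is false, which is the inclusive
boundary a validity check needs to get right.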



[PATCH v6 4/4] c++modules: report module mapper files as a dependency

2023-06-06 Thread Ben Boeckel via Gcc-patches
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc |  4 
 gcc/cp/mapper-client.h  |  1 +
 gcc/cp/module.cc| 18 +-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..0ce5679d659 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string 
&name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,8 @@ module_client::open_module_client (location_t loc, const 
char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index e88ce0a1818..9dbb53d2aaf 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec 
*partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14031,7 +14031,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14039,7 +14039,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19504,7 +19504,7 @@ maybe_translate_include (cpp_reader *reader, line_maps 
*lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19620,7 +19620,7 @@ module_begin_main_file (cpp_reader *reader, line_maps 
*lmaps,
 static void
 name_pending_imports (cpp_reader *reader)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   if (!vec_safe_length (pending_imports))
 /* Not doing anything.  */
@@ -20090,7 +20090,7 @@ init_modules (cpp_reader *reader)
 
   if (!flag_module_lazy)
 /* Get the mapper now, if we're not being lazy.  */
-get_mapper (cpp_main_loc (reader));
+get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   if (!flag_preprocess_only)
 {
@@ -20300,7 +20300,7 @@ late_finish_module (cpp_reader *reader,  
module_processing_cookie *cookie,
 
   if (!errorcount)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
   mapper->ModuleCompiled (state->get_flatname ());
 }
   else if (cookie->cmi_name)
-- 
2.40.1



[PATCH v6 2/4] p1689r5: initial support

2023-06-06 Thread Ben Boeckel via Gcc-patches
This patch implements support for [P1689R5][] to communicate the C++20
module dependencies to build systems so that they may build `.gcm` files
in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdeps-target=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst
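
As a rough illustration of what a consumer sees, the sketch below renders a
reduced P1689-style record for one scanned TU. The field names (`version`,
`revision`, `rules`, `primary-output`, `provides`, `requires`,
`logical-name`) follow my reading of the P1689R5 paper; the `ScanRule` struct
and `render_p1689` function are hypothetical, the output omits optional
fields, and no JSON escaping is done, so treat it as a sketch rather than as
what GCC writes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of one rule in the dependency output.
struct ScanRule {
  std::string primary_output;         // the object named by -fdeps-target=
  std::vector<std::string> provides;  // logical names of modules exported
  std::vector<std::string> required;  // logical names of modules imported
};

// Render a list of logical names as a JSON array of objects.
// Assumes the names need no escaping.
static std::string name_list(const std::vector<std::string> &names)
{
  std::string out = "[";
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (i)
      out += ",";
    out += "{\"logical-name\":\"" + names[i] + "\"}";
  }
  return out + "]";
}

// Render a single-rule document with only the fields listed above.
std::string render_p1689(const ScanRule &rule)
{
  return "{\"version\":1,\"revision\":0,\"rules\":[{"
         "\"primary-output\":\"" + rule.primary_output + "\","
         "\"provides\":" + name_list(rule.provides) + ","
         "\"requires\":" + name_list(rule.required) + "}]}";
}
```

The `primary-output` value is what lets a build system match this record to
the compilation that will later produce the same object file.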

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdeps-file=, and
-fdeps-target= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdeps-file=, and -fdeps-target= parsing.
* c.opt: Add -fdeps-format=, -fdeps-file=, and -fdeps-target=
flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 

[PATCH v6 3/4] c++modules: report imported CMI files as dependencies

2023-06-06 Thread Ben Boeckel via Gcc-patches
They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index c80f139eb82..e88ce0a1818 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18966,6 +18966,9 @@ module_state::do_import (cpp_reader *reader, bool 
outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  if (cpp_get_deps (reader))
+   deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
-- 
2.40.1



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Ben Boeckel via Gcc-patches
On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote:
> On 6/25/23 12:36, Ben Boeckel wrote:
> > On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
> >> On 6/22/23 22:45, Ben Boeckel wrote:
> >>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> >>>> On 1/25/23 16:06, Ben Boeckel wrote:
> >>>>> They affect the build, so report them via `-MF` mechanisms.
> >>>>
> >>>> Why isn't this covered by the existing code in preprocessed_module?
> >>>
> >>> It appears as though it is neutered in patch 3 where
> >>> `write_make_modules_deps` is used in `make_write` (or will use that name
> >>
> >> Why do you want to record the transitive modules? I would expect just 
> >> noting the
> >> ones with imports directly in the TU would suffice (i.e check the 
> >> 'outermost' arg)
> > 
> > FWIW, only GCC has "fat" modules. MSVC and Clang both require the
> > transitive closure to be passed. The idea there is to minimize the size
> > of individual module files.
> > 
> > If GCC only reads the "fat" modules, then only those should be recorded.
> > If it reads other modules, they should be recorded as well.

For clarification, given:

* a.cppm
```
export module a;
```

* b.cppm
```
export module b;
import a;
```

* use.cppm
```
import b;
```

in a "fat" module setup, `use.cppm` only needs to be told about
`b.cmi` because it contains everything that an importer needs to know
about the `a` module (reachable types, re-exported bits, whatever). With
the "thin" modules, `a.cmi` must be specified when compiling `use.cppm`
to satisfy anything that may be required transitively (e.g., a return
type of some `b` symbol is from `a`). MSVC and Clang (17+) use this
model. I don't know MSVC's rationale, but Clang's is to make CMIs
relocatable by not embedding paths to dependent modules in CMIs. This
should help caching and network transfer sizes in distributed builds.
LLVM/Clang discussion:


https://discourse.llvm.org/t/c-20-modules-should-the-bmis-contain-paths-to-their-dependent-bmis/70422
https://github.com/llvm/llvm-project/issues/62707

Maybe I'm missing how this *actually* works in GCC as I've really only
interacted with it through the command line, but I've not needed to
mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
read through some embedded information in `b.cmi` or does `b.cmi`
include enough information to not need to read it at all? If the former,
distributed builds are going to have a problem knowing what files to
send just from the command line (I'll call this "implicit thin"). If the
latter, that is the "fat" CMI that I'm thinking of.
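
To pin down the terminology, here is a small sketch that computes the
transitive closure of imports for the three-file example above. The
`ImportGraph` type and `transitive_imports` function are hypothetical and only
model module names, not how any compiler locates or reads CMI files.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each TU or module name mapped to its direct imports.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Collect every module reachable from `tu` through chains of imports.
std::set<std::string> transitive_imports(const ImportGraph &graph,
                                         const std::string &tu)
{
  std::set<std::string> seen;
  std::vector<std::string> todo{tu};
  while (!todo.empty()) {
    std::string current = todo.back();
    todo.pop_back();
    auto it = graph.find(current);
    if (it == graph.end())
      continue; // no recorded imports for this name
    for (const auto &dep : it->second)
      if (seen.insert(dep).second)
        todo.push_back(dep); // first time seen: explore its imports too
  }
  return seen;
}
```

For the example, the direct imports of `use` are just `b`, while
`transitive_imports` returns both `a` and `b`; the second set is what a
compiler that needs every CMI named on the command line would have to be
given.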

> But wouldn't the transitive modules be dependencies of the direct 
> imports, so (re)building the direct imports would first require building 
> the transitive modules anyway?  Expressing the transitive closure of 
> dependencies for each importer seems redundant when it can be easily 
> derived from the direct dependencies of each module.

I'm not concerned whether it is transitive or not, really. If a file is
read, it should be reported here regardless of the reason. Note that
caching mechanisms may skip actually *doing* the reading, but the
dependencies should still be reported from the cached results as-if the
real work had been performed.

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-19 Thread Ben Boeckel via Gcc-patches
On Wed, Jul 19, 2023 at 17:11:08 -0400, Nathan Sidwell wrote:
> GCC is neither of these descriptions.  A CMI does not contain the transitive
> closure of its imports.  It contains an import table.  That table lists the
> transitive closure of its imports (it needs that closure to do remapping),
> and that table contains the CMI pathnames of the direct imports.  Those
> pathnames are absolute, if the mapper provided an absolute path, or relative
> to the CMI repo.
> 
> The rationale here is that if you're building a CMI, Foo, which imports a
> bunch of modules, those imported CMIs will have the same (relative) location
> in this compilation and in compilations importing Foo (why would you move
> them?) Note this is NOT inhibiting relocatable builds, because of the CMI
> repo.

But it is inhibiting distributed builds because the distributing tool
would need to know:

- what CMIs are actually imported (here, "read the module mapper file"
  (in CMake's case, this is only the modules that are needed; a single
  massive mapper file for an entire project would have extra entries) or
  "act as a proxy for the socket/program specified" for other
  approaches);
- read the CMIs as it sends to the remote side to gather any other CMIs
  that may be needed (recursively);

Contrast this with the MSVC and Clang (17+) mechanism where the command
line contains everything that is needed and a single bolus can be sent.

And relocatable is probably fine. How does it interact with reproducible
builds? Or are GCC CMIs not really something anyone should consider for
installation (even as a "here, maybe this can help consumers"
mechanism)?

> On 7/18/23 20:01, Ben Boeckel wrote:
> > Maybe I'm missing how this *actually* works in GCC as I've really only
> > interacted with it through the command line, but I've not needed to
> > mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
> > read through some embedded information in `b.cmi` or does `b.cmi`
> > include enough information to not need to read it at all? If the former,
> > distributed builds are going to have a problem knowing what files to
> > send just from the command line (I'll call this "implicit thin"). If the
> > latter, that is the "fat" CMI that I'm thinking of.
> 
> please don't use pejorative terms like 'fat' and 'thin'.

Sorry, I was internally analogizing to "thinLTO".

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-21 Thread Ben Boeckel via Gcc-patches
On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:
> On 7/19/23 20:47, Ben Boeckel wrote:
> > But it is inhibiting distributed builds because the distributing tool
> > would need to know:
> > 
> > - what CMIs are actually imported (here, "read the module mapper file"
> >(in CMake's case, this is only the modules that are needed; a single
> >massive mapper file for an entire project would have extra entries) or
> >"act as a proxy for the socket/program specified" for other
> >approaches);
> 
> This information is in the machine (& human) README section of the CMI.

Ok. That leaves it up to distributing build tools to figure out at
least.

> > - read the CMIs as it sends to the remote side to gather any other CMIs
> >that may be needed (recursively);
> > 
> > Contrast this with the MSVC and Clang (17+) mechanism where the command
> > line contains everything that is needed and a single bolus can be sent.
> 
> um, the build system needs to create that command line? Where does the build 
> system get that information?  IIUC it'll need to read some file(s) to do that.

It's chained through the P1689 information in the collator as needed. No
extra files need to be read (at least with CMake's approach); certainly
not CMI files.

> > And relocatable is probably fine. How does it interact with reproducible
> > builds? Or are GCC CMIs not really something anyone should consider for
> > installation (even as a "here, maybe this can help consumers"
> > mechanism)?
> 
> Module CMIs should be considered a cacheable artifact.  They are neither 
> object 
> files nor source files.

Sure, cachable sounds fine. What about the installation?

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-23 Thread Ben Boeckel via Gcc-patches
On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:
> It occurs to me that the model I am envisioning is similar to CMake's object
> libraries.  Object libraries are a convenient name for a bunch of object
> files.  IIUC they're linked by naming the individual object files (or I think
> they could be implemented as a static lib linked with --whole-archive
> path/to/libfoo.a -no-whole-archive).  But for this conversation consider
> them a bunch of separate object files with a convenient group name.

Yes, `--whole-archive` would work great if it had any kind of
portability across CMake's platform set.

> Consider also that object libraries could themselves contain object libraries
> (I don't know if they can, but it seems like a useful concept).  Then one
> could create an object library from a collection of object files and object
> libraries (recursively).  CMake would handle the transitive graph.

I think this detail is relevant, but you can use
`$` as an `INTERFACE` sources and it would act
like that, but it is an explicit thing. Instead, `OBJECT` libraries
*only* provide their objects to targets that *directly* link them. If
not, given this:

A (OBJECT library)
B (library of some kind; links PUBLIC to A)
C (links to B)

If `A` has things like linker flags (or, more likely, libraries) as part
of its usage requirements, C will get them on its link line. However, if
OBJECT files are transitive in the same way, the linker (on most
platforms at least) chokes because it now has duplicates of all of A's
symbols: those from the B library and those from A's objects on the link
line.

> Now, allow an object library to itself have some kind of tangible, on-disk 
> representation.  *BUT* not like a static library -- it doesn't include the 
> object files.
> 
> 
> Now that immediately maps onto modules.
> 
> CMI: Object library
> Direct imports: Direct object libraries of an object library
> 
> This is why I don't understand the need to explicitly indicate the indirect
> imports of a CMI.  CMake knows them, because it knows the graph.

Sure, *CMake* knows them, but the *build tool* needs to be told
(typically `make` or `ninja`) because it is what is actually executing
the build graph. The way this is communicated is via `-MF` files and
that's what I'm providing in this patch. Note that `ninja` does not
allow rules to specify such dependencies for other rules than the one it
is reading the file for.
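
For reference, the `-MF` style of communication boils down to a Makefile rule
listing everything that was read to produce one output. A minimal sketch of
writing such a line follows; `depfile_rule` is a hypothetical helper, the file
names in the usage note are illustrative, and no escaping of spaces or special
characters is attempted.

```cpp
#include <string>
#include <vector>

// Format one Makefile-style dependency rule: "target: dep1 dep2 ...".
// Assumes none of the paths contain characters that need escaping.
std::string depfile_rule(const std::string &target,
                         const std::vector<std::string> &deps)
{
  std::string line = target + ":";
  for (const auto &dep : deps)
    line += " " + dep;
  return line + "\n";
}
```

For example, `depfile_rule("use.o", {"use.cppm", "b.cmi", "a.cmi"})` yields
`use.o: use.cppm b.cmi a.cmi`, so the build tool re-runs that one rule if any
listed file changes, whether it was imported directly or transitively.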

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-29 Thread Ben Boeckel via Gcc-patches
On Thu, Jul 27, 2023 at 18:13:48 -0700, Jason Merrill wrote:
> On 7/23/23 20:26, Ben Boeckel wrote:
> > Sure, *CMake* knows them, but the *build tool* needs to be told
> > (typically `make` or `ninja`) because it is what is actually executing
> > the build graph. The way this is communicated is via `-MF` files and
> > that's what I'm providing in this patch. Note that `ninja` does not
> > allow rules to specify such dependencies for other rules than the one it
> > is reading the file for.
> 
> But since the direct imports need to be rebuilt themselves if the 
> transitive imports change, the build graph should be the same whether or 
> not the transitive imports are repeated?  Either way, if a transitive 
> import changes you need to rebuild the direct import and then the importer.

I suppose I have seen enough bad build systems that don't do everything
correctly that I'm interested in creating "pits of success" rather than
"well, you didn't get thing X 100% correct, so you're screwed here too".

The case that I think is most likely here is that someone has a
"superbuild" with 3 projects A, B, and C where C uses B and B uses A. At
the top-level the superbuild exposes just "make projectA
projectB projectC"-granularity (rather than a combined build graph; they
may use different build systems) and then users go into some projectC
directly and forget to update projectB after updating projectA (known to
all use the same compiler/flags so that CMI sharing is possible). The
build is still broken, but ideally they get notified in some useful way
when rebuilding the TU rather than…whatever ends up catching the problem
incidentally.

> I guess it shouldn't hurt to have the transitive imports in the -MF 
> file, as long as they aren't also in the p1689 file, so I'm not 
> particularly opposed to this change, but I don't see how it makes a 
> practical difference.

Correct. The P1689 shouldn't even know about transitive imports (well,
maybe from header units?) as it just records "I saw an `import`
statement" and should never look up CMI files (indeed, we would need
another scanning step to know what CMI files to create for the P1689
scan if they were necessary…).

--Ben


Re: [PATCH v6 0/4] P1689R5 support

2023-06-16 Thread Ben Boeckel via Gcc-patches
On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote:
> This patch series causes ICEs on arm-linux-gnueabihf.  Would you
> please investigate?  Please let me know if you need any in reproducing
> these.

Finally back at it. I tried on aarch64, but wasn't able to reproduce the
errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is
there hardware I can access to try this out on the same target triple?

Alternatively, a backtrace may be able to help pinpoint it enough if you
have the cycles.

Thanks,

--Ben


Re: [PATCH v6 0/4] P1689R5 support

2023-06-16 Thread Ben Boeckel via Gcc-patches
On Fri, Jun 16, 2023 at 15:48:59 -0400, Ben Boeckel wrote:
> On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote:
> > This patch series causes ICEs on arm-linux-gnueabihf.  Would you
> > please investigate?  Please let me know if you need any in reproducing
> > these.
> 
> Finally back at it. I tried on aarch64, but wasn't able to reproduce the
> errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is
> there hardware I can access to try this out on the same target triple?

Trying inside of an i386 container also came up with nothing…I'll try
qemu.

--Ben


Re: [PATCH v6 0/4] P1689R5 support

2023-06-17 Thread Ben Boeckel via Gcc-patches
On Fri, Jun 16, 2023 at 23:55:53 -0400, Jason Merrill wrote:
> I see the same thing with patch 4 on x86_64-pc-linux-gnu, e.g.
> 
> FAIL: g++.dg/modules/ben-1_a.C -std=c++17 (test for excess errors)
> Excess errors:
> /home/jason/gt/gcc/testsuite/g++.dg/modules/ben-1_a.C:9:1: internal
> compiler error: Segmentation fault
> 0x19e2f3c crash_signal
> /home/jason/gt/gcc/toplev.cc:314
> 0x340f3f8 mkdeps::vec::size() const
> /home/jason/gt/libcpp/mkdeps.cc:57
> 0x340dc1f apply_vpath
> /home/jason/gt/libcpp/mkdeps.cc:194
> 0x340e08e deps_add_dep(mkdeps*, char const*)
> /home/jason/gt/libcpp/mkdeps.cc:318
> 0xea7b51 module_client::open_module_client(unsigned int, char const*,
> mkdeps*, void (*)(char const*), char const*)
> /home/jason/gt/gcc/cp/mapper-client.cc:291
> 0xef2ba8 make_mapper
> /home/jason/gt/gcc/cp/module.cc:14042
> 0xf0896c get_mapper(unsigned int, mkdeps*)
> /home/jason/gt/gcc/cp/module.cc:3977
> 0xf032ac name_pending_imports
> /home/jason/gt/gcc/cp/module.cc:19623
> 0xf03a7d preprocessed_module(cpp_reader*)
> /home/jason/gt/gcc/cp/module.cc:19817
> 0xe85104 module_token_cdtor(cpp_reader*, unsigned long)
> /home/jason/gt/gcc/cp/lex.cc:548
> 0xf467b2 cp_lexer_new_main
> /home/jason/gt/gcc/cp/parser.cc:756
> 0xfc1e3a c_parse_file()
> /home/jason/gt/gcc/cp/parser.cc:49725
> 0x11c5bf5 c_common_parse_file()
> /home/jason/gt/gcc/c-family/c-opts.cc:1268

Thanks. I missed a `nullptr` check before calling `deps_add_dep`. I
think I got misled by `make check` returning a zero exit code even if
there are failures.

Thanks,

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-20 Thread Ben Boeckel via Gcc-patches
On Mon, Jun 19, 2023 at 17:33:58 -0400, Jason Merrill wrote:
> On 5/12/23 10:24, Ben Boeckel wrote:
> > `file` can be omitted (the `output_stream` will be used then). I *think*
> > I see that adding:
> > 
> >  %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}
> 
> %{!fdeps-file: but yes.
> 
> > would at least do for `-fdeps-file` defaults? I don't know if there's a
> > reasonable default for `-fdeps-target=` though given that this command
> > line has no information about the object file that will be used (`-o` is
> > used for preprocessor output since we're leaning on `-E` here).
> 
> I would think it could default to %b.o?

I suppose that could work, yes.

> I had quite a few more comments on the v5 patch that you didn't respond 
> to here or address in the v6 patch; did your mail client hide them from you?

Oof. Sorry, I saw large chunks of quoting and apparently assumed the
rest was fine (I usually do aggressive trimming when doing that style of
review). I see them now. Will go through and include in v7.

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-20 Thread Ben Boeckel via Gcc-patches
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> On 1/25/23 13:06, Ben Boeckel wrote:
> > - header-unit information fields
> > 
> > Header units (including the standard library headers) are 100%
> > unsupported right now because the `-E` mechanism wants to import their
> > BMIs. A new mode (i.e., something more workable than existing `-E`
> > behavior) that mocks up header units as if they were imported purely
> > from their path and content would be required.
> 
> I notice that the cpp dependency generation tries (in open_file_failed) 
> to continue after encountering a missing file, is that not sufficient 
> for header units?  Or adjustable to be sufficient?

No. Header units can introduce macros which can be used to modify the
set of modules that are imported. Included headers are "discovered"
dependencies and don't modify the build graph (just add more files that
trigger a rebuild) and can be collected during compilation. Module
dependencies are needed to get the build correct in the first place in
order to:

- order module compilations in the build graph so that imported modules
  are ready before anything using them; and
- computing the set of flags needed for telling the compiler where
  imported modules' CMI files should be located.
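
As a rough illustration of the first point, a build system could derive the compile order from scan results along these lines. This is a sketch in Python under some assumptions: the `rules`, `primary-output`, `provides`, `requires`, and `logical-name` keys follow my reading of P1689R5, while the `compile_order` helper and the three example rules are hypothetical.

```python
from graphlib import TopologicalSorter


def compile_order(rules):
    """Order primary outputs so that providers come before importers."""
    # Map each provided module name to the output that provides it.
    providers = {}
    for rule in rules:
        for provided in rule.get("provides", []):
            providers[provided["logical-name"]] = rule["primary-output"]

    # Build "output -> outputs it must wait for" edges.
    graph = {}
    for rule in rules:
        waits_for = set()
        for required in rule.get("requires", []):
            provider = providers.get(required["logical-name"])
            if provider is not None:  # modules from elsewhere are skipped
                waits_for.add(provider)
        graph[rule["primary-output"]] = waits_for

    return list(TopologicalSorter(graph).static_order())


# Hypothetical scan results for three TUs.
RULES = [
    {"primary-output": "main.o", "requires": [{"logical-name": "foo"}]},
    {"primary-output": "foo.o",
     "provides": [{"logical-name": "foo", "is-interface": True}],
     "requires": [{"logical-name": "bar"}]},
    {"primary-output": "bar.o",
     "provides": [{"logical-name": "bar", "is-interface": True}]},
]
```

Nothing here looks at CMI files; the order comes purely from what each TU says it provides and requires.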

> > - non-utf8 paths
> > 
> > The current standard says that paths that are not unambiguously
> > represented using UTF-8 are not supported (because these cases are rare
> > and the extra complication is not worth it at this time). Future
> > versions of the format might have ways of encoding non-UTF-8 paths. For
> > now, this patch just doesn't support non-UTF-8 paths (ignoring the
> > "unambiguously represetable in UTF-8" case).
> 
> typo "representable"

Fixed.

> > diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> > index c68a2a27469..1c14ce3fe8e 100644
> > --- a/gcc/c-family/c-opts.cc
> > +++ b/gcc/c-family/c-opts.cc
> > @@ -77,6 +77,9 @@ static bool verbose;
> >   /* Dependency output file.  */
> >   static const char *deps_file;
> >   
> > +/* Enhanced dependency output file.  */
> 
> Maybe "structured", as in the docs?  It isn't really a direct 
> enhancement of the makefile dependencies.

Agreed. I'll also add a link to p1689r5 as a comment for what
"structured" means where it is parsed out.

> > +  if (cpp_opts->deps.format != DEPS_FMT_NONE)
> > +{
> > +  if (!fdeps_file)
> > +   fdeps_stream = out_stream;
> > +  else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0')
> > +   fdeps_stream = stdout;
> 
> You probably want to check that deps_stream and fdeps_stream don't end 
> up as the same stream.

Hmm. But `stdout` is probably fine to use for both though. Basically:

if (fdeps_stream == out_stream && fdeps_stream != stdout)
  make_diagnostic_noise ();

> > @@ -1374,6 +1410,8 @@ handle_deferred_opts (void)
> >   
> > if (opt->code == OPT_MT || opt->code == OPT_MQ)
> >   deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
> > +   else if (opt->code == OPT_fdep_output_)
> > + deps_add_output (deps, opt->arg, true);
> 
> How about fdeps_add_target?

Renamed.

> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index ef371ca8c26..630781fdf8a 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -256,6 +256,18 @@ MT
> >   C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target 
> > after %qs)
> >   -MT   Add a target that does not require quoting.
> >   
> > +fdep-format=
> > +C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after 
> > %qs)
> > +Format for output dependency information.  Supported (\"p1689r5\").
> 
> I think we want "structured" here, as well.

Fixed.

> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 06d77983e30..b61c3ebd3ec 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -2791,6 +2791,21 @@ is @option{-fpermitted-flt-eval-methods=c11}.  The 
> > default when in a GNU
> >   dialect (@option{-std=gnu11} or similar) is
> >   @option{-fpermitted-flt-eval-methods=ts-18661-3}.
> >   
> > +@item -fdep-file=@var{file}
> > +@opindex fdep-file
> > +Where to write structured dependency information.
> > +
> > +@item -fdep-format=@var{format}
> > +@opindex fdep-format
> > +The format to use for structured dependency information. @samp{p1689r5} is 
> > the
> > +only supported format righ

Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-06-22 Thread Ben Boeckel via Gcc-patches
On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> On 1/25/23 16:06, Ben Boeckel wrote:
> > They affect the build, so report them via `-MF` mechanisms.
> 
> Why isn't this covered by the existing code in preprocessed_module?

It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name
in v7 once I finish up testing). This logic cannot be used for p1689
output because it assumes the location and names of CMI files (`.c++m`)
that will be necessary (it is related to the `CXX_IMPORTS +=` GNU
make/libcody extensions that will, e.g., cause `ninja` to choke if it is
read from `-MF` output as it uses "fancier" Makefile syntax than tools
that are not actually `make` are going to be willing to support). This
codepath is the *actual* filename being read at compile time and is
relevant at all times; it may duplicate what `preprocessed_module` sets
up.

I'm also realizing that this is why I need to pass `-fdeps-format=p1689`
when compiling…there may need to be another, more idiomatic, way to
disable this additional syntax usage in `-MF` output.

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-06-25 Thread Ben Boeckel via Gcc-patches
On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
> On 6/22/23 22:45, Ben Boeckel wrote:
> > On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
> >> On 1/25/23 16:06, Ben Boeckel wrote:
> >>> They affect the build, so report them via `-MF` mechanisms.
> >>
> >> Why isn't this covered by the existing code in preprocessed_module?
> > 
> > It appears as though it is neutered in patch 3 where
> > `write_make_modules_deps` is used in `make_write` (or will use that name
> 
> Why do you want to record the transitive modules? I would expect just noting 
> the 
> ones with imports directly in the TU would suffice (i.e check the 'outermost' 
> arg)

FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.
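
For reference, the transitive closure that a compiler without "fat" modules would need to be handed can be sketched as below. This is illustrative Python only; the `direct` import map and the `transitive_imports` name are assumptions, not part of any compiler or of this patch.

```python
def transitive_imports(direct, root):
    """Return every module reachable from `root` through imports."""
    seen = set()
    stack = list(direct.get(root, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Follow the imports of the module we just visited.
        stack.extend(direct.get(name, []))
    return sorted(seen)


# Hypothetical import graph: main imports B, and B imports A.
DIRECT = {"main": ["B"], "B": ["A"], "A": []}
```

Here `main` names only `B` in its source, but the closure also contains `A`.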

--Ben


Re: [PATCH v5 5/5] c++modules: report module mapper files as a dependency

2023-06-25 Thread Ben Boeckel via Gcc-patches
On Fri, Jun 23, 2023 at 10:44:11 -0400, Jason Merrill wrote:
> On 1/25/23 16:06, Ben Boeckel wrote:
> > It affects the build, and if used as a static file, can reliably be
> > tracked using the `-MF` mechanism.
> 
> Hmm, this seems a bit like making all .o depend on the Makefile; it 

Technically this is true: the command line for the TU lives in said
Makefile; if I updated it, a new TU would be really nice. This is a
long-standing limitation of `make` though. FWIW, `ninja` fixes it by
tracking the command line used and CMake's Makefiles generator handles
it by storing TU flags in an included file and depending on that file
from the TU output.

> shouldn't be necessary to rebuild all TUs that use modules when we add 
> another module to the mapper file.

If I change it from:

```
mod.a   mod.a.cmi
```

to:

```
mod.a   mod.a.replace.cmi
```

I'd expect a recompile. As with anything, this depends on the
granularity of the mapper files. A global mapper file is very similar to
a global response file and given that we don't have line-change
granularity dependency tracking…

> What is your expected use case for 
> this dependency?

CMake, at least, uses a per-TU mapper file, so any build system using a
similar strategy handling the above case would only affect TUs that
actually list `mod.a`.
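
A small Python model of that difference, assuming a simplified mapper file of `name<TAB>path` lines and a build tool that rebuilds a TU whenever the content of a file in its `-MF` output changes. The helper names and the example TUs are hypothetical.

```python
def per_tu_mapper(mapping, imports):
    """Render the mapper file content for one TU (simplified format)."""
    return "".join(f"{name}\t{mapping[name]}\n" for name in sorted(imports))


def rebuilds_with_per_tu_files(old, new, tu_imports):
    """TUs whose own mapper file content differs between `old` and `new`."""
    return [
        tu
        for tu, imports in sorted(tu_imports.items())
        if per_tu_mapper(old, imports) != per_tu_mapper(new, imports)
    ]


def rebuilds_with_global_file(old, new, tu_imports):
    """With one shared mapper file, any change touches every TU."""
    return sorted(tu_imports) if old != new else []


OLD = {"mod.a": "mod.a.cmi", "mod.b": "mod.b.cmi"}
NEW = {"mod.a": "mod.a.replace.cmi", "mod.b": "mod.b.cmi"}
TU_IMPORTS = {"use_a.cc": ["mod.a"], "use_b.cc": ["mod.b"]}
```

With per-TU files only `use_a.cc` is rebuilt; with a single global file both TUs are.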

--Ben


Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-25 Thread Ben Boeckel via Gcc-patches
On Fri, Jun 23, 2023 at 14:31:17 -0400, Jason Merrill wrote:
> On 6/20/23 15:46, Ben Boeckel wrote:
> > On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> >> On 1/25/23 13:06, Ben Boeckel wrote:
> 
> >>> Header units (including the standard library headers) are 100%
> >>> unsupported right now because the `-E` mechanism wants to import their
> >>> BMIs. A new mode (i.e., something more workable than existing `-E`
> >>> behavior) that mocks up header units as if they were imported purely
> >>> from their path and content would be required.
> >>
> >> I notice that the cpp dependency generation tries (in open_file_failed)
> >> to continue after encountering a missing file, is that not sufficient 
> >> for header units?  Or adjustable to be sufficient?
> > 
> > No. Header units can introduce macros which can be used to modify the
> > set of modules that are imported. Included headers are "discovered"
> > dependencies and don't modify the build graph (just add more files that
> > trigger a rebuild) and can be collected during compilation. Module
> > dependencies are needed to get the build correct in the first place in
> > order to:
> > 
> > - order module compilations in the build graph so that imported modules
> >   are ready before anything using them; and
> > - computing the set of flags needed for telling the compiler where
> >   imported modules' CMI files should be located.
> 
> So if the header unit CMI isn't available during dependency generation, 
> would it be better to just #include the header?

It's not so simple: the preprocessor state needs to isolate out
`LOCAL_ONLY` from this case:

```
#define LOCAL_ONLY 1
import ; // The preprocessing of this should *not* see
// `LOCAL_ONLY`.
```

> > Hmm. But `stdout` is probably fine to use for both though. Basically:
> > 
> >  if (fdeps_stream == out_stream && fdeps_stream != stdout)
> >make_diagnostic_noise ();
> 
> (fdeps_stream == deps_stream, but sure, that's reasonable.

Done.

> >> So, I take it this is the common use case you have in mind, generating
> >> Make dependencies for the p1689 file?  When are you thinking the Make
> >> dependencies for the .o are generated?  At build time?
> > 
> > Yes. If an included file changes, the scanning should be performed
> > again. The compilation will have its own `-MF` as well (which should
> > point to the same files plus the CMI files it ends up reading).
> > 
> >> I'm a bit surprised you're using .json rather than an extension that
> >> indicates what the information is.
> > 
> > I can change that; the filename doesn't *really* matter (e.g., CMake
> > uses `.ddi` for "dynamic dependency information").
> 
> That works.

Done.

> >>> `-M` is about discovered dependencies: those that you find out while
> >>> doing work. `-fdep-*` is about ordering dependencies: extracting
> >>> information from file content in order to even order future work around.
> >>
> >> I'm not sure I see the distinction; Makefiles also express ordering
> >> dependencies.  In both cases, you want to find out from the files what
> >> order you will want to process them in when building the project.
> > 
> > Makefiles can express ordering dependencies, but not the `-M` snippets;
> > these are for files that, if changed, should trigger a rebuild. This is
> > fundamentally different than module dependencies which instead indicate
> > which *compiles* (or CMI generation if using a 2-phase setup) need to
> > complete before compilation (or CMI generation) of the scanned TU can be
> > performed. Generally generated headers will be ordered manually in the
> > build system description. However, maintaining that same level for
> > in-source dependency information on a per-source level is a *far* higher
> > burden.
> 
> The main difference I see is that the CMI might not exist yet.  As you 
> say, we don't want to require people to write all the dependencies by 
> hand, but that just means we need to be able to generate the 
> dependencies automatically.  In the Make-only model I'm thinking of, one 
> would collect dependencies on an initial failing build, and then start 
> over from the beginning again with the dependencies we discovered.  It's 
> the same two-phase scan and build, but one that uses the same compile 
> commands for both phases.

It's a potentially unbounded set of phases:

- 2 phases per tool that is built that gen

[PATCH v7 3/4] c++modules: report imported CMI files as dependencies

2023-07-02 Thread Ben Boeckel via Gcc-patches
They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

gcc/testsuite/

* g++.dg/modules/depreport-1_a.C: New test.
* g++.dg/modules/depreport-1_b.C: New test.
* g++.dg/modules/test-depfile.py: New tool for validating depfile
information.
* lib/modules.exp: Support for validating depfile contents.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc |   3 +
 gcc/testsuite/g++.dg/modules/depreport-1_a.C |  10 +
 gcc/testsuite/g++.dg/modules/depreport-1_b.C |  12 ++
 gcc/testsuite/g++.dg/modules/test-depfile.py | 187 +++
 gcc/testsuite/lib/modules.exp|  29 +++
 5 files changed, 241 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/test-depfile.py

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9df60d695b1..f3acc4e02fe 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18968,6 +18968,9 @@ module_state::do_import (cpp_reader *reader, bool outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  if (cpp_get_deps (reader))
+   deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_a.C b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
new file mode 100644
index 000..241701728a2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
@@ -0,0 +1,10 @@
+// { dg-additional-options -fmodules-ts }
+
+export module Foo;
+// { dg-module-cmi Foo }
+
+export class Base
+{
+public:
+  int m;
+};
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_b.C b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
new file mode 100644
index 000..b6e317c6703
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
@@ -0,0 +1,12 @@
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -MD }
+// { dg-additional-options "-MF depreport-1.d" }
+
+import Foo;
+
+void foo ()
+{
+  Base b;
+}
+
+// { dg-final { run-check-module-dep-expect-input "depreport-1.d" "gcm.cache/Foo.gcm" } }
diff --git a/gcc/testsuite/g++.dg/modules/test-depfile.py b/gcc/testsuite/g++.dg/modules/test-depfile.py
new file mode 100644
index 000..ea4edb61434
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/test-depfile.py
@@ -0,0 +1,187 @@
+import json
+
+
+# Parameters.
+ALL_ERRORS = False
+
+
+def _report_error(msg):
+    '''Report an error.'''
+    full_msg = 'ERROR: ' + msg
+    if ALL_ERRORS:
+        print(full_msg)
+    else:
+        raise RuntimeError(full_msg)
+
+
+class Token(object):
+    pass
+
+
+class Output(Token):
+    def __init__(self, path):
+        self.path = path
+
+
+class Input(Token):
+    def __init__(self, path):
+        self.path = path
+
+
+class Colon(Token):
+    pass
+
+
+class Append(Token):
+    pass
+
+
+class Variable(Token):
+    def __init__(self, name):
+        self.name = name
+
+
+class Word(Token):
+    def __init__(self, name):
+        self.name = name
+
+
+def validate_depfile(depfile, expect_input=None):
+    '''Validate a depfile contains some information
+
+    Returns `False` if the information is not found.
+    '''
+    with open(depfile, 'r') as fin:
+        depfile_content = fin.read()
+
+    real_lines = []
+    join_line = False
+    for line in depfile_content.split('\n'):
+        # Join the line if needed.
+        if join_line:
+            line = real_lines.pop() + line
+
+        # Detect line continuations.
+        join_line = line.endswith('\\')
+        # Strip line continuation characters.
+        if join_line:
+            line = line[:-1]
+
+        # Add to the real line set.
+        real_lines.append(line)
+
+    # Perform tokenization.
+    tokenized_lines = []
+    for line in real_lines:
+        tokenized = []
+        join_word = False
+        for word in line.split(' '):
+            if join_word:
+                word = tokenized.pop() + ' ' + word
+
+            # Detect word joins.
+            join_word = word.endswith('\\')
+            # Strip escape character.
+            if join_word:
+                word = word[:-1]
+
+            # Detect `:` at the end of a word.
+            if word.endswith(':'):
+                tokenized.append(word[:-1])
+                word = word[-1]
+
+            # Add word to the tokenized set.
+            tokenized.append(word)
+
+tokenized_line

[PATCH v7 0/4] P1689R5 support

2023-07-02 Thread Ben Boeckel via Gcc-patches
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel: compilations need to be ordered so that
`import some_module;` can be satisfied in time, which means that any TU
with `export module some_module;` must be compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang has taken an alternate approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v6 -> v7:

- rebase onto `master` (80ae426a195 (d: Fix core.volatile.volatileLoad
  discarded if result is unused, 2023-07-02))
- add test cases for patches 3 and 4 (new dependency reporting in `-MF`)
- add a Python script to test aspects of generated dependency files
- a new `join` spec function to support `-fdeps-*` defaults based on the
  `-o` flag (needed to strip the leading space that appears otherwise)
- note that JSON writing support should be factored out for use by
  `libcpp` and `gcc` (libiberty?)
- use `.ddi` for the extension of `-fdeps-*` output files by default
- support defaults for `-fdeps-file=` and `-fdeps-target=` when only
  `-fdeps-format=` is provided (with tests)
- error if `-MF` and `-fdeps-file=` are both the same (non-`stdout`)
  file as their formats are incompatible
- expand the documentation on how the `-fdeps-*` flags should be used

v5 -> v6:

- rebase onto `master` (585c660f041 (reload1: Change return type of
  predicate function from int to bool, 2023-06-06))
- fix crash related to reporting imported CMI files as dependencies
- rework utf-8 validity to patch the new `cpp_valid_utf8_p` function
  instead of the core utf-8 decoding routine to reject invalid
  codepoints (preserves higher-level error detection of invalid utf-8)
- harmonize `fdeps` spelling in flags, variables, comments, etc.
- rename `-fdeps-output=` to `-fdeps-target=`

v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (4):
  spec: add a spec function to join arguments
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   5 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  24 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|  19 +-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modu

[PATCH v7 1/4] spec: add a spec function to join arguments

2023-07-02 Thread Ben Boeckel via Gcc-patches
When passing `-o` flags to other options, the typical `-o foo` spelling
leaves a leading whitespace when replacing elsewhere. This ends up
creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't
parse properly. When attempting to make a spec function to just remove
the leading whitespace, the argument splitting ends up masking the
whitespace. However, the intended extension *also* ends up being its own
argument. To perform the desired behavior, the arguments need to be
concatenated together.
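
As a loose model of the behavior described above (Python, not the driver's spec machinery; the `join_spec` name and the example values are only for illustration), the function receives the already-split pieces and pastes them back together with nothing in between:

```python
def join_spec(args):
    """Concatenate all arguments with no separator."""
    return "".join(args)


# The shape the message describes: the `-o` value pasted in with its
# separating space still attached.
BROKEN = "-some-option-with-arg=" + " foo" + ".ext"

# What joining the split pieces gives instead: the `-o` value and the
# intended extension arrive as separate arguments.
FIXED = "-some-option-with-arg=" + join_spec(["foo", ".ext"])
```

`BROKEN` is `-some-option-with-arg= foo.ext`, while `FIXED` is `-some-option-with-arg=foo.ext`.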

gcc/:

* gcc.cc (join_spec_func): Add a spec function to join all
arguments.

Signed-off-by: Ben Boeckel 
---
 gcc/gcc.cc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index fdfac0b4fe4..44433b80d61 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char **);
 static const char *debug_level_greater_than_spec_func (int, const char **);
 static const char *dwarf_version_greater_than_spec_func (int, const char **);
 static const char *find_fortran_preinclude_file (int, const char **);
+static const char *join_spec_func (int, const char **);
 static char *convert_white_space (char *);
 static char *quote_spec (char *);
 static char *quote_spec_arg (char *);
@@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] =
   { "debug-level-gt",  debug_level_greater_than_spec_func },
   { "dwarf-version-gt",dwarf_version_greater_than_spec_func },
   { "fortran-preinclude-file", find_fortran_preinclude_file},
+  { "join",join_spec_func},
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -10975,6 +10977,19 @@ find_fortran_preinclude_file (int argc, const char **argv)
   return result;
 }
 
+/* The function takes any number of arguments and joins them together.  */
+
+static const char *
+join_spec_func (int argc, const char **argv)
+{
+  char *result = NULL;
+
+  for (int i = 0; i < argc; ++i)
+result = reconcat(result, result ? result : "", argv[i], NULL);
+
+  return result;
+}
+
 /* If any character in ORIG fits QUOTE_P (_, P), reallocate the string
so as to precede every one of them with a backslash.  Return the
original string or the reallocated one.  */
-- 
2.40.1



[PATCH v7 4/4] c++modules: report module mapper files as a dependency

2023-07-02 Thread Ben Boeckel via Gcc-patches
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

gcc/testsuite/:

* g++.dg/modules/depreport-2.modmap: New test.
* g++.dg/modules/depreport-2_a.C: New test.
* g++.dg/modules/depreport-2_b.C: New test.
* g++.dg/modules/test-depfile.py: Support `:|` syntax output
when generating modules.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc   |  5 +
 gcc/cp/mapper-client.h|  1 +
 gcc/cp/module.cc  | 18 -
 .../g++.dg/modules/depreport-2.modmap |  2 ++
 gcc/testsuite/g++.dg/modules/depreport-2_a.C  | 15 ++
 gcc/testsuite/g++.dg/modules/depreport-2_b.C  | 14 +
 gcc/testsuite/g++.dg/modules/test-depfile.py  | 20 +++
 7 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2.modmap
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_b.C

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..92727195246 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,9 @@ module_client::open_module_client (location_t loc, const char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   if (deps)
+ deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f3acc4e02fe..77c9edcbc04 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec 
*partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14033,7 +14033,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14041,7 +14041,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19506,7 +19506,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19622,7 +19622,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps,
 static void
 nam

[PATCH v7 2/4] p1689r5: initial support

2023-07-02 Thread Ben Boeckel via Gcc-patches
les.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|   4 +-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  17 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi  |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  15 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi  |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi  |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi  |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi  |  14 ++
 .../g++.dg/modules/p1689-file-default.C   |  16 ++
 .../g++.dg/modules/p1689-file-default.exp.ddi |  27 +++
 .../g++.dg/modules/p1689-target-default.C |  16 ++
 .../modules/p1689-target-default.exp.ddi  |  27 +++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |   9 +-
 libcpp/init.cc|  13 +-
 libcpp/mkdeps.cc  | 153 +++-
 43 files changed, 859 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MF-share.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py
 create mode 100644 gcc/testsuite/lib/modules.exp

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index af19140e382..9d794b2f4de 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Structured dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  *

Re: [PATCH v5 3/5] p1689r5: initial support

2023-05-12 Thread Ben Boeckel via Gcc-patches
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:
> I notice that the actual flags are all -fdep-*, though some of them are 
> -fdeps-* here, and the internal variables all seem to be fdeps_*.  I 
> lean toward harmonizing on "deps", I think.

Done.

> I don't love the three separate options, but I suppose it's fine.  I'd 
> prefer "target" instead of "output".

Done.

> It should be possible to omit both -file and -target and get reasonable 
> defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options.

`file` can be omitted (the `output_stream` will be used then). I *think*
I see that adding:

%{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}

would at least do for `-fdeps-file` defaults? I don't know if there's a
reasonable default for `-fdeps-target=` though given that this command
line has no information about the object file that will be used (`-o` is
used for preprocessor output since we're leaning on `-E` here).

--Ben


Re: [PATCH v5 1/5] libcpp: reject codepoints above 0x10FFFF

2023-05-12 Thread Ben Boeckel via Gcc-patches
On Mon, Feb 13, 2023 at 10:53:17 -0500, Jason Merrill wrote:
> On 1/25/23 13:06, Ben Boeckel wrote:
> > Unicode does not support such values because they are unrepresentable in
> > UTF-16.
> > 
> > libcpp/
> > 
> > * charset.cc: Reject encodings of codepoints above 0x10FFFF.
> > UTF-16 does not support such codepoints and therefore all
> > Unicode rejects such values.
> 
> It seems that this causes a bunch of testsuite failures from tests that 
> expect this limit to be checked elsewhere with a different diagnostic, 
> so I think the easiest thing is to fold this into _cpp_valid_utf8_str 
> instead, i.e.:

Since then, `cpp_valid_utf8_p` has appeared and takes care of the
over-long encodings. The new patchset just checks for codepoints beyond
0x10FFFF and rejects them in this function (and the test suite matches
`master` results for me then).

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-05-12 Thread Ben Boeckel via Gcc-patches
On Mon, Feb 13, 2023 at 13:33:50 -0500, Jason Merrill wrote:
> Both this and the mapper dependency patch seem to cause most of the 
> modules testcases to crash; please remember to run the regression tests 
> (https://gcc.gnu.org/contribute.html#testing)

Fixed for v6. `cpp_get_deps` can return `NULL`, which `deps_add_dep`
assumes never happens; fixed by checking before calling.
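
The guard described above can be sketched as follows. This is only a
minimal, self-contained illustration: `fake_deps`, `fake_add_dep`, and
`maybe_add_dep` are hypothetical stand-ins and not libcpp's `mkdeps` API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for libcpp's dependency tracker (not `mkdeps`).
struct fake_deps
{
  std::vector<std::string> files;
};

// Hypothetical stand-in for `deps_add_dep`.  Like the function described
// above, it assumes DEPS is non-null and would crash otherwise.
static void
fake_add_dep (fake_deps *deps, const char *path)
{
  deps->files.push_back (path);
}

// Record PATH only when dependency tracking is active.  Returns true if
// the dependency was recorded, false if DEPS was null.
static bool
maybe_add_dep (fake_deps *deps, const char *path)
{
  if (!deps)
    return false;
  fake_add_dep (deps, path);
  return true;
}
```

With this shape, callers that have no dependency tracker (the null case)
skip the bookkeeping instead of dereferencing a null pointer.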

--Ben


[PATCH 0/1] RFC: P1689R5 support

2022-09-27 Thread Ben Boeckel via Gcc-patches
This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel: compilations need to be ordered to ensure that
`import some_module;` can be satisfied in time by making sure that the TU
with `export module some_module;` is compiled first.
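
To illustrate the ordering problem, here is a rough sketch of what a build
system might do with per-TU scan results. The `tu_info` structure and
`compile_order` function are hypothetical names for this example only; they
are not part of the patch or of the P1689R5 format, and imports that no TU
in the set provides are assumed to be satisfied elsewhere.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of one translation unit's scan result.
struct tu_info
{
  std::string source;                 // e.g. "m.cc"
  std::string provides;               // module name, empty if none
  std::vector<std::string> requires_; // imported module names
};

// Return sources ordered so that every provider precedes its importers
// (Kahn's algorithm).  Returns an empty vector if there is a cycle.
static std::vector<std::string>
compile_order (const std::vector<tu_info> &tus)
{
  std::map<std::string, std::size_t> provider; // module name -> TU index
  for (std::size_t i = 0; i < tus.size (); ++i)
    if (!tus[i].provides.empty ())
      provider[tus[i].provides] = i;

  std::vector<std::vector<std::size_t>> dependents (tus.size ());
  std::vector<std::size_t> pending (tus.size (), 0);
  for (std::size_t i = 0; i < tus.size (); ++i)
    for (const std::string &name : tus[i].requires_)
      {
        auto it = provider.find (name);
        if (it == provider.end ())
          continue; // assumed to be provided outside this set
        dependents[it->second].push_back (i);
        ++pending[i];
      }

  std::vector<std::size_t> ready;
  for (std::size_t i = 0; i < tus.size (); ++i)
    if (pending[i] == 0)
      ready.push_back (i);

  std::vector<std::string> order;
  while (!ready.empty ())
    {
      std::size_t i = ready.back ();
      ready.pop_back ();
      order.push_back (tus[i].source);
      for (std::size_t j : dependents[i])
        if (--pending[j] == 0)
          ready.push_back (j);
    }

  if (order.size () != tus.size ())
    order.clear (); // some TUs never became ready: dependency cycle
  return order;
}
```

TUs that become ready at the same time are independent of each other and
could be compiled in parallel; only the provider-before-importer edges
constrain the schedule.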

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

Testing is currently happening in CMake's CI using a prior revision of
this patch (the differences are basically the changelog, some style, and
`trtbd` instead of `p1689r5` as the format name).

For testing within GCC, I'll work on the following:

- scanning non-module source
- scanning module-importing source (`import X;`)
- scanning module-exporting source (`export module X;`)
- scanning module implementation unit (`module X;`)
- flag combinations?

Are there existing tools for handling JSON output for testing purposes?
Basically, something that I can add to the test suite that doesn't care
about whitespace, but checks the structure (with sensible replacements
for absolute paths where relevant)?

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags (though using my old `trtbd` spelling for the
format name).

Thanks,

--Ben

Ben Boeckel (1):
  p1689r5: initial support

 gcc/ChangeLog   |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++-
 gcc/c-family/c.opt  |  12 ++
 gcc/cp/ChangeLog|   5 +
 gcc/cp/module.cc|   3 +-
 gcc/doc/invoke.texi |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc  |   4 +-
 gcc/genmatch.cc |   2 +-
 gcc/input.cc|   4 +-
 libcpp/ChangeLog|  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc  |  14 ++-
 libcpp/mkdeps.cc| 235 ++--
 16 files changed, 368 insertions(+), 26 deletions(-)


base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f
-- 
2.37.3



[PATCH 1/1] p1689r5: initial support

2022-09-27 Thread Ben Boeckel via Gcc-patches
This patch implements support for [P1689R5][] to communicate C++20 module
dependencies to build systems so that they may build `.gcm` files in the
proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well; however, this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

Signed-off-by: Ben Boeckel 
---
 gcc/ChangeLog   |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++-
 gcc/c-family/c.opt  |  12 ++
 gcc/cp/ChangeLog|   5 +
 gcc/cp/module.cc|   3 +-
 gcc/doc/invoke.texi |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc  |   4 +-
 gcc/genmatch.cc |   2 +-
 gcc/input.cc|   4 +-
 libcpp/ChangeLog|  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc  |  14 ++-
 libcpp/mkdeps.cc| 235 ++--
 16 files changed, 368 insertions(+), 26 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6dded16c0e3..2d61de6adde 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2022-09-20  Ben Boeckel  
+
+   * doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
+   -fdep-output= flags.
+   * genmatch.cc (main): Add new preprocessor parameter used for C++
+   module tracking.
+   * input.cc (test_lexer): Add new preprocessor parameter used for C++
+   module tracking.
+
 2022-09-19  Torbjörn SVENSSON  
 
* targhooks.cc (default_zero_call_used_regs): Improve sorry
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index ba3d76dd6cb..569dcd96e8c 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2022-09-20  Ben Boeckel  
+
+   * c-opts.cc (c_common_handle_option): Add fdeps_file variable and
+   -fdeps-format=, -fdep-file=, and -fdep-output= parsing.
+   * c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.
+
 2022-09-15  Richard Biener  
 
* c-common.h (build_void_list_node): Remove.
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index babaa2fc157..617d0e93696 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Enhanced dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  */
 static const char *iprefix;
 
@@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, 
HOST_WIDE_INT value,
   deps_file = arg;
   break;
 
+case OPT_fdep_format_:
+  if (!strcmp (arg, "p1689r5"))
+   cpp_opts->deps.format = DEPS_FMT_P1689R5;
+  else
+   error ("%<-fdep-format=%> unknown format %s", arg);
+  break;
+
+case OPT_fdep_file_:
+  deps_seen = true;
+  fdeps_file = arg;
+  break;
+
+case OPT_fdep_output_:
+  deps_seen = true;
+  defer_opt (code, arg);
+  b

Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-11 Thread Ben Boeckel via Gcc-patches
On Tue, Oct 04, 2022 at 21:12:03 +0200, Harald Anlauf wrote:
> Am 04.10.22 um 17:12 schrieb Ben Boeckel:
> > This patch implements support for [P1689R5][] to communicate to a build
> > system the C++20 module dependencies to build systems so that they may
> > build `.gcm` files in the proper order.
> 
> Is there a reason that you are touching so many frontends?

Just those that needed the update for `cpp_finish`. It does align with
those that will (eventually) need this support anyways (AFAIK).

> > diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
> > index 364bd0d2a85..0b9df9c02cd 100644
> > --- a/gcc/fortran/cpp.cc
> > +++ b/gcc/fortran/cpp.cc
> > @@ -712,7 +712,7 @@ gfc_cpp_done (void)
> >   FILE *f = fopen (gfc_cpp_option.deps_filename, "w");
> >   if (f)
> > {
> > - cpp_finish (cpp_in, f);
> > + cpp_finish (cpp_in, f, NULL);
> >   fclose (f);
> > }
> >   else
> > @@ -721,7 +721,7 @@ gfc_cpp_done (void)
> >  xstrerror (errno));
> > }
> > else
> > -   cpp_finish (cpp_in, stdout);
> > +   cpp_finish (cpp_in, stdout, NULL);
> >   }
> >
> > cpp_undef_all (cpp_in);
> 
> Couldn't you simply default the third argument of cpp_finish() to NULL?

I could do that. Wasn't sure how much that would be acceptable given
that it is a "silent" change, but it would reduce the number of files
touched here.
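
As a rough sketch of the suggestion, a defaulted trailing parameter lets
existing two-argument call sites compile unchanged. `finish_sketch` below is
a hypothetical stand-in that only mimics the shape of the stream arguments;
the real `cpp_finish` also takes a `cpp_reader *` and does much more.

```cpp
#include <cstdio>

// Hypothetical stand-in for `cpp_finish`.  Writes a placeholder line to
// each stream that is non-null and returns how many streams were written.
static int
finish_sketch (std::FILE *deps_stream, std::FILE *fdeps_stream = nullptr)
{
  int written = 0;
  if (deps_stream)
    {
      std::fputs ("# make-style dependencies\n", deps_stream);
      ++written;
    }
  if (fdeps_stream)
    {
      std::fputs ("{}\n", fdeps_stream);
      ++written;
    }
  return written;
}
```

A caller such as the Fortran front end could keep calling
`finish_sketch (stdout)`, while the C-family code passes both streams.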

Thanks,

--Ben


Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-11 Thread Ben Boeckel via Gcc-patches
On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
> On 10/4/22 11:12, Ben Boeckel wrote:
> > This patch implements support for [P1689R5][] to communicate to a build
> > system the C++20 module dependencies to build systems so that they may
> > build `.gcm` files in the proper order.
> 
> Thanks!
> 
> > Support is communicated through the following three new flags:
> > 
> > - `-fdeps-format=` specifies the format for the output. Currently named
> >`p1689r5`.
> > 
> > - `-fdeps-file=` specifies the path to the file to write the format to.
> 
> Do you expect users to want to emit Makefile (-MM) and P1689 
> dependencies from the same compilation?

Yes, the build system wants to know what files affect scanning as well
(e.g., `#include ` is still possible and if it changes, this
operation should be performed again). The `-M` flags do this quite nicely
already :) .

> > - `-fdep-output=` specifies the `.o` that will be written for the TU
> >that is scanned. This is required so that the build system can
> >correlate the dependency output with the actual compilation that will
> >occur.
> 
> The dependency machinery already needs to be able to figure out the name 
> of the output file, can't we reuse that instead of specifying it on the 
> command line?

This is how it determines the output of the compilation. Because it is
piggy-backing on the `-E` flag, the `-o` flag specifies the output of
the preprocessed source (purely a side-effect right now).

> > diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> > index 2db1e9cbdfb..90787230a9e 100644
> > --- a/libcpp/include/cpplib.h
> > +++ b/libcpp/include/cpplib.h
> > @@ -298,6 +298,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
> >   /* Style of header dependencies to generate.  */
> >   enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
> >   
> > +/* Format of header dependencies to generate.  */
> > +enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };
> 
> Why not add this to cpp_deps_style?

It is orthogonal. The `-M` flags and `-fdeps-*` flags are similar, but
`-fdeps-*` dependencies are fundamentally different than `-M`
dependencies. The comment does need to be updated though.

`-M` is about discovered dependencies: those that you find out while
doing work. `-fdeps-*` is about ordering dependencies: extracting
information from file content in order to be able to order future work
at all. It stems from the loss of embarrassing parallelism when building
C++20 code that uses `import`. It's isomorphic to the Fortran module
compilation ordering problem.
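
A small sketch of why the two settings are kept separate: with two
enumerations, every combination of "discovered" and "ordering" output is
expressible. The enumerations are copied from the quoted hunk;
`deps_options_sketch` and the two helper functions are hypothetical and
only serve this illustration.

```cpp
// The two enumerations as they appear in the quoted hunk.
enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };

// Hypothetical option bundle holding both settings independently.
struct deps_options_sketch
{
  cpp_deps_style style = DEPS_NONE;
  cpp_deps_format format = DEPS_FMT_NONE;
};

// True when `-M`-style (discovered) dependencies were requested.
static bool
wants_make_output (const deps_options_sketch &o)
{
  return o.style != DEPS_NONE;
}

// True when P1689R5 (ordering) dependencies were requested.
static bool
wants_p1689_output (const deps_options_sketch &o)
{
  return o.format == DEPS_FMT_P1689R5;
}
```

Folding the format into `cpp_deps_style` would make it awkward to request
both kinds of output from one compilation, which is the use case described
earlier in this reply.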

> > @@ -387,7 +410,7 @@ make_write_vec (const mkdeps::vec &vec, 
> > FILE *fp,
> >  .PHONY targets for all the dependencies too.  */
> >   
> >   static void
> > -make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> > +make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax, int 
> > extra)
> 
> Instead of adding the "extra" parameter...

Hmm. Not sure why I had named this so poorly. Maybe this comes from my
initial stab at this functionality in 2019 (the format has been hammered
out in ISO C++'s SG15 since then).

> >   {
> > const mkdeps *d = pfile->deps;
> >   
> > @@ -398,7 +421,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > if (d->deps.size ())
> >   {
> > column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> > -  if (CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> > +  if (extra && CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> > column = make_write_name (d->cmi_name, fp, column, colmax);
> > fputs (":", fp);
> > column++;
> > @@ -412,7 +435,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > if (!CPP_OPTION (pfile, deps.modules))
> >   return;
> 
> ...how about checking CPP_OPTION for p1689r5 mode here?

I'll take a look at this.

> >   
> > -  if (d->modules.size ())
> > +  if (extra && d->modules.size ())
> >   {
> > column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> > if (d->cmi_name)
> > @@ -423,7 +446,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > fputs ("\n", fp);
> >   }
> >   
> > -  if (d->module_name)
> > +  if (extra && d->module_name)
> >   {
> > if (d->cmi_name)
> > {
> > @@ -455,7 +478,7 @@ make_write (c

Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-18 Thread Ben Boeckel via Gcc-patches
On Tue, Oct 11, 2022 at 07:42:43 -0400, Ben Boeckel wrote:
> On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
> > Can we share utf8 parsing code with decode_utf8_char in pretty-print.cc?
> 
> I can look at factoring that out. I'll have to decode its logic to see
> how much overlap there is.

There is some mismatch. First, that is in `gcc` and this is in `libcpp`.
Second, `pretty-print.cc`'s implementation:

- fails on an empty string;
- accepts extended-length (5+-byte) encodings which are invalid Unicode;
  and
- decodes codepoint-by-codepoint instead of just validating the entire
  string.
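
For comparison, a whole-string validator with the properties wanted here
might look like the sketch below. It is a standalone illustration written
for this message, not the code in either `pretty-print.cc` or libcpp, and
`valid_utf8_sketch` is a hypothetical name. It accepts the empty string,
rejects 5+-byte lead bytes, over-long encodings, surrogates, and codepoints
above 0x10FFFF, and never returns decoded codepoints to the caller.

```cpp
#include <cstddef>
#include <cstdint>

// Return true if the LEN bytes starting at S are well-formed UTF-8.
static bool
valid_utf8_sketch (const char *s, std::size_t len)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *> (s);
  std::size_t i = 0;
  while (i < len)
    {
      unsigned char b = p[i];
      std::size_t extra;  // number of continuation bytes expected
      std::uint32_t cp;   // codepoint being assembled
      std::uint32_t min;  // smallest codepoint allowed for this length
      if (b < 0x80)
        { extra = 0; cp = b; min = 0; }
      else if ((b & 0xE0) == 0xC0)
        { extra = 1; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { extra = 2; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { extra = 3; cp = b & 0x07; min = 0x10000; }
      else
        return false; // stray continuation byte or 5+-byte lead byte

      if (len - i - 1 < extra)
        return false; // sequence is truncated
      for (std::size_t k = 1; k <= extra; ++k)
        {
          unsigned char c = p[i + k];
          if ((c & 0xC0) != 0x80)
            return false; // not a continuation byte
          cp = (cp << 6) | (c & 0x3F);
        }
      if (cp < min)
        return false; // over-long encoding
      if (cp > 0x10FFFF)
        return false; // beyond the Unicode range
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false; // UTF-16 surrogate
      i += extra + 1;
    }
  return true;
}
```

Taking an explicit length rather than relying on a terminating `NUL` keeps
the empty-string case trivial: the loop body simply never runs.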

--Ben


Re: [PATCH RESEND 0/1] RFC: P1689R5 support

2022-10-18 Thread Ben Boeckel via Gcc-patches
On Thu, Oct 13, 2022 at 13:08:46 -0400, David Malcolm wrote:
> On Mon, 2022-10-10 at 16:21 -0400, Jason Merrill wrote:
> > David Malcolm would probably know best about JSON wrangling.
> 
> Unfortunately our JSON output doesn't make any guarantees about the
> ordering of keys within an object, so the precise textual output
> changes from run to run.  I've coped with that in my test cases by
> limiting myself to simple regexes of fragments of the JSON output.
> 
> Martin Liska [CCed] went much further in
> 4e275dccfc2467b3fe39012a3dd2a80bac257dd0 by adding a run-gcov-pytest
> DejaGnu directive, allowing for test cases for gcov to be written in
> Python, which can thus test much more interesting assertions about the
> generated JSON.

Ok, if Python is acceptable, I'll use its stdlib to do "fancy" things.
Part of this is because I want to assert that unnecessary fields don't
exist and that sounds…unlikely to be possible in any maintainable way
(assuming it is possible) with regexen. `jq` could help immensely, but
that is probably a bridge too far :) .

Thanks,

--Ben


Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-20 Thread Ben Boeckel via Gcc-patches
On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote:
> Oops, I was thinking this was in gcc as well.  In libcpp there's 
> _cpp_valid_utf8 (which calls one_utf8_to_cppchar).

This routine has a lot more logic (including UCN decoding) and the
`one_utf8_to_cppchar` also supports out-of-bounds codepoints above
`0x10FFFF`.

--Ben


[PATCH v2 0/1] RFC: P1689R5 support

2022-10-27 Thread Ben Boeckel via Gcc-patches
Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel: compilations need to be ordered to ensure that
`import some_module;` can be satisfied in time by making sure that the TU
with `export module some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags.

Thanks,

--Ben

---
v1 -> v2:

- removal of the `deps_write(extra)` parameter in favor of option-checking
  where needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10FFFF
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/ChangeLog |   5 +
 gcc/c-family/ChangeLog|   6 +
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/ChangeLog  |   5 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/ChangeLog   |   7 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |  11 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/ChangeLog  |  23 ++
 libcpp/charset.cc |  22 +-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 43 files changed, 823 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644

[PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-27 Thread Ben Boeckel via Gcc-patches
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

Signed-off-by: Ben Boeckel 
---
 libcpp/ChangeLog  |  6 ++
 libcpp/charset.cc | 18 ++
 libcpp/internal.h |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 4d707277531..4e2c7900ae2 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  
+
+   * include/charset.cc: Add `_cpp_valid_utf8_str` which determines
+   whether a C string is valid UTF-8 or not.
+   * include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
+
 2022-10-27  Ben Boeckel  
 
* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index e9da6674b5f..0524ab6beba 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar(&in, &len, &cp))
+   {
+ return false;
+   }
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.37.3



[PATCH v2 3/3] p1689r5: initial support

2022-10-27 Thread Ben Boeckel via Gcc-patches
This patch implements support for [P1689R5][] to communicate C++20 module
dependencies to build systems so that they may build `.gcm` files in the
proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well; however, this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

Signed-off-by: Ben Boeckel 

---
 gcc/ChangeLog |   5 +
 gcc/c-family/ChangeLog|   6 +
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/ChangeLog  |   5 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/ChangeLog   |   7 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |  11 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/ChangeLog  |  11 +
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/mkdeps.cc  | 149 +++-
 41 files changed, 789 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 cr

[PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF

2022-10-27 Thread Ben Boeckel via Gcc-patches
Unicode does not support such values because they are unrepresentable in
UTF-16.

Signed-off-by: Ben Boeckel 
---
 libcpp/ChangeLog  | 6 ++
 libcpp/charset.cc | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 18d5bcceaf0..4d707277531 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  
+
+   * include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
+   UTF-16 does not support such codepoints and therefore all Unicode
+   rejects such values.
+
 2022-10-19  Lewis Hyatt  
 
* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..e9da6674b5f 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
   if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.37.3



Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-10-28 Thread Ben Boeckel via Gcc-patches
On Fri, Oct 28, 2022 at 08:59:16 -0400, David Malcolm wrote:
> On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> > This simplifies the interface for other UTF-8 validity detections
> > when a
> > simple "yes" or "no" answer is sufficient.
> > 
> > Signed-off-by: Ben Boeckel 
> > ---
> >  libcpp/ChangeLog  |  6 ++
> >  libcpp/charset.cc | 18 ++
> >  libcpp/internal.h |  2 ++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> > index 4d707277531..4e2c7900ae2 100644
> > --- a/libcpp/ChangeLog
> > +++ b/libcpp/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2022-10-27  Ben Boeckel  
> > +
> > +   * include/charset.cc: Add `_cpp_valid_utf8_str` which
> > determines
> > +       whether a C string is valid UTF-8 or not.
> > +   * include/internal.h: Add prototype for
> > `_cpp_valid_utf8_str`.
> > +
> >  2022-10-27  Ben Boeckel  
> >  
> > * include/charset.cc: Reject encodings of codepoints above
> > 0x10FFFF.
> 
> The patch looks good to me, with the same potential caveat that you
> might need to move the ChangeLog entry from the patch "body" to the
> leading blurb, to satisfy:
>   ./contrib/gcc-changelog/git_check_commit.py

Ah, I had missed that. Now fixed locally for patches 1 and 2; will be in
v3 pending some time for further reviews.

Thanks,

--Ben


Re: [PATCH v2 3/3] p1689r5: initial support

2022-10-28 Thread Ben Boeckel via Gcc-patches
On Thu, Oct 27, 2022 at 19:16:44 -0400, Ben Boeckel wrote:
> diff --git a/gcc/testsuite/g++.dg/modules/modules.exp 
> b/gcc/testsuite/g++.dg/modules/modules.exp
> index afb323d0efd..7fe8825144f 100644
> --- a/gcc/testsuite/g++.dg/modules/modules.exp
> +++ b/gcc/testsuite/g++.dg/modules/modules.exp
> @@ -28,6 +28,7 @@
>  # { dg-module-do [link|run] [xfail] [options] } # link [and run]
>  
>  load_lib g++-dg.exp
> +load_lib modules.exp
>  
>  # If a testcase doesn't have special options, use these.
>  global DEFAULT_CXXFLAGS
> @@ -237,6 +238,13 @@ proc cleanup_module_files { files } {
>  }
>  }
>  
> +# delete the specified set of dep files
> +proc cleanup_dep_files { files } {
> +foreach file $files {
> + file_on_host delete $file
> +}
> +}
> +
>  global testdir
>  set testdir $srcdir/$subdir
>  proc srcdir {} {
> @@ -310,6 +318,7 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>   set std_list [module-init $src]
>   foreach std $std_list {
>   set mod_files {}
> + set dep_files {}
>   global module_do
>   set module_do {"compile" "P"}
>   set asm_list {}
> @@ -346,6 +355,8 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>   set mod_files [find $DEFAULT_REPO *.gcm]
>   }
>   cleanup_module_files $mod_files
> +
> + cleanup_dep_files $dep_files
>   }
>  }
>  }

These `cleanup_dep_files` hunks are leftovers from my attempts at
getting the P1689 and flags tests working; they'll be gone in v3.

--Ben


Re: [PATCH v2 3/3] p1689r5: initial support

2022-11-01 Thread Ben Boeckel via Gcc-patches
On Tue, Nov 01, 2022 at 08:57:37 -0600, Tom Tromey wrote:
> >>>>> "Ben" == Ben Boeckel via Gcc-patches  writes:
> 
> Ben> - `-fdeps-file=` specifies the path to the file to write the format to.
> 
> I don't know how this output is intended to be used, but one mistake
> made with the other dependency-tracking options was that the output file
> isn't created atomically.  As a consequence, Makefiles normally have to
> work around this to be robust.  If that's a possible issue here then it
> would be best to handle it in this patch.

I don't think there'll be any race here because it's the "output" of the
rule as far as the build graph is concerned. It's also JSON, so anything
reading it "early" will get a partial object and easily detect
"something went wrong". And for clarity, the `-o` flag used in CMake
with this is just a side effect of the `-E` mechanism used and is
completely ignored in the CMake usage of this.

--Ben
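
For reference, the workaround Tom alludes to usually amounts to writing the output under a temporary name and renaming it into place once it is complete. The following is only a sketch of that general technique, not something this patch does; the function name and the `.tmp` suffix are made up for illustration, and it assumes the temporary file and the destination live on the same filesystem.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Write CONTENTS to PATH so that a reader sees either the previous file
// or the complete new one, never a partial write.  Relies on rename()
// replacing the destination atomically, which POSIX specifies for paths
// on the same filesystem.
bool
write_file_atomically (const std::string &path, const std::string &contents)
{
  const std::string tmp = path + ".tmp";  // hypothetical naming scheme

  bool ok;
  {
    std::ofstream out (tmp, std::ios::binary | std::ios::trunc);
    out << contents;
    out.close ();
    ok = !out.fail ();
  }

  // Only publish the file once every byte has been written.
  if (ok && std::rename (tmp.c_str (), path.c_str ()) == 0)
    return true;

  std::remove (tmp.c_str ());
  return false;
}
```

A consumer that opens `path` then never observes a half-written JSON object, at the cost of one extra directory entry while the write is in flight.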


[PATCH v4 0/3] RFC: P1689R5 support

2022-12-10 Thread Ben Boeckel via Gcc-patches
Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel: compilations need to be ordered so that `import
some_module;` can be satisfied in time, which means the TU with
`export module some_module;` must be compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
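
To make the ordering requirement concrete, here is a rough sketch of what a build system might do with the scan results: treat each TU as a node, add an edge from the provider of a module to every TU that imports it, and compile in topological order. The `scan_rule` struct and its field names are assumptions made up for this illustration; they are not the P1689R5 schema and not code from this series.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// One scanned TU: the object it will produce, the modules it provides,
// and the modules it imports.  Field names are illustrative only.
struct scan_rule
{
  std::string primary_output;
  std::vector<std::string> provides;
  std::vector<std::string> imports;
};

// Return the primary outputs in an order where every provider precedes
// its importers (Kahn's algorithm).  Returns an empty vector if the
// imports form a cycle.  Imports that no rule provides are ignored.
std::vector<std::string>
compile_order (const std::vector<scan_rule> &rules)
{
  // Map each module name to the index of the rule that provides it.
  std::map<std::string, std::size_t> provider;
  for (std::size_t i = 0; i < rules.size (); ++i)
    for (const std::string &m : rules[i].provides)
      provider[m] = i;

  // dependents[p] lists the rules waiting on p; pending[i] counts the
  // providers rule i is still waiting for.
  std::vector<std::vector<std::size_t>> dependents (rules.size ());
  std::vector<std::size_t> pending (rules.size (), 0);
  for (std::size_t i = 0; i < rules.size (); ++i)
    for (const std::string &m : rules[i].imports)
      {
        auto it = provider.find (m);
        if (it == provider.end ())
          continue;
        dependents[it->second].push_back (i);
        ++pending[i];
      }

  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < rules.size (); ++i)
    if (pending[i] == 0)
      ready.push (i);

  std::vector<std::string> order;
  while (!ready.empty ())
    {
      std::size_t i = ready.front ();
      ready.pop ();
      order.push_back (rules[i].primary_output);
      for (std::size_t d : dependents[i])
        if (--pending[d] == 0)
          ready.push (d);
    }

  // Any rule left unscheduled is part of a cycle.
  if (order.size () != rules.size ())
    order.clear ();
  return order;
}
```

TUs that become ready at the same time are independent of each other and could still be compiled in parallel; only the provider-before-importer edges are serialized.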

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags.

Thanks,

--Ben

---
v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10FFFF
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |  28 ++-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 38 files changed, 773 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.

[PATCH v4 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-12-10 Thread Ben Boeckel via Gcc-patches
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

libcpp/

* charset.cc: Add `_cpp_valid_utf8_str` which determines whether
a C string is valid UTF-8 or not.
* internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 20 
 libcpp/internal.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 324b5b19136..422cb52595c 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/*  Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns
+`false` if any contained byte sequence encodes an invalid Unicode codepoint
+or is not a valid UTF-8 sequence. Returns `true` otherwise. */
+
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen (name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar (&in, &len, &cp))
+   return false;
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.38.1
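
For readers without the libcpp sources at hand, the checks that a "yes or no" UTF-8 validity test has to make can be sketched on their own. This is a stand-alone illustration, not the libcpp implementation: `valid_utf8_str` here is a hypothetical free function that does not call `one_utf8_to_cppchar`, and it assumes the stricter limit of 0x10FFFF from patch 1/3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-alone validator; it does not call into libcpp.
// Returns true if STR is a well-formed UTF-8 string.
bool
valid_utf8_str (const char *str)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *> (str);
  std::size_t left = std::strlen (str);

  while (left > 0)
    {
      unsigned char lead = p[0];
      std::size_t len;
      std::uint32_t cp;
      std::uint32_t min;  // smallest code point allowed for this length

      if (lead < 0x80)
        { len = 1; cp = lead; min = 0; }
      else if ((lead & 0xE0) == 0xC0)
        { len = 2; cp = lead & 0x1F; min = 0x80; }
      else if ((lead & 0xF0) == 0xE0)
        { len = 3; cp = lead & 0x0F; min = 0x800; }
      else if ((lead & 0xF8) == 0xF0)
        { len = 4; cp = lead & 0x07; min = 0x10000; }
      else
        return false;  // stray continuation byte or a 5-/6-byte lead

      if (len > left)
        return false;  // truncated sequence

      for (std::size_t i = 1; i < len; ++i)
        {
          if ((p[i] & 0xC0) != 0x80)
            return false;  // not a continuation byte
          cp = (cp << 6) | (p[i] & 0x3F);
        }

      if (cp < min)
        return false;  // overlong encoding
      if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;  // out of range or a surrogate

      p += len;
      left -= len;
    }
  return true;
}
```

Note that U+10FFFF itself (bytes `F4 8F BF BF`) is accepted; only values strictly above it are rejected.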



[PATCH v4 3/3] p1689r5: initial support

2022-12-10 Thread Ben Boeckel via Gcc-patches
This patch implements support for [P1689R5][] to communicate the C++20
module dependencies to build systems so that they may build `.gcm` files
in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well; however, this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
-fdep-output= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++

[PATCH v4 1/3] libcpp: reject codepoints above 0x10FFFF

2022-12-10 Thread Ben Boeckel via Gcc-patches
Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10FFFF.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..324b5b19136 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -158,6 +158,10 @@ struct _cpp_strbuf
encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
FC 80 80 80 9F 80.  Only the first is valid.
 
+   Additionally, Unicode declares that all codepoints above 0010FFFF are
+   invalid because they cannot be represented in UTF-16. As such, all 5- and
+   6-byte encodings are invalid.
+
An implementation note: the transformation from UTF-16 to UTF-8, or
vice versa, is easiest done by using UTF-32 as an intermediary.  */
 
@@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
   if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.38.1
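
The 0x10FFFF limit follows directly from how UTF-16 surrogate pairs work: 10 payload bits from each half, offset by 0x10000. As a small worked sketch (the helper names below are hypothetical and not part of libcpp):

```cpp
#include <cassert>
#include <cstdint>

// Largest code point reachable through a UTF-16 surrogate pair.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Combine a high (0xD800..0xDBFF) and a low (0xDC00..0xDFFF) surrogate
// into the code point the pair encodes.
constexpr std::uint32_t
from_surrogates (std::uint16_t hi, std::uint16_t lo)
{
  return 0x10000u + ((std::uint32_t (hi) - 0xD800u) << 10)
         + (std::uint32_t (lo) - 0xDC00u);
}

// True if CP is a Unicode scalar value: within range and not one of
// the surrogate code points reserved for UTF-16.
constexpr bool
is_unicode_scalar (std::uint32_t cp)
{
  return cp <= kMaxCodePoint && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

The largest pair, `DBFF DFFF`, yields 0x10000 + 0xFFC00 + 0x3FF = 0x10FFFF, so nothing above that value has a UTF-16 representation, and the comparison against the limit has to be strict (`>`) so that 0x10FFFF itself stays valid.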



[PATCH v8 1/4] spec: add a spec function to join arguments

2023-09-01 Thread Ben Boeckel via Gcc-patches
When passing `-o` flags to other options, the typical `-o foo` spelling
leaves a leading whitespace when replacing elsewhere. This ends up
creating flags spelled as `-some-option-with-arg= foo.ext` which doesn't
parse properly. When attempting to make a spec function to just remove
the leading whitespace, the argument splitting ends up masking the
whitespace. However, the intended extension *also* ends up being its own
argument. To perform the desired behavior, the arguments need to be
concatenated together.

gcc/:

* gcc.cc (join_spec_func): Add a spec function to join all
arguments.

Signed-off-by: Ben Boeckel 
---
 gcc/gcc.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index fdfac0b4fe4..4c4e81dee50 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -447,6 +447,7 @@ static const char *greater_than_spec_func (int, const char **);
 static const char *debug_level_greater_than_spec_func (int, const char **);
 static const char *dwarf_version_greater_than_spec_func (int, const char **);
 static const char *find_fortran_preinclude_file (int, const char **);
+static const char *join_spec_func (int, const char **);
 static char *convert_white_space (char *);
 static char *quote_spec (char *);
 static char *quote_spec_arg (char *);
@@ -1772,6 +1773,7 @@ static const struct spec_function static_spec_functions[] =
   { "debug-level-gt",  debug_level_greater_than_spec_func },
   { "dwarf-version-gt",dwarf_version_greater_than_spec_func },
   { "fortran-preinclude-file", find_fortran_preinclude_file},
+  { "join",join_spec_func},
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -10975,6 +10977,27 @@ find_fortran_preinclude_file (int argc, const char **argv)
   return result;
 }
 
+/* The function takes any number of arguments and joins them together.
+
+   This seems to be necessary to build "-fjoined=foo.b" from "-fseparate foo.a"
+   with a %{fseparate*:-fjoined=%.b$*} rule without adding undesired spaces:
+   when doing $* replacement we first replace $* with the rest of the switch
+   (in this case ""), and then add any arguments as arguments after the result,
+   resulting in "-fjoined= foo.b".  Using this function with e.g.
+   %{fseparate*:-fjoined=%:join(%.b$*)} gets multiple words as separate argv
+   elements instead of separated by spaces, and we paste them together.  */
+
+static const char *
+join_spec_func (int argc, const char **argv)
+{
+  if (argc == 1)
+return argv[0];
+  for (int i = 0; i < argc; ++i)
+obstack_grow (&obstack, argv[i], strlen (argv[i]));
+  obstack_1grow (&obstack, '\0');
+  return XOBFINISH (&obstack, const char *);
+}
+
 /* If any character in ORIG fits QUOTE_P (_, P), reallocate the string
so as to precede every one of them with a backslash.  Return the
original string or the reallocated one.  */
-- 
2.41.0



[PATCH v8 0/4] P1689R5 support

2023-09-01 Thread Ben Boeckel via Gcc-patches
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel: compilations need to be ordered so that
`import some_module;` can be satisfied in time, which means any TU with
`export module some_module;` must be compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang has taken an alternate approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v7 -> v8:

- rename `DEPS_FMT_` enum variants to `FDEPS_FMT_` to match the
  associated flag
- memory leak fix in the `join` specfunc implementation (also better
  comments), both from Jason
- formatting fix in `mkdeps.cc` for `write_make_modules_deps` assignment
- comments on new functions for P1689R5 implementation

v6 -> v7:

- rebase onto `master` (80ae426a195 (d: Fix core.volatile.volatileLoad
  discarded if result is unused, 2023-07-02))
- add test cases for patches 3 and 4 (new dependency reporting in `-MF`)
- add a Python script to test aspects of generated dependency files
- a new `join` spec function to support `-fdeps-*` defaults based on the
  `-o` flag (needed to strip the leading space that appears otherwise)
- note that JSON writing support should be factored out for use by
  `libcpp` and `gcc` (libiberty?)
- use `.ddi` for the extension of `-fdeps-*` output files by default
- support defaults for `-fdeps-file=` and `-fdeps-target=` when only
  `-fdeps-format=` is provided (with tests)
- error if `-MF` and `-fdeps-file=` are both the same (non-`stdout`)
  file as their formats are incompatible
- expand the documentation on how the `-fdeps-*` flags should be used

v5 -> v6:

- rebase onto `master` (585c660f041 (reload1: Change return type of
  predicate function from int to bool, 2023-06-06))
- fix crash related to reporting imported CMI files as dependencies
- rework utf-8 validity to patch the new `cpp_valid_utf8_p` function
  instead of the core utf-8 decoding routine to reject invalid
  codepoints (preserves higher-level error detection of invalid utf-8)
- harmonize `fdeps` spelling in flags, variables, comments, etc.
- rename `-fdeps-output=` to `-fdeps-target=`

v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (4):
  spec: add a spec function to join arguments
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   5 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  24 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|  27 ++-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflag

[PATCH v8 2/4] p1689r5: initial support

2023-09-01 Thread Ben Boeckel via Gcc-patches
les.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  44 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  27 +++
 gcc/gcc.cc|   4 +-
 gcc/json.h|   3 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 .../g++.dg/modules/depflags-fj-MF-share.C |   6 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  17 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi  |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  15 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi  |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi  |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi  |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  13 +
 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi  |  14 ++
 .../g++.dg/modules/p1689-file-default.C   |  16 ++
 .../g++.dg/modules/p1689-file-default.exp.ddi |  27 +++
 .../g++.dg/modules/p1689-target-default.C |  16 ++
 .../modules/p1689-target-default.exp.ddi  |  27 +++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |   9 +-
 libcpp/init.cc|  13 +-
 libcpp/mkdeps.cc  | 163 -
 43 files changed, 869 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MF-share.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-file-default.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-target-default.exp.ddi
 create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py
 create mode 100644 gcc/testsuite/lib/modules.exp

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 4961af63de8..1c1f8c84f88 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Structured dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  *

[PATCH v8 3/4] c++modules: report imported CMI files as dependencies

2023-09-01 Thread Ben Boeckel via Gcc-patches
They affect the build, so report them via `-MF` mechanisms.
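
As a rough illustration of what reporting a CMI through `-MF` means for a consumer, the sketch below pulls the prerequisites out of a Makefile-style depfile and checks that a CMI path is among them. The helper name `depfile_inputs` and the sample depfile text are hypothetical (not compiler output), and the parsing is deliberately simpler than `test-depfile.py`: it assumes a single rule and no escaped spaces.

```python
def depfile_inputs(text):
    """Return the prerequisites of a single-rule Makefile-style depfile."""
    # Undo backslash-newline line continuations.
    joined = text.replace("\\\n", " ")
    # Everything after the first ':' is the prerequisite list.
    _, _, prereqs = joined.partition(":")
    return prereqs.split()


# Hypothetical depfile content, shaped like the depreport-1 test expects.
sample = "depreport-1_b.o: depreport-1_b.C \\\n gcm.cache/Foo.gcm\n"
inputs = depfile_inputs(sample)
cmi_is_tracked = "gcm.cache/Foo.gcm" in inputs
```

With the CMI listed as a prerequisite, a build tool that consumes the depfile would rebuild the importer whenever `Foo.gcm` changes.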

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

gcc/testsuite/

* g++.dg/modules/depreport-1_a.C: New test.
* g++.dg/modules/depreport-1_b.C: New test.
* g++.dg/modules/test-depfile.py: New tool for validating depfile
information.
* lib/modules.exp: Support for validating depfile contents.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc |   3 +
 gcc/testsuite/g++.dg/modules/depreport-1_a.C |  10 +
 gcc/testsuite/g++.dg/modules/depreport-1_b.C |  12 ++
 gcc/testsuite/g++.dg/modules/test-depfile.py | 187 +++
 gcc/testsuite/lib/modules.exp|  29 +++
 5 files changed, 241 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-1_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/test-depfile.py

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9df60d695b1..f3acc4e02fe 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18968,6 +18968,9 @@ module_state::do_import (cpp_reader *reader, bool outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  if (cpp_get_deps (reader))
+   deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_a.C b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
new file mode 100644
index 00000000000..241701728a2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_a.C
@@ -0,0 +1,10 @@
+// { dg-additional-options -fmodules-ts }
+
+export module Foo;
+// { dg-module-cmi Foo }
+
+export class Base
+{
+public:
+  int m;
+};
diff --git a/gcc/testsuite/g++.dg/modules/depreport-1_b.C b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
new file mode 100644
index 00000000000..b6e317c6703
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depreport-1_b.C
@@ -0,0 +1,12 @@
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -MD }
+// { dg-additional-options "-MF depreport-1.d" }
+
+import Foo;
+
+void foo ()
+{
+  Base b;
+}
+
+// { dg-final { run-check-module-dep-expect-input "depreport-1.d" "gcm.cache/Foo.gcm" } }
diff --git a/gcc/testsuite/g++.dg/modules/test-depfile.py b/gcc/testsuite/g++.dg/modules/test-depfile.py
new file mode 100644
index 00000000000..ea4edb61434
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/test-depfile.py
@@ -0,0 +1,187 @@
+import json
+
+
+# Parameters.
+ALL_ERRORS = False
+
+
+def _report_error(msg):
+    '''Report an error.'''
+    full_msg = 'ERROR: ' + msg
+    if ALL_ERRORS:
+        print(full_msg)
+    else:
+        raise RuntimeError(full_msg)
+
+
+class Token(object):
+    pass
+
+
+class Output(Token):
+    def __init__(self, path):
+        self.path = path
+
+
+class Input(Token):
+    def __init__(self, path):
+        self.path = path
+
+
+class Colon(Token):
+    pass
+
+
+class Append(Token):
+    pass
+
+
+class Variable(Token):
+    def __init__(self, name):
+        self.name = name
+
+
+class Word(Token):
+    def __init__(self, name):
+        self.name = name
+
+
+def validate_depfile(depfile, expect_input=None):
+    '''Validate a depfile contains some information
+
+    Returns `False` if the information is not found.
+    '''
+    with open(depfile, 'r') as fin:
+        depfile_content = fin.read()
+
+    real_lines = []
+    join_line = False
+    for line in depfile_content.split('\n'):
+        # Join the line if needed.
+        if join_line:
+            line = real_lines.pop() + line
+
+        # Detect line continuations.
+        join_line = line.endswith('\\')
+        # Strip line continuation characters.
+        if join_line:
+            line = line[:-1]
+
+        # Add to the real line set.
+        real_lines.append(line)
+
+    # Perform tokenization.
+    tokenized_lines = []
+    for line in real_lines:
+        tokenized = []
+        join_word = False
+        for word in line.split(' '):
+            if join_word:
+                word = tokenized.pop() + ' ' + word
+
+            # Detect word joins.
+            join_word = word.endswith('\\')
+            # Strip escape character.
+            if join_word:
+                word = word[:-1]
+
+            # Detect `:` at the end of a word.
+            if word.endswith(':'):
+                tokenized.append(word[:-1])
+                word = word[-1]
+
+            # Add word to the tokenized set.
+            tokenized.append(word)
+
+tokenized_line

[PATCH v8 4/4] c++modules: report module mapper files as a dependency

2023-09-01 Thread Ben Boeckel via Gcc-patches
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/:

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

gcc/testsuite/:

* g++.dg/modules/depreport-2.modmap: New test.
* g++.dg/modules/depreport-2_a.C: New test.
* g++.dg/modules/depreport-2_b.C: New test.
* g++.dg/modules/test-depfile.py: Support `:|` syntax output
when generating modules.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc   |  5 +
 gcc/cp/mapper-client.h|  1 +
 gcc/cp/module.cc  | 18 -
 .../g++.dg/modules/depreport-2.modmap |  2 ++
 gcc/testsuite/g++.dg/modules/depreport-2_a.C  | 15 ++
 gcc/testsuite/g++.dg/modules/depreport-2_b.C  | 14 +
 gcc/testsuite/g++.dg/modules/test-depfile.py  | 20 +++
 7 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2.modmap
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depreport-2_b.C

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..92727195246 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,9 @@ module_client::open_module_client (location_t loc, const char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   if (deps)
+ deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f3acc4e02fe..77c9edcbc04 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec<tree, va_gc> *partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14033,7 +14033,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14041,7 +14041,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19506,7 +19506,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19622,7 +19622,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps,
 static void
 nam

[PATCH v3 0/3] RFC: P1689R5 support

2022-11-08 Thread Ben Boeckel via Gcc-patches
Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel: compilations need to be ordered so that `import
some_module;` can be satisfied in time, by making sure that the TU with
`export module some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
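
The ordering problem described above can be sketched as a topological sort over scan results. This is only an illustration of what a build system might do with P1689-style data, not part of the patch: the field names (`primary-output`, `provides`, `requires`, `logical-name`) follow the P1689R5 paper as I understand it, while `build_order` and the sample rules are hypothetical.

```python
from graphlib import TopologicalSorter


def build_order(rules):
    """Order primary outputs so that module providers precede importers."""
    # Map each module's logical name to the output of the TU providing it.
    provider = {}
    for rule in rules:
        for prov in rule.get("provides", []):
            provider[prov["logical-name"]] = rule["primary-output"]

    # Each output depends on the outputs providing the modules it requires.
    graph = {}
    for rule in rules:
        graph[rule["primary-output"]] = {
            provider[req["logical-name"]]
            for req in rule.get("requires", [])
            if req["logical-name"] in provider
        }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical scan results for two translation units.
rules = [
    {"primary-output": "main.o",
     "requires": [{"logical-name": "some_module"}]},
    {"primary-output": "some_module.o",
     "provides": [{"logical-name": "some_module", "is-interface": True}]},
]
order = build_order(rules)
```

Here `some_module.o` is scheduled before `main.o` regardless of the order in which the rules were scanned; modules with no known provider are simply ignored in this sketch.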

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
<header>;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

https://reviews.llvm.org/D134269

with the same flags.

Thanks,

--Ben

---
v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10FFFF
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |  28 ++-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 38 files changed, 773 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules

[PATCH v3 2/3] libcpp: add a function to determine UTF-8 validity of a C string

2022-11-08 Thread Ben Boeckel via Gcc-patches
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.
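
For readers who want to see the intended yes/no behavior outside libcpp, here is a minimal sketch that relies on Python's strict UTF-8 decoder rather than on `one_utf8_to_cppchar`. The function name `is_valid_utf8` is hypothetical; the point is only that overlong forms, surrogates, and codepoints above 0x10FFFF are all rejected.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if DATA is a well-formed UTF-8 byte sequence."""
    try:
        # Strict error handling is the default, so any ill-formed
        # sequence raises UnicodeDecodeError.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "ascii": is_valid_utf8(b"gcm.cache/Foo.gcm"),
    "max codepoint U+10FFFF": is_valid_utf8(b"\xf4\x8f\xbf\xbf"),
    "above U+10FFFF": is_valid_utf8(b"\xf4\x90\x80\x80"),
    "surrogate U+D800": is_valid_utf8(b"\xed\xa0\x80"),
    "overlong NUL": is_valid_utf8(b"\xc0\x80"),
}
```

Note that the largest codepoint, U+10FFFF, is itself valid; only values strictly above it are rejected.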

libcpp/

* charset.cc: Add `_cpp_valid_utf8_str` which determines whether
a C string is valid UTF-8 or not.
* internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 20 
 libcpp/internal.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 324b5b19136..e130bc01f48 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/*  Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns
+`false` if any contained byte sequence encodes an invalid Unicode codepoint
+or is not a valid UTF-8 sequence. Returns `true` otherwise. */
+
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen(name);
+  cppchar_t cp;
+
+  while (*in)
+    {
+      if (one_utf8_to_cppchar(&in, &len, &cp))
+        return false;
+    }
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.38.1



[PATCH v3 3/3] p1689r5: initial support

2022-11-08 Thread Ben Boeckel via Gcc-patches
This patch implements support for [P1689R5][] to communicate the C++20
module dependencies to build systems so that they may build `.gcm` files
in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.
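
A sketch of how a build system might use these flags and then correlate the result with a compilation. The flag spellings are copied verbatim from this message (other revisions of the series may spell them differently), the JSON field names follow the P1689R5 paper as I understand it, and `scan_flags`, `rule_for_object`, and the sample document are hypothetical.

```python
import json


def scan_flags(deps_file, object_file, fmt="p1689r5"):
    """Build the dependency-scanning flags as spelled in this message."""
    return [
        f"-fdeps-format={fmt}",
        f"-fdeps-file={deps_file}",
        f"-fdep-output={object_file}",
    ]


def rule_for_object(ddi_text, object_file):
    """Find the rule whose primary-output names the given object file."""
    for rule in json.loads(ddi_text).get("rules", []):
        if rule.get("primary-output") == object_file:
            return rule
    return None


flags = scan_flags("foo.ddi", "foo.o")

# Hypothetical scan output, shaped like a P1689R5 document.
sample = json.dumps({
    "version": 1,
    "revision": 0,
    "rules": [{"primary-output": "foo.o",
               "provides": [{"logical-name": "foo"}]}],
})
rule = rule_for_object(sample, "foo.o")
```

Passing the object path at scan time is what lets the lookup in `rule_for_object` match a rule to the later compile step.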

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well; however, this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
-fdep-output= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++

[PATCH v3 1/3] libcpp: reject codepoints above 0x10FFFF

2022-11-08 Thread Ben Boeckel via Gcc-patches
Unicode does not support such values because they are unrepresentable in
UTF-16.
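
The limit follows from how UTF-16 surrogate pairs work: each half carries 10 bits, and the pair is offset by 0x10000. The arithmetic and the resulting validity check can be sketched as follows; the names `max_utf16_codepoint` and `is_valid_codepoint` are hypothetical, and `UCS_LIMIT` is borrowed from libcpp purely as a label for 0x10FFFF.

```python
UCS_LIMIT = 0x10FFFF


def max_utf16_codepoint():
    """Largest codepoint expressible as a UTF-16 surrogate pair."""
    # High and low surrogates each contribute 10 bits (0x000..0x3FF),
    # and the combined value is offset by 0x10000.
    return 0x10000 + (0x3FF << 10) + 0x3FF


def is_valid_codepoint(cp):
    """True for Unicode scalar values: in range and not a surrogate."""
    return 0 <= cp <= UCS_LIMIT and not (0xD800 <= cp <= 0xDFFF)
```

The comparison against the limit is inclusive: 0x10FFFF itself is a valid codepoint, and 0x110000 is the first rejected value.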

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10FFFF.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..324b5b19136 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -158,6 +158,10 @@ struct _cpp_strbuf
encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
FC 80 80 80 9F 80.  Only the first is valid.
 
+   Additionally, Unicode declares that all codepoints above 0010FFFF are
+   invalid because they cannot be represented in UTF-16. As such, all 5- and
+   6-byte encodings are invalid.
+
An implementation note: the transformation from UTF-16 to UTF-8, or
vice versa, is easiest done by using UTF-32 as an intermediary.  */
 
@@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
   if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.38.1



[PATCH v5 0/5] P1689R5 support

2023-01-25 Thread Ben Boeckel via Gcc-patches
Hi,

This patch series adds initial support for ISO C++'s [P1689R5][], a
format for describing C++ module requirements and provisions based on
the source code. This is required because compiling C++ with modules is
not embarrassingly parallel: compilations need to be ordered so that
`import some_module;` can be satisfied in time, by making sure that any
TU with `export module some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I've also added patches to include imported module CMI files and the
module mapper file as dependencies of the compilation. I briefly looked
into adding dependencies on response files as well, but that appeared to
need some code contortions to have a `class mkdeps` available before
parsing the command line or to keep the information around until one was
made.

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
<header>;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

FWIW, Clang has taken an alternate approach with its `clang-scan-deps`
tool rather than using the compiler directly.

Thanks,

--Ben

---
v4 -> v5:

- add dependency tracking for imported modules to `-MF`
- add dependency tracking for static module mapper files given to
  `-fmodule-mapper=`

v3 -> v4:

- add missing spaces between function names and arguments

v2 -> v3:

- changelog entries moved to commit messages
- documentation updated/added in the UTF-8 routine editing

v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (5):
  libcpp: reject codepoints above 0x10FFFF
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support
  c++modules: report imported CMI files as dependencies
  c++modules: report module mapper files as a dependency

 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/mapper-client.cc   |   4 +
 gcc/cp/mapper-client.h|   1 +
 gcc/cp/module.cc  |  23 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C|   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C|   4 +
 .../g++.dg/modules/depflags-fjo-MD.C  |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C|   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp  |   1 +
 gcc/testsuite/g++.dg/modules/p1689-1.C|  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C|  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C|  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py| 222 ++
 gcc/testsuite/lib/modules.exp |  71 ++
 libcpp/charset.cc |  28 ++-
 libcpp/include/cpplib.h   |  12 +-
 libcpp/include/mkdeps.h   |  17 +-
 libcpp/init.cc|  13 +-
 libcpp/internal.h |   2 +
 libcpp/mkdeps.cc  | 149 +++-
 40 files changed, 789 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 10064

[PATCH v5 1/5] libcpp: reject codepoints above 0x10FFFF

2023-01-25 Thread Ben Boeckel via Gcc-patches
Unicode does not support such values because they are unrepresentable in
UTF-16.

libcpp/

* charset.cc: Reject encodings of codepoints above 0x10FFFF.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 3c47d4f868b..f7ae12ea5a2 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -158,6 +158,10 @@ struct _cpp_strbuf
encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
FC 80 80 80 9F 80.  Only the first is valid.
 
+   Additionally, Unicode declares that all codepoints above 0010FFFF are
+   invalid because they cannot be represented in UTF-16. As such, all 5- and
+   6-byte encodings are invalid.
+
An implementation note: the transformation from UTF-16 to UTF-8, or
vice versa, is easiest done by using UTF-32 as an intermediary.  */
 
@@ -216,7 +220,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
   if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +324,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
 return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.39.0



[PATCH v5 3/5] p1689r5: initial support

2023-01-25 Thread Ben Boeckel via Gcc-patches
This patch implements support for [P1689R5][] to communicate the C++20
module dependencies to build systems so that they may build `.gcm` files
in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well; however, this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

libcpp/

* include/cpplib.h: Add cpp_deps_format enum.
(cpp_options): Add format field.
(cpp_finish): Add dependency stream parameter.
* include/mkdeps.h (deps_add_module_target): Add new preprocessor
parameter used for C++ module tracking.
* init.cc (cpp_finish): Add new preprocessor parameter used for C++
module tracking.
* mkdeps.cc (mkdeps): Implement P1689R5 output.

gcc/

* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
-fdep-output= flags.

gcc/c-family/

* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.

gcc/cp/

* module.cc (preprocessed_module): Pass whether the module is
exported to dependency tracking.

gcc/testsuite/

* g++.dg/modules/depflags-f-MD.C: New test.
* g++.dg/modules/depflags-f.C: New test.
* g++.dg/modules/depflags-fi.C: New test.
* g++.dg/modules/depflags-fj-MD.C: New test.
* g++.dg/modules/depflags-fj.C: New test.
* g++.dg/modules/depflags-fjo-MD.C: New test.
* g++.dg/modules/depflags-fjo.C: New test.
* g++.dg/modules/depflags-fo-MD.C: New test.
* g++.dg/modules/depflags-fo.C: New test.
* g++.dg/modules/depflags-j-MD.C: New test.
* g++.dg/modules/depflags-j.C: New test.
* g++.dg/modules/depflags-jo-MD.C: New test.
* g++.dg/modules/depflags-jo.C: New test.
* g++.dg/modules/depflags-o-MD.C: New test.
* g++.dg/modules/depflags-o.C: New test.
* g++.dg/modules/p1689-1.C: New test.
* g++.dg/modules/p1689-1.exp.json: New test expectation.
* g++.dg/modules/p1689-2.C: New test.
* g++.dg/modules/p1689-2.exp.json: New test expectation.
* g++.dg/modules/p1689-3.C: New test.
* g++.dg/modules/p1689-3.exp.json: New test expectation.
* g++.dg/modules/p1689-4.C: New test.
* g++.dg/modules/p1689-4.exp.json: New test expectation.
* g++.dg/modules/p1689-5.C: New test.
* g++.dg/modules/p1689-5.exp.json: New test expectation.
* g++.dg/modules/modules.exp: Load new P1689 library routines.
* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
* lib/modules.exp: Support for validating P1689 outputs.

Signed-off-by: Ben Boeckel 
---
 gcc/c-family/c-opts.cc|  40 +++-
 gcc/c-family/c.opt|  12 +
 gcc/cp/module.cc  |   3 +-
 gcc/doc/invoke.texi   |  15 ++
 gcc/testsuite/g++

[PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-01-25 Thread Ben Boeckel via Gcc-patches
They affect the build, so report them via `-MF` mechanisms.

gcc/cp/

* module.cc (do_import): Report imported CMI files as
dependencies.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/module.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ebd30f63d81..dbd1b721616 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -18966,6 +18966,8 @@ module_state::do_import (cpp_reader *reader, bool outermost)
   dump () && dump ("CMI is %s", file);
   if (note_module_cmi_yes || inform_cmi_p)
inform (loc, "reading CMI %qs", file);
+  /* Add the CMI file to the dependency tracking. */
+  deps_add_dep (cpp_get_deps (reader), file);
   fd = open (file, O_RDONLY | O_CLOEXEC | O_BINARY);
   e = errno;
 }
-- 
2.39.0



[PATCH v5 5/5] c++modules: report module mapper files as a dependency

2023-01-25 Thread Ben Boeckel via Gcc-patches
It affects the build, and if used as a static file, can reliably be
tracked using the `-MF` mechanism.

gcc/cp/

* mapper-client.cc, mapper-client.h (open_module_client): Accept
dependency tracking and track module mapper files as
dependencies.
* module.cc (make_mapper, get_mapper): Pass the dependency
tracking class down.

Signed-off-by: Ben Boeckel 
---
 gcc/cp/mapper-client.cc |  4 
 gcc/cp/mapper-client.h  |  1 +
 gcc/cp/module.cc| 18 +-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 39e80df2d25..0ce5679d659 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
+#include "mkdeps.h"
 
 #include "../../c++tools/resolver.h"
 
@@ -132,6 +133,7 @@ spawn_mapper_program (char const **errmsg, std::string &name,
 
 module_client *
 module_client::open_module_client (location_t loc, const char *o,
+  class mkdeps *deps,
   void (*set_repo) (const char *),
   char const *full_program_name)
 {
@@ -285,6 +287,8 @@ module_client::open_module_client (location_t loc, const char *o,
  errmsg = "opening";
else
  {
+   /* Add the mapper file to the dependency tracking. */
+   deps_add_dep (deps, name.c_str ());
if (int l = r->read_tuple_file (fd, ident, false))
  {
if (l > 0)
diff --git a/gcc/cp/mapper-client.h b/gcc/cp/mapper-client.h
index b32723ce296..a3b0b8adc51 100644
--- a/gcc/cp/mapper-client.h
+++ b/gcc/cp/mapper-client.h
@@ -55,6 +55,7 @@ public:
 
 public:
   static module_client *open_module_client (location_t loc, const char *option,
+   class mkdeps *,
void (*set_repo) (const char *),
char const *);
   static void close_module_client (location_t loc, module_client *);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index dbd1b721616..37066bf072b 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -3969,12 +3969,12 @@ static GTY(()) vec *partial_specializations;
 /* Our module mapper (created lazily).  */
 module_client *mapper;
 
-static module_client *make_mapper (location_t loc);
-inline module_client *get_mapper (location_t loc)
+static module_client *make_mapper (location_t loc, class mkdeps *deps);
+inline module_client *get_mapper (location_t loc, class mkdeps *deps)
 {
   auto *res = mapper;
   if (!res)
-res = make_mapper (loc);
+res = make_mapper (loc, deps);
   return res;
 }
 
@@ -14031,7 +14031,7 @@ get_module (const char *ptr)
 /* Create a new mapper connecting to OPTION.  */
 
 module_client *
-make_mapper (location_t loc)
+make_mapper (location_t loc, class mkdeps *deps)
 {
   timevar_start (TV_MODULE_MAPPER);
   const char *option = module_mapper_name;
@@ -14039,7 +14039,7 @@ make_mapper (location_t loc)
 option = getenv ("CXX_MODULE_MAPPER");
 
   mapper = module_client::open_module_client
-(loc, option, &set_cmi_repo,
+(loc, option, deps, &set_cmi_repo,
  (save_decoded_options[0].opt_index == OPT_SPECIAL_program_name)
  && save_decoded_options[0].arg != progname
  ? save_decoded_options[0].arg : nullptr);
@@ -19503,7 +19503,7 @@ maybe_translate_include (cpp_reader *reader, line_maps *lmaps, location_t loc,
   dump.push (NULL);
 
   dump () && dump ("Checking include translation '%s'", path);
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   size_t len = strlen (path);
   path = canonicalize_header_name (NULL, loc, true, path, len);
@@ -19619,7 +19619,7 @@ module_begin_main_file (cpp_reader *reader, line_maps *lmaps,
 static void
 name_pending_imports (cpp_reader *reader)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   if (!vec_safe_length (pending_imports))
 /* Not doing anything.  */
@@ -20089,7 +20089,7 @@ init_modules (cpp_reader *reader)
 
   if (!flag_module_lazy)
 /* Get the mapper now, if we're not being lazy.  */
-get_mapper (cpp_main_loc (reader));
+get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
 
   if (!flag_preprocess_only)
 {
@@ -20299,7 +20299,7 @@ late_finish_module (cpp_reader *reader, module_processing_cookie *cookie,
 
   if (!errorcount)
 {
-  auto *mapper = get_mapper (cpp_main_loc (reader));
+  auto *mapper = get_mapper (cpp_main_loc (reader), cpp_get_deps (reader));
   mapper->ModuleCompiled (state->get_flatname ());
 }
   else if (cookie->cmi_name)
-- 
2.39.0



[PATCH v5 2/5] libcpp: add a function to determine UTF-8 validity of a C string

2023-01-25 Thread Ben Boeckel via Gcc-patches
This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.

libcpp/

* charset.cc: Add `_cpp_valid_utf8_str` which determines whether
a C string is valid UTF-8 or not.
* internal.h: Add prototype for `_cpp_valid_utf8_str`.

Signed-off-by: Ben Boeckel 
---
 libcpp/charset.cc | 20 
 libcpp/internal.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index f7ae12ea5a2..616be9d02ee 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1868,6 +1868,26 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/*  Detect whether a C-string is a valid UTF-8-encoded set of bytes. Returns
+`false` if any contained byte sequence encodes an invalid Unicode codepoint
+or is not a valid UTF-8 sequence. Returns `true` otherwise. */
+
+extern bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar* in = (const uchar*)name;
+  size_t len = strlen (name);
+  cppchar_t cp;
+
+  while (*in)
+{
+  if (one_utf8_to_cppchar (&in, &len, &cp))
+   return false;
+}
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
in the execution character set of a numeric escape; write it into the
string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index 9724676a8cd..48520901b2d 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 struct normalize_state *nst,
 cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
  unsigned char *, size_t, size_t,
-- 
2.39.0
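
For readers without the libcpp sources at hand, the following is a
standalone C++ sketch of the same kind of yes/no UTF-8 check. It is not the
libcpp implementation (the patch above delegates to `one_utf8_to_cppchar`);
`is_valid_utf8` is a hypothetical name, and the specific rejection rules
below (overlong forms, surrogates, values above U+10FFFF) are this sketch's
own assumptions about what "valid" should mean.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Returns true if `s` consists solely of well-formed UTF-8 sequences that
// encode Unicode scalar values (U+0000..U+10FFFF, excluding surrogates).
bool is_valid_utf8(std::string_view s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const auto b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) {  // single-byte (ASCII) sequence
      ++i;
      continue;
    }

    std::size_t len = 0;    // total bytes in this sequence
    std::uint32_t cp = 0;   // codepoint being decoded
    std::uint32_t min = 0;  // smallest codepoint allowed for this length
    if ((b0 & 0xE0) == 0xC0) {
      len = 2; cp = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
      len = 3; cp = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
      len = 4; cp = b0 & 0x07; min = 0x10000;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }

    if (s.size() - i < len)
      return false;  // sequence is truncated

    for (std::size_t k = 1; k < len; ++k) {
      const auto b = static_cast<unsigned char>(s[i + k]);
      if ((b & 0xC0) != 0x80)
        return false;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }

    if (cp < min)
      return false;  // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return false;  // UTF-16 surrogate range
    if (cp > 0x10FFFF)
      return false;  // beyond the Unicode codespace

    i += len;
  }
  return true;
}
```

Note that U+10FFFF itself is accepted here; only values strictly above it
are rejected.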



Re: [PATCH v5 0/5] P1689R5 support

2023-02-02 Thread Ben Boeckel via Gcc-patches
On Wed, Jan 25, 2023 at 16:06:31 -0500, Ben Boeckel wrote:
> This patch series adds initial support for ISO C++'s [P1689R5][], a
> format for describing C++ module requirements and provisions based on
> the source code. This is required because compiling C++ with modules is
> not embarrassingly parallel and needs to be ordered to ensure that
> `import some_module;` can be satisfied in time by making sure that any
> TU with `export import some_module;` is compiled first.
> 
> [P1689R5]: https://isocpp.org/files/papers/P1689R5.html
> 
> I've also added patches to include imported module CMI files and the
> module mapper file as dependencies of the compilation. I briefly looked
> into adding dependencies on response files as well, but that appeared to
> need some code contortions to have a `class mkdeps` available before
> parsing the command line or to keep the information around until one was
> made.
> 
> I'd like feedback on the approach taken here with respect to the
> user-visible flags. I'll also note that header units are not supported
> at this time because the current `-E` behavior with respect to `import
> ;` is to search for an appropriate `.gcm` file which is not
> something such a "scan" can support. A new mode will likely need to be
> created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
> where headers are looked up "normally" and processed only as much as
> scanning requires.
> 
> FWIW, Clang has taken an alternate approach with its `clang-scan-deps`
> tool rather than using the compiler directly.

Ping? It'd be nice to have this supported in at least GCC 14 (since it
missed 13).

Thanks,

--Ben


Re: [PATCH v5 0/5] P1689R5 support

2023-02-02 Thread Ben Boeckel via Gcc-patches
On Thu, Feb 02, 2023 at 21:24:12 +0100, Harald Anlauf wrote:
> Am 25.01.23 um 22:06 schrieb Ben Boeckel via Gcc-patches:
> > Hi,
> >
> > This patch series adds initial support for ISO C++'s [P1689R5][], a
> > format for describing C++ module requirements and provisions based on
> > the source code. This is required because compiling C++ with modules is
> > not embarrassingly parallel and needs to be ordered to ensure that
> > `import some_module;` can be satisfied in time by making sure that any
> > TU with `export import some_module;` is compiled first.
> >
> > [P1689R5]: https://isocpp.org/files/papers/P1689R5.html
> 
> while that paper mentions Fortran, the patch in its present version
> does not seem to implement anything related to Fortran and does not
> touch the gfortran frontend.  Or am I missing anything?  Otherwise,
> could you give an example how it would be used with Fortran?

Correct. Still trying to put the walls back together after modules
KoolAid Man'd their way into the build graph structure :) . Being able
to drop our Fortran parser (well, we'd have to drop support for Fortran
compilers that exist today…so maybe in 2075 or something) and rely on
compilers to tell us the information would be amazing though :) .

FWIW, the initial revision of the patchset did touch the gfortran
frontend, but the new parameter is now defaulted and therefore the
callsite doesn't need an update anymore. I still thought it worthwhile
to keep the Fortran side aware of what is going on in the space.

The link to Fortran comes up because the build graph problem is
isomorphic (Fortran supports exporting multiple modules from a single
TU, but it's not relevant at the graph level; it's the zero -> any case
that is hard), CMake "solved" it already, and C++ is going to have a
*lot* more "I want to consume $other_project's modules using my favorite
compiler/flags" than seems to happen in Fortran. If you're interested,
this is the paper showing how we do it:

https://mathstuf.fedorapeople.org/fortran-modules/fortran-modules.html

> Thus I'd say that it is OK from the gfortran side.

Eventually we'd like to get gfortran supporting this type of scanning,
but…as above.

Thanks,

--Ben


Re: [PATCH v5 0/5] P1689R5 support

2023-02-03 Thread Ben Boeckel via Gcc-patches
On Fri, Feb 03, 2023 at 09:10:21 +, Jonathan Wakely wrote:
> On Fri, 3 Feb 2023 at 08:58, Jonathan Wakely wrote:
> > On Fri, 3 Feb 2023, 04:09 Andrew Pinski via Gcc wrote:
> >> On Wed, Jan 25, 2023 at 1:07 PM Ben Boeckel via Fortran wrote:
> >> > This patch series adds initial support for ISO C++'s [P1689R5][], a
> >> > format for describing C++ module requirements and provisions based on
> >> > the source code. This is required because compiling C++ with modules is
> >> > not embarrassingly parallel and needs to be ordered to ensure that
> >> > `import some_module;` can be satisfied in time by making sure that any
> >> > TU with `export import some_module;` is compiled first.
> >>
> >> I like how folks are complaining that GCC outputs POSIX makefile
> >> syntax from GCC's dependency files which are supposed to be in POSIX
> >> Makefile syntax.
> >> It seems rather that the build tools people like to use no longer
> >> understand POSIX makefile syntax.
> >> Also I am not a fan of json, it is too verbose for no use. Maybe it is
> >> time to go back to standardizing a new POSIX makefile syntax rather
> >> than changing C++ here.

I'm not complaining that dependency files are in POSIX (or even
POSIX-to-be) syntax. The information requires a bit more structure than
some variable assignments and I don't expect anything trying to read
them to start trying to understand `VAR_$(DEREF)=` and the behaviors of
`:=` versus `=` assignment to get this reliably.

> > That would take a decade or more. It's too late for POSIX 202x and
> > the pace that POSIX agrees on makefile features is incredibly slow.
> 
> Also, name+=value is *not* POSIX make syntax today, that's an
> extension. That's why the tools don't always support it.
> So I don't think it's true that GCC's dependency files are in POSIX syntax.
> 
> POSIX 202x does add support for it, but it will take some time for it
> to be supported everywhere.

Additionally, while the *syntax* might be supported, encoding all of
P1689 in it would require additional work (e.g., key/value variable
assignments or something). Batch scanning would also be…interesting.
Also note that the imported modules' location cannot be known before
scanning in general, so all you get are "logical names" that you need a
collator to link up with other scan results anyways. Tools such as
`make` and `ninja` cannot know, in general, how to do this linking
between arbitrary targets (e.g., there may be a debug and release build
of the same module in the graph and knowing which to use requires
higher-level info about the entire build graph; modules may also be
considered "private" and not accessible everywhere and therefore should
also not be hooked up across different target boundaries).

While the `CXX_MODULES +=` approach can work for simple cases (a
pseudo-implicit build), it is quite insufficient for the general case.
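
As a rough illustration of what such a collator does, here is a minimal C++
sketch that links logical names across scan results and derives a valid
compile order with Kahn's algorithm. `ScanResult` and `build_order` are
hypothetical names, and the sketch assumes a single flat namespace of
modules. A real collator would need the higher-level target information
described above (debug versus release, private modules), which is exactly
what this simplification leaves out.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical summary of one TU's scan: what it produces and what it needs.
struct ScanResult {
  std::string output;                  // object file for this TU
  std::vector<std::string> provides;   // logical module names exported
  std::vector<std::string> requires_;  // logical module names imported
};

// Returns outputs ordered so that every provider precedes its importers.
// Returns nullopt when a logical name has two providers, when a required
// module has no provider, or when the imports form a cycle.
std::optional<std::vector<std::string>>
build_order(const std::vector<ScanResult> &scans) {
  // Link each logical name to the index of the TU that provides it.
  std::map<std::string, std::size_t> provider;
  for (std::size_t i = 0; i < scans.size(); ++i)
    for (const auto &name : scans[i].provides)
      if (!provider.emplace(name, i).second)
        return std::nullopt;  // ambiguous: needs target-level knowledge

  // Build edges provider -> importer and count unmet imports per TU.
  std::vector<std::vector<std::size_t>> importers(scans.size());
  std::vector<std::size_t> pending(scans.size(), 0);
  for (std::size_t i = 0; i < scans.size(); ++i)
    for (const auto &name : scans[i].requires_) {
      const auto it = provider.find(name);
      if (it == provider.end())
        return std::nullopt;  // nothing in this graph provides the module
      importers[it->second].push_back(i);
      ++pending[i];
    }

  // Kahn's algorithm: repeatedly emit TUs whose imports are all satisfied.
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < scans.size(); ++i)
    if (pending[i] == 0)
      ready.push(i);

  std::vector<std::string> order;
  while (!ready.empty()) {
    const std::size_t i = ready.front();
    ready.pop();
    order.push_back(scans[i].output);
    for (const std::size_t j : importers[i])
      if (--pending[j] == 0)
        ready.push(j);
  }

  if (order.size() != scans.size())
    return std::nullopt;  // leftover TUs mean an import cycle
  return order;
}
```

The duplicate-provider failure is the interesting case: the scan results
alone cannot say which of two providers of the same logical name is the
right one for a given importer.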

--Ben