https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116724
--- Comment #5 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
(In reply to Hans-Peter Nilsson from comment #4)
> (In reply to David Malcolm from comment #1)
>
> > Perhaps we should try to capture both the untranslated text and the
> > translated text? SARIF has various abilities for handling translations.
To clarify, consider this hypothetical diagnostic:
error_at (location,
"missing %qs after %qs",
"decl-name", "foo");
with a hypothetical "pig-latin" locale and translation (pig-latin.po); see
https://en.wikipedia.org/wiki/Pig_Latin
where
"missing %qs after %qs"
has this translation in the .po file:
"issingmay %qs afteray %qs"
The classic text output format might read:
foo.c:42:11: erroray: issingmay `decl-name' afteray `foo'
and currently GCC's SARIF output would presumably capture the text of the
message with:
message: {"text": "issingmay `decl-name' afteray `foo'"}
i.e. currently GCC's SARIF output for a formatted string "bakes in" both
localization of the format string *and* param substitution.
We could instead defer parameter substitution to the SARIF consumer via §3.11.5
"Messages with placeholders"
(https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141790716)
like this:
message: {"text": "issingmay `{0}' afteray `{1}'",
"arguments": ["decl-name", "foo"]}
and, with that, potentially capture the pre-translated message string *and* its
translation in the currently active .po file, e.g.:
message: {"id": "missing %qs after %qs",
"arguments": ["decl-name", "foo"]}
with something like this: (see 3.11.7 "Message string lookup")
"translations": [
{ # A toolComponent object.
"language": "pig-latin",
"contents": ["localizedData"],
"globalMessageStrings": [
{"missing %qs after %qs": {"text": "issingmay {0} afteray {1}"}}]}]
where we'd list the subset of format strings that got used by diagnostics in
the particular log, and their translations, with the caveats that:
- I'm not sure that that's how translations of strings are meant to be stored
(the SARIF spec's tutorial doesn't seem to cover translations yet)
- I'm using (abusing?) the string as its own "id"
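The two message shapes discussed above (translated text with placeholders, and
the untranslated format string used as an "id") can be sketched in Python as
follows. This is only an illustration, not GCC's implementation: the helper
names to_placeholders, message_with_text and message_with_id are hypothetical,
and only the %s and %qs directives are handled, whereas GCC's diagnostics
support many more format codes. Doubling literal braces follows my reading of
the "Messages with placeholders" section of the SARIF spec.

```python
import re

# Sketch only: recognizes just %s and %qs, not GCC's full set of format codes.
_DIRECTIVE = re.compile(r"%(q?)s")


def to_placeholders(fmt):
    """Rewrite each %s / %qs as a SARIF {n} placeholder; return (text, count)."""
    count = 0

    def repl(match):
        nonlocal count
        placeholder = "{%d}" % count
        count += 1
        # %qs is a quoted string, so keep the quotes around the placeholder.
        return "`" + placeholder + "'" if match.group(1) else placeholder

    # Double any literal braces so a consumer does not read them as placeholders.
    escaped = fmt.replace("{", "{{").replace("}", "}}")
    return _DIRECTIVE.sub(repl, escaped), count


def message_with_text(translated_fmt, args):
    """Message object carrying the translated text plus unsubstituted arguments."""
    text, count = to_placeholders(translated_fmt)
    if count != len(args):
        raise ValueError("format string expects %d arguments, got %d"
                         % (count, len(args)))
    return {"text": text, "arguments": [str(a) for a in args]}


def message_with_id(untranslated_fmt, args):
    """Message object that uses the untranslated format string as its own id."""
    return {"id": untranslated_fmt, "arguments": [str(a) for a in args]}


by_text = message_with_text("issingmay %qs afteray %qs", ["decl-name", "foo"])
by_id = message_with_id("missing %qs after %qs", ["decl-name", "foo"])
```

Here by_text corresponds to the "text"/"arguments" form and by_id to the
"id"/"arguments" form; in both cases substitution is left to the consumer.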
If gettext supported it, we could even try to capture translations from *all* .po
files. But if needed that's probably much easier to handle via a
post-processing script.
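A minimal sketch of what such a post-processing step might look like, in
Python. It assumes the .po files have already been parsed into plain
dictionaries mapping each untranslated format string to its translation (the
parsing itself is not shown), and that the set of format strings used in the
log is known. The names to_placeholders and translation_components are
hypothetical, only %s and %qs are handled, and "globalMessageStrings" is
emitted as an object keyed by message id, which is how I read the SARIF spec's
description of toolComponent; the other fields simply mirror the example above.

```python
import re


def to_placeholders(fmt):
    """Rewrite %s / %qs directives as SARIF {n} placeholders (sketch only)."""
    index = -1

    def repl(match):
        nonlocal index
        index += 1
        # Keep quotes around placeholders that came from the quoting %qs form.
        return ("`{%d}'" if match.group(1) else "{%d}") % index

    escaped = fmt.replace("{", "{{").replace("}", "}}")
    return re.sub(r"%(q?)s", repl, escaped)


def translation_components(used_msgids, catalogs):
    """Build one translation component per language that has any translations.

    used_msgids: format strings that appear in the log.
    catalogs: {language: {msgid: msgstr}}, assumed to be pre-parsed .po data.
    """
    components = []
    for language in sorted(catalogs):
        catalog = catalogs[language]
        strings = {
            msgid: {"text": to_placeholders(catalog[msgid])}
            for msgid in sorted(used_msgids)
            if catalog.get(msgid)  # skip missing or empty translations
        }
        if strings:
            components.append({
                "language": language,
                "contents": ["localizedData"],
                "globalMessageStrings": strings,
            })
    return components


used = {"missing %qs after %qs"}
catalogs = {
    "pig-latin": {"missing %qs after %qs": "issingmay %qs afteray %qs"},
    "empty-locale": {"missing %qs after %qs": ""},
}
components = translation_components(used, catalogs)
```

Only the format strings actually used by the log are emitted, and a locale
with no usable translations produces no component at all.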
> Works for me! The use-case I was thinking of, is for the SARIF output to be
> a nice containment of the non-source-code part of bug-reports: "instead of
> quoting stderr, use --diagnostics-format=sarif-file and send
> sourcename.sarif".
Sounds like an interesting idea; can you open this as a separate RFE please?
> But, to fulfill that, more is needed, including the gcc
> arguments. (Maybe that's all.)
I've added support for capturing the command-line arguments in GCC 15:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658206.html
though note that it's capturing the arguments as supplied by the driver to e.g.
cc1, as opposed to those that the user supplied to the driver.
> I don't see that included, right?
> Sorry for the "creaturization request"!
Thanks for the feedback; hope the above makes sense.