Replacement for diagnostics linter

2019-04-25 Thread Roland Illig
There's a linter in contrib/check-internal-format-escaping.py that
checks whether the base file for translations (such as French or German)
conform to some basic style rules. This mostly affects the diagnostics
emitted by GCC.

As the German translation I have added several checks to the linter and
added rationales to most of the checks.

Since I completely rewrote the linter, it's more convenient to not look
at the patch but only the rewritten code. Therefore I've provided both
of them.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90117.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90119.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90176.

Thoughts?
diff --git a/contrib/check-internal-format-escaping.py 
b/contrib/check-internal-format-escaping.py
index 9c625868012..e06752666b8 100755
--- a/contrib/check-internal-format-escaping.py
+++ b/contrib/check-internal-format-escaping.py
@@ -1,7 +1,8 @@
 #!/usr/bin/env python3
 #
-# Check gcc.pot file for gcc-internal-format and print all strings
-# that contain an option that is not wrapped by %<-option_name%>.
+# Check gcc.pot file for stylistic issues as described in
+# https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html,
+# especially in gcc-internal-format messages.
 #
 # This file is part of GCC.
 #
@@ -17,52 +18,249 @@
 #
 # You should have received a copy of the GNU General Public License
 # along with GCC; see the file COPYING3.  If not see
-# .  */
-#
-#
-#
+# .
 
 import argparse
 import re
+from collections import Counter
+from typing import Dict, Match
+
+import polib
+
+seen_warnings = Counter()
+
+
+def location(msg: polib.POEntry):
+if msg.occurrences:
+occ = msg.occurrences[0]
+return f'{occ[0]}:{occ[1]}'
+return ''
+
+
+def warn(msg: polib.POEntry,
+ diagnostic_id: str, diagnostic: str, include_msgid=True):
+"""
+To suppress a warning for a particular message,
+add a line "#, gcclint:ignore:{diagnostic_id}" to the message.
+"""
+
+if f'gcclint:ignore:{diagnostic_id}' in msg.flags:
+return
+
+seen_warnings[diagnostic] += 1
+
+if include_msgid:
+print(f'{location(msg)}: {diagnostic} in {repr(msg.msgid)}')
+else:
+print(f'{location(msg)}: {diagnostic}')
+
+
+def lint_gcc_internal_format(msg: polib.POEntry):
+"""
+Checks a single message that has the gcc-internal-format. These
+messages use a variety of placeholders like %qs, % and
+%q#E.
+"""
+
+msgid: str = msg.msgid
+
+def outside_quotes(m: Match[str]):
+before = msgid[:m.start(0)]
+return before.count("%<") == before.count("%>")
+
+def lint_matching_placeholders():
+"""
+Warns when literal values in placeholders are not exactly equal
+in the translation. This can happen when doing copy-and-paste
+translations of similar messages.
+
+To avoid these mismatches in the first place,
+structurally equal messages are found by
+lint_diagnostics_differing_only_in_placeholders.
+
+This check only applies when checking a finished translation
+such as de.po, not gcc.pot.
+"""
+
+if not msg.translated():
+return
+
+in_msgid = re.findall('%<[^%]+%>', msgid)
+in_msgstr = re.findall('%<[^%]+%>', msg.msgstr)
+
+if set(in_msgid) != set(in_msgstr):
+warn(msg,
+ 'placeholder-mismatch',
+ f'placeholder mismatch: msgid has {in_msgid}, '
+ f'msgstr has {in_msgstr}',
+ include_msgid=False)
+
+def lint_option_outside_quotes():
+for match in re.finditer(r'\S+', msgid):
+part = match.group()
+if not outside_quotes(match):
+continue
+
+if part.startswith('-'):
+if len(part) >= 2 and part[1].isalpha():
+if part == '-INF':
+continue
+
+warn(msg,
+ 'option-outside-quotes',
+ 'command line option outside %')
+
+if part.startswith('__builtin_'):
+warn(msg,
+ 'builtin-outside-quotes',
+ 'builtin function outside %')
+
+def lint_plain_apostrophe():
+for match in re.finditer("[^%]'", msgid):
+if outside_quotes(match):
+warn(msg, 'apostrophe', 'apostrophe without leading %')
+
+def lint_space_before_quote():
+"""
+A space before %< is often the result of string literals that
+are joined by the C compiler and neither literal has a space
+to separate the words.
+"""
+
+for match in re.finditer("(.?[a-zA-Z0-9])%<", msgid):
+if match.group(1) != '%s':
+warn(msg,
+ 'no-space-before-quote',
+ '%< directly f

[PATCH] Add myself to MAINTAINERS

2019-05-04 Thread Roland Illig
Already committed, as per https://www.gnu.org/software/gcc/svnwrite.html.

2019-05-04  Roland Illig  

* MAINTAINERS (Write After Approval): Add myself.
Index: MAINTAINERS
===
--- MAINTAINERS (Revision 270868)
+++ MAINTAINERS (Revision 270869)
@@ -427,6 +427,7 @@ Andrew John Hughes  

 Dominique d'Humieres   
 Andy Hutchinson
 Naveen H.S     
+Roland Illig   
 Meador Inge
 Bernardo Innocenti 
 Alexander Ivchenko 


[PATCH] improve linter for diagnostics

2019-05-14 Thread Roland Illig
The linter in contrib/ checks all messages that the GCC user may see
whether they conform roughly to the GCC Guidelines for Diagnostics.

* contrib/check-internal-format-escaping.py: improve output of
the linter by providing short rationales for the warnings;
detect space before punctuation; temporarily disable the most
frequent diagnostics, in order to first focus on the ones that
can be fixed more quickly
Index: contrib/check-internal-format-escaping.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
--- contrib/check-internal-format-escaping.py   (revision 
c7bf741f801cfbdf56bf1ee11a74ca0abe1e8c16)
+++ contrib/check-internal-format-escaping.py   (date 1557864788799)
@@ -41,18 +41,19 @@
  diagnostic_id: str, diagnostic: str, include_msgid=True):
 """
 To suppress a warning for a particular message,
-add a line "#, gcclint:ignore:{diagnostic_id}" to the message.
+add a comment "# gcclint:ignore:{diagnostic_id}" to the message.
 """
 
-if f'gcclint:ignore:{diagnostic_id}' in msg.flags:
+marker = f'gcclint:ignore:{diagnostic_id}'
+if marker in msg.comment:
 return
 
-seen_warnings[diagnostic] += 1
+seen_warnings[diagnostic_id] += 1
 
 if include_msgid:
-print(f'{location(msg)}: {diagnostic} in {repr(msg.msgid)}')
+print(f'{location(msg)}: {diagnostic} in {msg.msgid!r} [{marker}]')
 else:
-print(f'{location(msg)}: {diagnostic}')
+print(f'{location(msg)}: {diagnostic} [{marker}]')
 
 
 def lint_gcc_internal_format(msg: polib.POEntry):
@@ -68,6 +69,9 @@
 before = msgid[:m.start(0)]
 return before.count("%<") == before.count("%>")
 
+def find_outside_quotes(regex: str, s: str):
+return filter(outside_quotes, re.finditer(regex, s))
+
 def lint_matching_placeholders():
 """
 Warns when literal values in placeholders are not exactly equal
@@ -89,36 +93,31 @@
 in_msgstr = re.findall('%<[^%]+%>', msg.msgstr)
 
 if set(in_msgid) != set(in_msgstr):
-warn(msg,
- 'placeholder-mismatch',
+warn(msg, 'placeholder-mismatch',
  f'placeholder mismatch: msgid has {in_msgid}, '
  f'msgstr has {in_msgstr}',
  include_msgid=False)
 
 def lint_option_outside_quotes():
-for match in re.finditer(r'\S+', msgid):
+for match in find_outside_quotes(r'\S+', msgid):
 part = match.group()
-if not outside_quotes(match):
-continue
 
-if part.startswith('-'):
-if len(part) >= 2 and part[1].isalpha():
-if part == '-INF':
-continue
-
-warn(msg,
- 'option-outside-quotes',
- 'command line option outside %')
+if len(part) >= 2 and part[0] == '-' \
+and part[1].isalpha() and part != '-INF':
+warn(msg, 'option-outside-quotes',
+ 'command line options '
+ 'should be enclosed in %')
 
 if part.startswith('__builtin_'):
-warn(msg,
- 'builtin-outside-quotes',
- 'builtin function outside %')
+warn(msg, 'builtin-outside-quotes',
+ 'names of builtin functions '
+ 'should be enclosed in %')
 
 def lint_plain_apostrophe():
-for match in re.finditer("[^%]'", msgid):
-if outside_quotes(match):
-warn(msg, 'apostrophe', 'apostrophe without leading %')
+if any(find_outside_quotes("[^%]'", msgid)):
+warn(msg, 'apostrophe',
+ 'apostrophes should be written as %\' '
+ 'to make them typographically correct')
 
 def lint_space_before_quote():
 """
@@ -129,26 +128,45 @@
 
 for match in re.finditer("(.?[a-zA-Z0-9])%<", msgid):
 if match.group(1) != '%s':
-warn(msg,
- 'no-space-before-quote',
- '%< directly following a letter or digit')
+warn(msg, 'space-before-quote',
+ 'a letter or digit should not be '
+ 'directly followed by a %<')
+
+def lint_space_before_punctuation():
+"""
+In written English, there is no space before a colon.
+In text enclosed in % it's ok though.
+"""
+
+for match in find_outside_quotes(" [.,:;?]", msgid):
+warn(msg, 'space-before-punctuation',
+ f'there should be no space before punctuation '
+ f'{match.group()!r}')
 
 def lint_underscore_outside_quotes():
 """
 An underscore outside of quotes is used in several c

patches to detect GCC diagnostics

2019-05-16 Thread Roland Illig
Hi Martin,

I'm impressed how much work you have put into the patches for detecting
nonoptimal diagnostics. It takes a long time to read through the
patches, but it's worth it, knowing that it took much longer for you to
prepare the patch, and that I won't have to submit i18n bug reports in
the foreseeable future. :)



+  /* Diagnose "arg" (short for "argument" when lazy).  */
+  if (!strncmp (format_chars, "arg", 3)
+ && (!format_chars[3]
+ || format_chars[3] == 's'
+ || ISSPACE (format_chars[3])))

Wouldn't it be sufficient to just check for !ISALNUM(format_chars[3])?
This would also catch "specify args, return type and so on".

I didn't like the magic "n == 3", but after experimenting a bit, I came
to the conclusion that the code you wrote is the best possible.

typo: ponters should be pointers

typo: drective should be directive

Since your code contains the expression strncmp(str, sub, sizeof sub -
1) occurs quite often, I was thinking whether it would be useful to
declare str_startswith, which expresses the actual intent more directly.

> nchars > 1

Better use ngettext in these 7 cases, to account for multiple plural
forms in Arabic, Polish and Russian. :)

> +  /* Diagnose a backtick (grave accent).  */

This diagnostic should explain how to fix this one since it might be
non-obvious.

typo: /* Likewise for gimple.  */ -- should be cgraph_node

typo: be  cdiagnosed -- spurious whitespace? ;)

possible typo: arn't

there is a FIXME after "you can%'t do that"

"ignoring %-specifier for non-static local " might be wrong, as
the word "asm-specifier" might come from the C or C++ grammar. Should
this be "% specifier", with a space?

Oh no. "%qE is not an % qualifier" might destroy my hopes of
merging diagnostics with the same pattern. If some of them need to be
prefixed with "a" and some others with "an", they cannot be merged. Or I
need to make an exception when the "before" string ends in "a" or "an".
Luckily, for "the" and "the" only the pronunciation differs but not the
spelling.

-   "%qE attribute argument %E is not in the range [0, %E)",
-   name, val, align);
+   "%qE attribute argument %E is not in the range %s, %E%c",
+   name, val, "[0", align, ')');

I don't like this one as it is asymmetrical. Either both characters
should be passed as %c, or none of them. I prefer passing none of them
to make the string easier to read for translators.

+   "unsuffixed floating constant");

I'd rather write "unsuffixed floating point constant". (appears multiple
times)

-  warning (OPT_Wpragmas, "#pragma redefine_extname ignored due to "
-  "conflict with __asm__ declaration");
+  warning (OPT_Wpragmas, "%<#pragma redefine_extname%> ignored "
+  "due to conflict with % declaration");

Are you sure that you want to remove the underscores? Just asking, I
haven't checked the surrounding code.

- error ("#pragma GCC target string... is badly formed");
+ error ("%<#pragma GCC target%> string is badly formed");
- error ("#pragma GCC optimize string... is badly formed");
+ error ("%<#pragma GCC optimize%> string is badly formed");

I think the "string..." was supposed to go inside the quotes.

+  warning (0, "%s:tag %qx is invalid", filename, tag);

I think there should be a space after the colon, but that should be in
another commit. This one is already big enough.

-inform (cloc, "%s%#qD ", msg, fn);
+inform (cloc, "%s%#qD %s", msg, fn, "");

This change and the similar ones around this will prevent the "" string from being translated at all. These strings should stay
in the format string, there needs to be a different solution for them.

-  error_at (loc, "typeid-expression is not a constant expression "
+  error_at (loc, "% is not a constant expression "

This sounds like a term from the C++ grammar.

+  inform (loc, "in C++11 destructors default to %");

grammar: "defaulting to"? A few lines below there is "because
destructors default", where this word is correct, so it may be a
copy-and-paste mistake.

+  inform (input_location, "  enters % statement");

According to https://en.cppreference.com/w/cpp/language/if#Constexpr_If,
the term "constexpr if statement" is a single word.

   error ("explicitly defaulted function %q+D cannot be declared "
 "as % because the implicit declaration is not "
-"%:", fn);
+"%qs:", fn, "constexpr");

I don't understand why you extracted one of the % but left
the other one in the message.

+  "an %-specification is not allowed "

See somewhere above.

+  error ("requested % %i is out of range [0, %i]",
+ pri, MAX_INIT_PRIORITY);

The other places use %u for MAX_INT_PRIORITY since its value is 65535. I
don't know whether GCC would work at all on platforms where %d is 16
bit, as it would be hard to address hundreds of megabytes of heap in
such an environment.

+  (0, "requested % %i is reserved for internal use",
+  pri);

Same.

+for real_option in -Wstr

Re: patches to detect GCC diagnostics

2019-05-17 Thread Roland Illig


Am 16.05.2019 um 22:24 schrieb Martin Sebor:
> On 5/16/19 8:58 AM, Roland Illig wrote:
>> -  error ("#pragma GCC target string... is badly formed");
>> +  error ("%<#pragma GCC target%> string is badly formed");
>> -  error ("#pragma GCC optimize string... is badly formed");
>> +  error ("%<#pragma GCC optimize%> string is badly formed");
>>
>> I think the "string..." was supposed to go inside the quotes.
>
> The "string" refers to the pragma argument and quoting it would
> imply that the word string should be verbatim.  So I think it's
> correct as is, despite the call above:
>
>   GCC_BAD ("%<#pragma GCC optimize (string [,string]...)%> does "
>    "not have a final %<)%>");
>
> where the string is part of the grammar being shown.

I disagree. All these messages are structurally the same. They refer to
"#pragma GCC target" as a fixed string and "string..." as being a
placeholder for arguments.

It doesn't make sense to say "the string '#pragma GCC target' is
malformed", which is what your suggested message implies. Either the
word "string" should be left out completely, or it should be marked as a
placeholder. You mentioned elsewhere that the diagnostics could use
italics. This might be another case for them.

Instead of the word "string", a more specific term should be used here,
such as "argument to %<#pragma GCC optimize%> is badly formed".

>> -  error_at (loc, "typeid-expression is not a constant expression "
>> +  error_at (loc, "% is not a constant expression "
>>
>> This sounds like a term from the C++ grammar.
>
> It does but there is no /typeid-expression/ in the C++ grammar.
> It's just % expression.  I took out the word "expression"
> because it seemed redundant with the "constant expression" that
> follows.
>
> That said, it could be handy to be able to uniquely distinguish
> grammar terms (do you translate C/C++ grammar terms or leave them
> in English?).  I'd like to add either a new directive or flag to
> %< to let us do that.  E.g. with something like:
>
>   "%#"
>
> GCC could render the term in italics (there's a bug where we
> discussed this that I can't find right now).

In the German translation I kept most of these grammar production names
in English, so that you can search for the exact terms.

In the French, Russian and Swedish translations, most of them are
translated into the user's native language, in order to make them more
understandable.

I'm not sure whether we really need additional directives or flags here.
I think the normal % would give a strong enough indication already.


Re: [PATCH 1/7] Add missing punctuation to message (PR driver/79875)

2017-03-11 Thread Roland Illig
Am 10.03.2017 um 05:12 schrieb Martin Sebor:
> I have just an observation/question here for future consideration.
> If this sort of diagnostic is common (I count 23 instances of it)
> or if it is expected to become common, would it make sense to add
> a directive for it to the pretty printer to ensure consistency?
> I.e., to automatically prepend "; did you mean" and append the
> "?" to the new directive?

As a translator, I'm not too concerned about this one, although it looks
very similar to the repeated "%qs requires %qs", which often appears for
command line options.

My main argument is that the "did you mean" is not just a name for a
thing, but it is part of the sentence grammar. To me that feels like too
much magic for a rarely used case (compared to %D or %qE or %qT, which
appear so often that I'm now very familiar with them).

Also, when adding a placeholder like %H for it, there would be no
surrounding whitespace, similar to %K. That confused me as a translator
when I first saw it, since it is not nicely documented; neither in the
Gettext manual
(https://www.gnu.org/software/gettext/manual/gettext.html#gcc_002dinternal_002dformat)
nor in the GCC files implementing them. There should be at least some
examples of what each placeholder typically expands to.

Roland