On 09/08/18 21:09, David Malcolm wrote:

It turns out that we convert tab characters to *single* space
characters when printing source code.

This behavior has been present since Manu first implemented
-fdiagnostics-show-caret in r186305 (aka
5a9830842f69ebb059061e26f8b0699cbd85121e, PR 24985), where it was this
logic (there in diagnostic.c's diagnostic_show_locus):
       char c = *line == '\t' ? ' ' : *line;
       pp_character (context->printer, c);

(that logic is now in diagnostic-show-locus.c in
layout::print_source_line)

Arguably this is a bug, but it's intimately linked to the way in which
we track "column numbers".  Our "column numbers" are currently simply a
1-based byte-count, I believe, so a tab character is treated by us as
simply an increment of 1 right now.  There are similar issues with
multibyte characters, which are being tracked in PR 49973.


Hi David,

At the time, this was done on purpose for two reasons:

1) The way we counted column numbers already counted tabs as 1-space and ...
2) It leads to wasting less horizontal space and more consistent output.

I believe that (1) was due to this bug, which has since been fixed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52899

The GNU Coding Standards (GCS) say that column numbers should be calculated assuming tab stops every 8 columns [https://www.gnu.org/prep/standards/html_node/Errors.html].
I believe that -ftabstop=8 has been the default since PR52899 was fixed.

However, I see that GCC trunk still counts a tab as one column, probably because emacs counts a tab as one column when interpreting the column numbers in GCC's output. As long as both of these things are true, I believe it doesn't make much sense to print 8 spaces (or a tab) instead of a single space. Doing so would make the column numbers much harder to interpret and would break the parsing of GCC diagnostics done by emacs.

Note that if we print the tab directly, the width of the tab in the terminal may not be the same as in the editor the user is using. Moreover, if the user is using tabs consistently (instead of using tabs on some lines and spaces in others), replacing tabs with 1 space will only reduce the visual space per indentation level, but the indentation structure will remain consistent.
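
To make the column arithmetic above concrete, here is a minimal sketch in C. It is not GCC's actual code: byte_column, tabstop_column and render_tabs_as_spaces are hypothetical helpers invented for this illustration. It contrasts the byte-based column (a tab counts as 1) with a tab-stop-based column, and shows that printing each tab as one space keeps the printed line aligned with the byte-based column.

```c
#include <stddef.h>

/* Hypothetical helper: 1-based column of the byte at offset idx when
   every byte, including a tab, counts as exactly one column.  */
static int
byte_column (size_t idx)
{
  return (int) idx + 1;
}

/* Hypothetical helper: 1-based column of the byte at offset idx when
   a tab advances to the next tab stop (stops every `tabstop` columns).  */
static int
tabstop_column (const char *line, size_t idx, int tabstop)
{
  int col = 1;
  for (size_t i = 0; i < idx && line[i] != '\0'; i++)
    {
      if (line[i] == '\t')
        col += tabstop - (col - 1) % tabstop;
      else
        col++;
    }
  return col;
}

/* Hypothetical helper: copy line into out, replacing each tab with a
   single space, so that printed position N is still byte column N.  */
static void
render_tabs_as_spaces (const char *line, char *out, size_t outsz)
{
  size_t i = 0;
  if (outsz == 0)
    return;
  for (; line[i] != '\0' && i + 1 < outsz; i++)
    out[i] = (line[i] == '\t') ? ' ' : line[i];
  out[i] = '\0';
}
```

For the line "\tfoo", the 'f' is at byte column 2 but at column 9 with tab stops every 8 columns. A caret placed under printed position 2 only lines up if the tab was printed as one space.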

I wish I had added a summary of the above to the code as a comment.

Finally, PR49973 is about GCC counting multiple columns for characters that should be counted as one column. This should be fixed in our line-map implementation using wcwidth() when lexing. It is not the same issue at all. Once column numbers are correctly counted, the output should be fine as well (the caret line does not change multi-byte characters).
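
As a rough illustration of what counting columns with wcwidth() could look like, here is a sketch. It is not the line-map code and not a proposed patch: display_width is a hypothetical helper, and it assumes the caller has already selected a suitable locale with setlocale().

```c
#define _XOPEN_SOURCE 700   /* needed for the wcwidth() declaration */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of display columns occupied by the
   multibyte string s in the current locale, or -1 if s contains an
   invalid sequence or a character that wcwidth() treats as
   non-printable (which includes a tab).  */
static int
display_width (const char *s)
{
  mbstate_t st;
  size_t left = strlen (s);
  int cols = 0;

  memset (&st, 0, sizeof st);
  while (left > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, s, left, &st);
      if (n == (size_t) -1 || n == (size_t) -2)
        return -1;              /* invalid or truncated sequence */
      if (n == 0)
        break;                  /* embedded NUL; cannot happen after strlen */
      int w = wcwidth (wc);
      if (w < 0)
        return -1;              /* non-printable in this locale */
      cols += w;
      s += n;
      left -= n;
    }
  return cols;
}
```

With a UTF-8 locale selected, a two-byte character such as "é" would be expected to contribute one column rather than two, which is the difference PR49973 is about. Tabs would still need the separate handling discussed above, since wcwidth() does not assign them a width.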

I hope the above helps,

        Manuel.
