On 09/08/18 21:08, dmalcolm at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86904 > > Bug ID: 86904 > Summary: Column numbers ignore tab characters > Product: gcc > Version: unknown > Status: UNCONFIRMED > Keywords: diagnostic > Severity: normal > Priority: P3 > Component: other > Assignee: unassigned at gcc dot gnu.org > Reporter: dmalcolm at gcc dot gnu.org > Target Milestone: --- > > As noted in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19165#c21 : > > /* Both gcc and emacs number source *lines* starting at 1, but > they have differing conventions for *columns*. > > GCC uses a 1-based convention for source columns, > whereas Emacs's M-x column-number-mode uses a 0-based convention. > > For example, an error in the initial, left-hand > column of source line 3 is reported by GCC as: > > some-file.c:3:1: error: ...etc... > > On navigating to the location of that error in Emacs > (e.g. via "next-error"), > the locus is reported in the Mode Line > (assuming M-x column-number-mode) as: > > some-file.c 10% (3, 0) > > i.e. "3:1:" in GCC corresponds to "(3, 0)" in Emacs. */ > > In terms of 0 vs 1, GCC complies with the GNU standards here: > https://www.gnu.org/prep/standards/html_node/Errors.html > > However our "column numbers" are also simply a 1-based byte-count, so a tab > character is treated by us as simply an increment of 1 right now. > > (see also PR 49973, which covers the case of multibyte characters). > > It turns out that we convert tab characters to *single* space characters when > printing source code. > > This behavior has been present since Manu first implemented > -fdiagnostics-show-caret in r186305 (aka > 5a9830842f69ebb059061e26f8b0699cbd85121e, PR 24985), where it was this logic > (there in diagnostic.c's diagnostic_show_locus): > char c = *line == '\t' ? ' ' : *line; > pp_character (context->printer, c); > > (that logic is now in diagnostic-show-locus.c in layout::print_source_line) > > This is arguably a bug, but it's intimately linked to the way in which we > track > "column numbers". > > Our "column numbers" are currently simply a 1-based byte-count, I believe, so > a > tab character is treated by us as simply an increment of 1 right now. > > There are similar issues with encodings that aren't 1 byte per character (e.g. > non-ASCII unicode characters), which are being tracked in PR 49973. > > Presumably, when we print source lines containing tab characters, we should > emit a number of spaces to reach a tab stop. > > Consider a diagnostic with a multiline range covering the > following source (and beyond): > > indent: 6 (tabs: 0, spaces: 6) > indent: 7 (tabs: 0, spaces: 7) > indent: 8 (tabs: 1, spaces: 0) > indent: 9 (tabs: 1, spaces: 1) > > i.e.: > > " indent: 6 (tabs: 0, spaces: 6)\n" > " indent: 7 (tabs: 0, spaces: 7)\n" > "\tindent: 8 (tabs: 1, spaces: 0)\n" > "\t indent: 9 (tabs: 1, spaces: 1)\n" > > Currently diagnostic_show_locus prints: > > indent: 6 (tabs: 0, spaces: 6) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > indent: 7 (tabs: 0, spaces: 7) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > indent: 8 (tabs: 1, spaces: 0) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > indent: 9 (tabs: 1, spaces: 1) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > i.e: > " indent: 6 (tabs: 0, spaces: 6)\n" > " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" > " indent: 7 (tabs: 0, spaces: 7)\n" > " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" > " indent: 8 (tabs: 1, spaces: 0)\n" > " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" > " indent: 9 (tabs: 1, spaces: 1)\n" > " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" > > which misrepresents the indentation of the user's code. > > It should respect tabstops, and print: > > indent: 6 (tabs: 0, spaces: 6) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > indent: 7 (tabs: 0, spaces: 7) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > indent: 8 (tabs: 1, spaces: 0) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > indent: 9 (tabs: 1, spaces: 1) > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > i.e.: > > " indent: 6 (tabs: 0, spaces: 6)\n" > " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" > " indent: 7 (tabs: 0, spaces: 7)\n" > " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" > " indent: 8 (tabs: 1, spaces: 0)\n" > " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" > " indent: 9 (tabs: 1, spaces: 1)\n" > " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" > > We should also handle erroneous leading spaces before a tab, so that e.g. > > " \tfoo" > > should be printed as if it were: > > "\tfoo" > > (given that that's what the user's editor is probably printing it as). > > Similarly, we should presumably print "8" for the column number for the 'f' of > "foo". However, IDEs are expecting GCC's existing behavior, so we should > probably add a command-line option for controlling this. > > Adding a left margin with line numbers (as of r263450) doesn't change this > bug, > but makes fixing it slightly more complicated. > > Maybe: > -fdiagnostics-x-coord=bytes : count of bytes > -fdiagnostics-x-coord=characters : count of characters (not special-casing > tab) > -fdiagnostics-x-coord=columns : count of columns: as per characters, but > with > tabs doing tabstops
how about -fdiagnostics-x-coord=visual-[n] Where n is the size of a hard tab? Some folks change the tab stop to 4, for example. Or maybe ...coord=tab[-n], where -n defaults to "-8". R. > (currently we use "bytes" Not sure if we need "characters") > > I'm thinking that internally, we should continue to track byte offsets, but > make the option affect the presentation of the number in diagnostics*.c. > > (should it affect -fdiagnostics-parseable-fixits ? > see also the Emacs RFE for parseable fixits: > https://debbugs.gnu.org/cgi/bugreport.cgi?bug=25987 ) >