On Thu, 2025-01-30 at 12:01 +0000, Bader, Lucas wrote:
> Thanks a lot for your detailed feedback!
> I will rework my patch, especially to make the
> get_source_line_preprocessed function
> more readable and more efficient.
> 
> Some comments in the mean time:
> 
> > This may sound silly, but please use "num" rather than "no" as an
> > abbreviation for "number": to my under-caffeinated brain "no" reads
> > as
> > the opposite of "yes".
> 
> I fully agree, seems like 5 years ago my naming taste was a bit off.
> 
> > What about buffer[1]?  Does this have to be a ' '?
> ...
> > Don't we also need a trailing '"' and a newline?
> 
> Ack, doesn't hurt to check buffer[1]. The newline might not come
> directly after the trailing '"' however, as
> line markers can contain additional flags after the source file name.
> If we have:
> 
> # (\d+) "(.+)"
> 
> it should be safe to assume it's a line marker.
> 
> > Has this code been tested on large examples?  It looks to me like
> > this
> function does a linear scan starting at line 1 every time in the
> .i/.ii
> file upwards looking for markers of interest;
> 
> Good points, I am looking into tracking line markers in the
> file_cache_slot.
> I am currently testing a version with a simple vector for
> memoization.
> 
> FWIW we are using this patch in our internal gcc fork for building a
> pretty substantial C++ codebase and
> have not seen build time regressions directly linked to it.
> However, testing it against a fairly large .ii file that produces
> many warnings from that codebase (~400k LoC) shows 
> significant speedup with the tracking version:
> 
> ~/gcc-dev/bin/gcc TEST.cpp.ii  62.58s user 1.01s system 99% cpu
> 1:03.66 total # path as is
> ~/gcc-dev/bin/gcc TEST.cpp.ii  32.80s user 0.95s system 99% cpu
> 33.749 total # with line marker memoization

Interesting - thanks.  Do you have a profile of the hot spots?  I
wonder to what extent Andi's recent patches would help.

> 
> > This function is rather complicated
> 
> I agree. To be honest, looking at it after a couple of years was also
> not very pleasant.
> 
> > Something else that occurred to me: assuming the file in question
> > has
> gone through libcpp, we've already parsed the line markers, and
> line_table will have a series of line map instances representing the
> ranges in question.
> 
> According to some comments I found, it seems that for e.g the C
> front-end, it can happen that we start emitting diagnostics
> before the line map has seen the end of the file, so it could
> probably only act as a hint?

Yes - though I think for any location we're likely to warn about, the
line maps instance will have line map instances for all of the
markers;.  For reference the code in libcpp to parse the line markers
is in libcpp/directives.cc:::do_linemarker

But this will only work for cases where libcpp is used to lex the file,
which is the case for frontends that support .i/.ii files, but there
are use-cases of input.cc that don't do that, e.g. libgdiagnostics.

So though I thought this alternate approach was worth mentioning, I
think the overall way your patch does it is probably the one we should
follow, as it avoids tight coupling with libcpp.

Thanks
Dave

Reply via email to