On Thu, 2025-01-30 at 12:01 +0000, Bader, Lucas wrote: > Thanks a lot for your detailed feedback! > I will rework my patch, especially to make the > get_source_line_preprocessed function > more readable and more efficient. > > Some comments in the mean time: > > > This may sound silly, but please use "num" rather than "no" as an > > abbreviation for "number": to my under-caffeinated brain "no" reads > > as > > the opposite of "yes". > > I fully agree, seems like 5 years ago my naming taste was a bit off. > > > What about buffer[1]? Does this have to be a ' '? > ... > > Don't we also need a trailing '"' and a newline? > > Ack, doesn't hurt to check buffer[1]. The newline might not come > directly after the trailing '"' however, as > line markers can contain additional flags after the source file name. > If we have: > > # (\d+) "(.+)" > > it should be safe to assume it's a line marker. > > > Has this code been tested on large examples? It looks to me like > > this > function does a linear scan starting at line 1 every time in the > .i/.ii > file upwards looking for markers of interest; > > Good points, I am looking into tracking line markers in the > file_cache_slot. > I am currently testing a version with a simple vector for > memoization. > > FWIW we are using this patch in our internal gcc fork for building a > pretty substantial C++ codebase and > have not seen build time regressions directly linked to it. > However, testing it against a fairly large .ii file that produces > many warnings from that codebase (~400k LoC) shows > significant speedup with the tracking version: > > ~/gcc-dev/bin/gcc TEST.cpp.ii 62.58s user 1.01s system 99% cpu > 1:03.66 total # path as is > ~/gcc-dev/bin/gcc TEST.cpp.ii 32.80s user 0.95s system 99% cpu > 33.749 total # with line marker memoization
Interesting - thanks. Do you have a profile of the hot spots? I wonder to what extent Andi's recent patches would help. > > > This function is rather complicated > > I agree. To be honest, looking at it after a couple of years was also > not very pleasant. > > > Something else that occurred to me: assuming the file in question > > has > gone through libcpp, we've already parsed the line markers, and > line_table will have a series of line map instances representing the > ranges in question. > > According to some comments I found, it seems that for e.g the C > front-end, it can happen that we start emitting diagnostics > before the line map has seen the end of the file, so it could > probably only act as a hint? Yes - though I think for any location we're likely to warn about, the line maps instance will have line map instances for all of the markers;. For reference the code in libcpp to parse the line markers is in libcpp/directives.cc:::do_linemarker But this will only work for cases where libcpp is used to lex the file, which is the case for frontends that support .i/.ii files, but there are use-cases of input.cc that don't do that, e.g. libgdiagnostics. So though I thought this alternate approach was worth mentioning, I think the overall way your patch does it is probably the one we should follow, as it avoids tight coupling with libcpp. Thanks Dave