http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52952
--- Comment #11 from dodji at seketeli dot org <dodji at seketeli dot org> 2012-05-24 20:30:10 UTC ---
"manu at gcc dot gnu.org" <gcc-bugzi...@gcc.gnu.org> wrote:

>> With the current infrastructure, I fear we cannot re-process the format
>> string *after* the initial pre-processing phase is done, to create new
>> locations that we'd add in the line maps.
>
> Could you elaborate on the reasons for this? Is it impossible to create new
> locations on the fly?

Not really, no. It is not practical to insert new locations inside an
existing location map, because that would imply updating (modifying) all
the maps that come after the one you have modified. That would also
invalidate all the locations that have been encoded by the maps you
updated.

Basically, the current encoding of the maps requires that a newly encoded
location always be the last location of its map. You cannot insert a
location in the "middle" of an existing map.

> As a brute-force approach, we at least should be able to re-preprocess the
> whole file, no?

I guess that would take re-processing the whole compilation unit, starting
from the location map that you have changed. And just handling locations
wouldn't be enough: we'd basically need to re-tokenize the files that are
re-processed, because the locations are primarily carried by instances of
cpp_token, and we need the locations of these tokens to be updated. That
doesn't seem practical.

> Could we do this by invoking libcpp directly rather than calling a
> command?

Yes, even though it wouldn't be practical, in my opinion.

>> Does that make sense?
>
> This implies that the diagnostics code would need to handle a byte
> offset, no?

Yes, probably.

> And I am not sure this will handle well the case of split strings and macro
> expansion, like Clang does.

Yeah. Which makes me think that maybe we might want to introduce a new way
to represent string literal tokens in libcpp, one that keeps the underlying
raw format.
There would be a character-oriented iterator API for that string literal representation. That iterator API could provide its user with the file/line/column information for the current character, and one could pass any such iterator to some of the diagnostic routines.
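
To make the iterator idea a bit more concrete, here is a rough sketch of what such an API might look like. Everything in it is hypothetical: SourcePos, LiteralPiece and StringLiteralIter are names made up for illustration and do not exist in libcpp. It also assumes that each raw piece of a split string sits on a single line, that every raw character occupies one column, and it ignores escape sequences and macro expansion entirely.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of these exist in libcpp.
struct SourcePos {
    std::string file;
    unsigned line;
    unsigned column;  // 1-based
};

// One raw piece of a (possibly split) string literal, e.g. the "%d " in
// "%d " "%s". `start` is the position of the first character after the
// opening quote; `text` is the raw spelling between the quotes.
struct LiteralPiece {
    SourcePos start;
    std::string text;
};

// Walks the characters of all pieces in order and can report where the
// current character came from.
class StringLiteralIter {
public:
    explicit StringLiteralIter(std::vector<LiteralPiece> pieces)
        : pieces_(std::move(pieces)) {
        skip_exhausted_pieces();
    }

    bool done() const { return piece_ >= pieces_.size(); }

    // Precondition: !done().
    char current() const { return pieces_[piece_].text[offset_]; }

    // Precondition: !done(). Assumes one column per raw character and no
    // line break inside a piece.
    SourcePos position() const {
        const LiteralPiece& p = pieces_[piece_];
        return {p.start.file, p.start.line,
                p.start.column + static_cast<unsigned>(offset_)};
    }

    void advance() {
        ++offset_;
        skip_exhausted_pieces();
    }

private:
    // Move on to the next piece once the current one is used up; this also
    // skips pieces that are empty ("").
    void skip_exhausted_pieces() {
        while (piece_ < pieces_.size() &&
               offset_ >= pieces_[piece_].text.size()) {
            ++piece_;
            offset_ = 0;
        }
    }

    std::vector<LiteralPiece> pieces_;
    std::size_t piece_ = 0;
    std::size_t offset_ = 0;
};
```

A format checker could then advance such an iterator to the offending conversion and hand it (or its position()) to a diagnostic routine, instead of pointing at the start of the whole string literal.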