On Sat, 30 Sept 2023 at 22:12, Joern Rennecke <joern.renne...@embecosm.com> wrote:
> Also, we might have different directives for not scanning in LTO sections -
> or just ignoring .ascii . Or maybe the other way round - you have to do
> something special if you want to scan inside strings, and by default we
> don't look inside strings?
> LTO information uses ascii, and ISTR sometimes also a zero-terminated
> variant (asciiz?); There might also some string constant outputs, or stabs
> information.
> One possible rule I think might work is: if the RE doesn't mention a quote,
> don't scan what's quoted inside double quotes. Although we might to have
> to look out for backslash-escaped quotes to find the proper end of a quoted
> string.

I've thought about this some more, and we need something that's simple
for dejagnu and simple to describe.

So I propose we look at the first character of the regexp, and if it's
neither ^ nor \ (neither caret nor backslash), we consider the regexp
un-anchored and prepend ^[^"]* , so that it can't match after a double
quote.

Then document this in sourcebuild.texi, with some mention of LTO
information and stabs, and also mentioning that if you really want to
match irrespective of a leading quote, you can prepend ^.* to your regexp.

There are good reasons to be more specific with your regexps in general,
but the matches in LTO are particularly damaging because they appear
semi-random, so they often escape a regression test when the test is
made, only to surface during somebody else's regression test.
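
To make the proposed rule concrete, here is a minimal sketch in Python rather than the Tcl that dejagnu itself uses. The names anchor_outside_quotes and scan_line are hypothetical, and the sketch assumes the pattern is tried against one line of assembler output at a time; the real scan-assembler procedures may apply the regexp differently.

```python
import re


def anchor_outside_quotes(pattern: str) -> str:
    """Hypothetical helper applying the proposed rule to one scan pattern.

    A pattern whose first character is ^ or \\ is treated as already
    anchored and left alone. Any other pattern gets ^[^"]* prepended,
    so the match cannot begin after a double quote on the line.
    """
    if pattern[:1] in ("^", "\\"):
        return pattern
    return '^[^"]*' + pattern


def scan_line(pattern: str, line: str) -> bool:
    """Assumption: patterns are checked against a single line of output."""
    return re.search(anchor_outside_quotes(pattern), line) is not None


if __name__ == "__main__":
    code_line = "\tcall\tfoo"
    lto_line = '\t.ascii "call foo"'
    # Unanchored pattern: should hit the instruction, not the string data.
    print(scan_line("foo", code_line), scan_line("foo", lto_line))
    # Opting back in to matching inside strings by prepending ^.*
    print(scan_line("^.*foo", lto_line))
```

With this rule, a plain pattern such as foo no longer matches text that only appears inside a quoted .ascii string, while a test that really wants to look inside strings can say so explicitly with ^.*foo.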