https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88947

--- Comment #6 from Tim Shen <timshen at gcc dot gnu.org> ---
(In reply to Tomalak Geret'kal from comment #4)
> To be honest I'd expect this in less trivial circumstances too. If, at a
> given stage of processing, the only possible paths towards a match all
> require a prefix that's already been ruled out, that should be an immediate
> return false.

Thinking about this more, I think it's also easy to support the following case:
regex_search(..., regex("{arbitrary_literal_string}...")
where {arbitrary_literal_string} is a literal string, without other regex magic
like "|" or "*". The literal prefix should be passed into a substring search.

For implemention:
The new algorithm for regex_search() would be:
(1) prefix = find the longest deterministic prefix of the regex
(2) pos = find the first occurance of the prefix in the target. If it fails,
return false.
(3) target = target[pos+prefix.size():]
(4) try match target from start (as it's currently done)
(5) if (4) fails, go to (2).

(1) can be done at regex-compiling time. It'd require a new data member thus
non-ABI compatible. (1) can also be done at matching time. Either way, we are
looking for the longest sequence of literals from start. It stops with any of
"*", "?", "(" or any other magic.

(2) can be as plain as a substring search, but one might prefer the O(n) KMP
algorithm if we happen to have one.

Reply via email to