On StackOverflow (here: https://stackoverflow.com/questions/78803652/why-does-gsub-in-r-match-one-character-too-many) there was a question about this result:

> gsub("^([0-9]{,5}).*","\\1","123456789")
[1] "123456"

The OP expected "12345" as the result.  Several points were raised:

- The R docs don't mention the case of {,5} for the default perl = FALSE, which uses TRE.
- perl = TRUE gives the OP's expected result of "12345".
- perl = TRUE does *not* give the documented result on at least one system. The documented result is "123456789": "{,5}" is documented not to be a quantifier, so it should only match the literal string "{,5}". Since "123456789" contains no "{,5}", the pattern can't match and gsub() returns the input unchanged.
- Some regexp engines (including Perl and Awk) document that "12345" is correct.
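For anyone who wants to compare the readings on their own build, here is a short sketch. The two calls using "{,5}" are the ones in dispute, so no output is claimed for them (it may differ between R builds and the PCRE version linked). The {0,5} and escaped-brace versions spell out the two candidate meanings explicitly and should give "12345" and "123456789" respectively with either engine.

```r
x <- "123456789"

# The pattern from the question. "{,5}" is the ambiguous part, so these
# results can vary between builds and are deliberately not shown here.
print(gsub("^([0-9]{,5}).*", "\\1", x))               # TRE (perl = FALSE)
print(gsub("^([0-9]{,5}).*", "\\1", x, perl = TRUE))  # PCRE

# "{,5}" read as {0,5}: an explicit lower bound is standard ERE/PCRE syntax.
explicit_tre  <- gsub("^([0-9]{0,5}).*", "\\1", x)
explicit_pcre <- gsub("^([0-9]{0,5}).*", "\\1", x, perl = TRUE)
print(c(explicit_tre, explicit_pcre))   # both "12345"

# "{,5}" read as literal text: escape the braces. The pattern then needs a
# digit followed by "{,5}", which x lacks, so gsub() returns x unchanged.
literal_tre  <- gsub("^([0-9]\\{,5\\}).*", "\\1", x)
literal_pcre <- gsub("^([0-9]\\{,5\\}).*", "\\1", x, perl = TRUE)
print(c(literal_tre, literal_pcre))     # both "123456789"
```

Writing {0,5} sidesteps the question entirely, whichever way it is eventually resolved.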

Is any of this worth fixing?

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
