Re: [Rd] Question about regexp edge case

2024-08-09 Thread Duncan Murdoch
Thanks! I think your suggested additions to the docs are perfect.
Duncan Murdoch
On 2024-08-09 5:01 a.m., Tomas Kalibera wrote:
> On 8/1/24 20:55, Duncan Murdoch wrote:
>> Thanks Tomas. Do note that my original post also mentioned a bug or doc error in the PCRE docs for this regexp:
>> - perl

Re: [Rd] Question about regexp edge case

2024-08-09 Thread Tomas Kalibera
On 8/1/24 20:55, Duncan Murdoch wrote:
> Thanks Tomas. Do note that my original post also mentioned a bug or doc error in the PCRE docs for this regexp:
> - perl = TRUE does *not* give the documented result on at least one system (which is "123456789", because "{,5}" is documented to not be

Re: [Rd] Question about regexp edge case

2024-08-01 Thread Duncan Murdoch
Thanks Tomas. Do note that my original post also mentioned a bug or doc error in the PCRE docs for this regexp:
- perl = TRUE does *not* give the documented result on at least one system (which is "123456789", because "{,5}" is documented to not be a quantifier, so it should only match the

Re: [Rd] Question about regexp edge case

2024-08-01 Thread Tomas Kalibera
On 7/29/24 09:37, Ivan Krylov via R-devel wrote:
> On Sun, 28 Jul 2024 20:02:21 -0400, Duncan Murdoch wrote:
>> gsub("^([0-9]{,5}).*","\\1","123456789")
>> [1] "123456"
> This is in TRE itself: for "^([0-9]{,1})" tre_regexecb returns {.rm_so = 0, .rm_eo = 1}, matching "1", but for "^([0-9]{,2})" and ab

Re: [Rd] Question about regexp edge case

2024-07-29 Thread Ivan Krylov via R-devel
On Sun, 28 Jul 2024 20:02:21 -0400, Duncan Murdoch wrote:
> gsub("^([0-9]{,5}).*","\\1","123456789")
> [1] "123456"
This is in TRE itself: for "^([0-9]{,1})" tre_regexecb returns {.rm_so = 0, .rm_eo = 1}, matching "1", but for "^([0-9]{,2})" and above it returns an off-by-one result, {.rm_so = 0

[Rd] Question about regexp edge case

2024-07-28 Thread Duncan Murdoch
On StackOverflow (here: https://stackoverflow.com/questions/78803652/why-does-gsub-in-r-match-one-character-too-many) there was a question about this result:
> gsub("^([0-9]{,5}).*","\\1","123456789")
[1] "123456"
The OP expected "12345" as the result. Several points were raised:
- The R do
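
For readers who hit the same edge case, here is a minimal sketch of one way to sidestep it. It assumes (this is not stated in the excerpts above) that writing the lower bound explicitly, as "{0,5}", avoids the ambiguous "{,5}" form in both the default engine and PCRE. The variable names are illustrative only.

```r
# Input string from the original question.
x <- "123456789"

# Explicit lower bound: "{0,5}" instead of the ambiguous "{,5}".
pattern <- "^([0-9]{0,5}).*"

# Run the same substitution with the default engine (TRE)
# and with PCRE (perl = TRUE) so the results can be compared.
res_tre  <- gsub(pattern, "\\1", x)
res_pcre <- gsub(pattern, "\\1", x, perl = TRUE)

print(res_tre)
print(res_pcre)
```

With the explicit bound, both calls should return "12345", the result the OP expected. The sketch deliberately does not rely on how either engine interprets "{,5}".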