This is, AFAICS, an instance of bug PR#14408: it seems that in UTF-8 locales the grammar generated by the TRE engine for repetitions is buggy in some odd cases. And as TRE's author has vanished, our hopes of his fixing it are slim.

Try perl=TRUE, which makes sub() use the PCRE engine instead of TRE.
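
A minimal sketch of that workaround, using the pattern and input from the question quoted below. The variable names (x, pattern, fixed, m, matched) are only illustrative. The regexpr()/regmatches() step is an optional check of what the pattern actually matched:

```r
# Input and pattern taken from the question quoted below
x <- "9ewww"
pattern <- "[[:digit:]]{1,2}"

# perl = TRUE routes the match through PCRE rather than TRE
fixed <- sub(pattern, "", x, perl = TRUE)
print(fixed)    # "ewww"

# Optional check: extract the substring the pattern matched
m <- regexpr(pattern, x, perl = TRUE)
matched <- regmatches(x, m)
print(matched)  # "9"
```

Running the same two checks without perl = TRUE should show whether a given R build and locale is affected.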

On 09/12/2011 14:20, Jannis wrote:
Dear R users,


The way I understand the documentation of sub() and regexp, the following code:



sub('[[:digit:]]{1,2}', '', '9ewww')



... should yield:

'ewww'


It returns, however:

'www'


Why is this the case? My code should substitute just 1 (minimum) or up to 2
(maximum) digits, i.e. numbers, and not the 'e' in the string. Am I misinterpreting
something here?


Thanks for any ideas
Jannis


sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i686-pc-linux-gnu (32-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
