On 17 February 2018 at 21:10, Hugh Parsonage wrote:
| I was told to re-raise this issue with R-devel:
|
| In the documentation of R-devel and R-3.4.3, under ?gsub
|
| > replacement
| > ... For perl = TRUE only, it can also contain "\U" or "\L" to convert the rest of the replacement to upper or lower case and "\E" to end case conversion.
|
| However, the following code gives different results in the two versions:
|
| tempf <- tempfile()
| writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
| entry <- readLines(tempf, encoding = "UTF-8")
| gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
|
| "AUTHOR: AMÉLIE"  # R-3.4.3
|
| "A"  # R-devel
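The replacement escapes quoted from ?gsub can be illustrated with a short sketch. It uses a made-up ASCII stand-in ("author: amelie") for the reporter's data so that encoding and locale do not enter the picture, and it assumes an R version whose perl = TRUE replacement handling follows the documentation. The last call relies on gsub() being documented to replace all matches (sub() replaces only the first), which is the behaviour the report expects.

```r
# Sketch only: ASCII input avoids UTF-8 / locale effects on \w and case mapping.
x <- "author: amelie"

# \U upper-cases the rest of the replacement; \E ends the conversion.
a <- sub("(\\w+): (\\w+)", "\\U\\1\\E: \\2", x, perl = TRUE)
print(a)  # "AUTHOR: amelie"

# \U followed by \L: upper-case the first captured letter, lower-case the rest.
b <- sub("(\\w)(\\w*)", "\\U\\1\\L\\2", "aMELIE", perl = TRUE)
print(b)  # "Amelie"

# gsub() applies the replacement to every match, so each single word
# character is replaced by its upper-case form; ":" and " " are untouched.
d <- gsub("(\\w)", "\\U\\1", x, perl = TRUE)
print(d)  # "AUTHOR: AMELIE"
```

If the last call returns only "A" on a given build, gsub() has stopped after the first match, rather than the pattern being at fault.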
Confirmed for R-devel (current) on Ubuntu 17.10.

But ... isn't the regexp you use wrong, i.e. isn't R-devel giving the correct answer?

R> tempf <- tempfile()
R> writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
R> entry <- readLines(tempf, encoding = "UTF-8")
R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
[1] "A"
R> gsub("(\\w+)", "\\U\\1", entry, perl = TRUE)
[1] "AUTHOR"
R> gsub("(.*)", "\\U\\1", entry, perl = TRUE)
[1] "AUTHOR: AMÉLIE"
R>

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel