Re: [Rd] Error in substring: invalid multibyte string

2020-06-29 Thread Tomas Kalibera
From the user's (or package author's) point of view, all strings should always be valid in their declared encoding. If they are not, the result of string operations is undefined: it may be an error or a warning, but it may also be a silently produced result that is either correct or incorrect. There are R functions that check if
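As an illustration of the point about validity, here is a minimal sketch of how one might check a string before operating on it. It reuses the string from the original report; `validUTF8()`, `validEnc()`, `Encoding()` and `iconv()` are base R functions, but the assumption that the bytes are Latin-1 comes from the thread, not from the data itself, and the snippet does not say which checking functions the author had in mind.

```r
# Byte 0xE4 is "ä" in Latin-1 but is not well-formed UTF-8 on its own.
x <- "Jens Oehlschl\xe4gel-Akiyoshi"

Encoding(x)   # the encoding mark R attached when parsing the literal
validUTF8(x)  # FALSE: the raw bytes are not valid UTF-8
validEnc(x)   # validity in the declared encoding; the answer depends on the locale

# Convert explicitly, stating that the input bytes are Latin-1 (an assumption).
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)  # TRUE after conversion
nchar(y)      # number of characters in the converted string
```

Checking with `validUTF8()` or `validEnc()` before calling string functions turns the undefined behaviour described above into an explicit, testable condition.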

Re: [Rd] Error in substring: invalid multibyte string

2020-06-27 Thread Toby Hocking
Thanks for the quick response, Ivan. readLines with encoding='latin1' works for me (on Ubuntu). However, I was more concerned with the inconsistency in results between substr and regexpr. I was expecting that if one of them errors because of an unknown encoding, then the other should as well. Even be

Re: [Rd] Error in substring: invalid multibyte string

2020-06-27 Thread Ivan Krylov
On Fri, 26 Jun 2020 15:57:06 -0700 Toby Hocking wrote:
> invalid multibyte string at 'gel-A<6b>iyoshi'
> https://stat.ethz.ch/pipermail/r-devel/1999-November/author.html
The server says that the text is UTF-8:
curl -sI \
  https://stat.ethz.ch/pipermail/r-devel/1999-November/author.html | \
  grep

[Rd] Error in substring: invalid multibyte string

2020-06-26 Thread Toby Hocking
Hi all, I'm getting the following error from substring:
> substr("Jens Oehlschl\xe4gel-Akiyoshi", 1, 100)
Error in substr("Jens Oehlschl\xe4gel-Akiyoshi", 1, 100) :
  invalid multibyte string at 'gel-A<6b>iyoshi'
Is that normal / intended? I've tried setting the Encoding/locale to Latin-1/UTF-8 b
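A minimal sketch of the reported call and one possible workaround, assuming the bytes really are Latin-1 as suggested elsewhere in the thread. The error is expected in a UTF-8 locale; in other locales the first call may succeed, so it is wrapped in `tryCatch()` rather than assumed to fail.

```r
x <- "Jens Oehlschl\xe4gel-Akiyoshi"

# In a UTF-8 locale this is expected to signal "invalid multibyte string";
# the handler returns the message instead of stopping the script.
res <- tryCatch(substr(x, 1, 100), error = function(e) conditionMessage(e))
print(res)

# Declare the bytes as Latin-1. This changes only the encoding mark,
# not the bytes themselves.
Encoding(x) <- "latin1"
out <- substr(x, 1, 100)
nchar(out)
```

Marking the encoding makes the string valid in its declared encoding, which is the precondition the string functions rely on.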