From the user's (or package author's) point of view, all strings should always
be valid in their declared encoding. If they are not, the result of
string operations is undefined: it may be an error or a warning, but it may
also be a silently produced correct or incorrect result. There are R functions
that check if
Thanks for the quick response, Ivan. readLines with encoding='latin1' works
for me (on Ubuntu).
However I was more concerned with the inconsistency in results between
substr and regexpr. I was expecting that if one of them errors because of
an unknown encoding then the other should as well. Even be
On Fri, 26 Jun 2020 15:57:06 -0700
Toby Hocking wrote:
>invalid multibyte string at 'gel-A<6b>iyoshi'
>https://stat.ethz.ch/pipermail/r-devel/1999-November/author.html
The server says that the text is UTF-8:
curl -sI \
https://stat.ethz.ch/pipermail/r-devel/1999-November/author.html | \
grep
Hi all,
I'm getting the following error from substring:
> substr("Jens Oehlschl\xe4gel-Akiyoshi", 1, 100)
Error in substr("Jens Oehlschl\xe4gel-Akiyoshi", 1, 100) :
invalid multibyte string at 'gel-A<6b>iyoshi'
Is that normal / intended? I've tried setting the Encoding/locale to
Latin-1/UTF-8 b
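
A minimal sketch of one way to check and repair such a string before calling
substr. It assumes the bytes are Latin-1, which the \xe4 in place of "ä"
suggests but the thread does not confirm. validUTF8() and iconv() are base R
functions; the variable names x and y are only illustrative.

```r
# Byte 0xE4 is "ä" in Latin-1, but on its own it is not valid UTF-8.
x <- "Jens Oehlschl\xe4gel-Akiyoshi"

# Check validity without depending on the current locale.
validUTF8(x)   # FALSE

# Assumption: the source encoding is Latin-1. Convert explicitly to UTF-8;
# iconv() marks the result as UTF-8 by default.
y <- iconv(x, from = "latin1", to = "UTF-8")

validUTF8(y)   # TRUE

# Now that the string is valid in its declared encoding, substr works on it.
substr(y, 1, 100)
```

Converting once at input time keeps later string operations away from the
undefined behaviour described earlier in the thread, instead of relying on
each function to handle invalid bytes in the same way.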