Re: [R] Problem comparing two strings

2019-11-18 Thread peter dalgaard
A version of this came up not long ago in a slightly different context (bug 17369: parse() doesn't honor unicode in NFD normalization). The basic issue is that there are different Unicode normalizations (look it up...). Briefly, accented characters exist in two forms, one as a single code poin
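To make the two forms concrete, here is a small sketch using Python's standard `unicodedata` module. Python is chosen only for illustration; the thread itself is about R. The name comes from the thread, and the escape sequences spell out the two representations explicitly.

```python
import unicodedata

# The same visible name, spelled two ways:
precomposed = "Alexander J\u00e4ger"   # 'ä' as one code point, U+00E4
decomposed = "Alexander Ja\u0308ger"   # 'a' followed by U+0308 COMBINING DIAERESIS

print(precomposed == decomposed)           # False: different code point sequences
print(len(precomposed), len(decomposed))   # 15 16

# Converting either string to the other's normalization form makes them equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```

NFC composes "a" + diaeresis into a single code point, and NFD splits it back apart. Text that looks identical on screen can therefore compare unequal until both sides use the same form.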

2019-11-18 Thread Duncan Murdoch
On 18/11/2019 10:11 a.m., Björn Fisseler wrote:
Hello, I'm struggling to compare two strings, which come from different data sets. These strings are identical: "Alexander Jäger". But when I compare these strings: string1 == string2 the result is FALSE. Looking at the raw bytes used to encode the
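The raw-byte check can be reproduced as follows. This is again a Python sketch; in R, `charToRaw()` gives the same view of a string's bytes. The helper name `utf8_hex` is hypothetical. The sketch shows why one string spends two bytes on "ä" and the other three.

```python
def utf8_hex(s: str) -> str:
    """Return the UTF-8 bytes of s as space-separated hex pairs."""
    return s.encode("utf-8").hex(" ")

print(utf8_hex("\u00e4"))    # c3 a4     (precomposed U+00E4: two bytes)
print(utf8_hex("a\u0308"))   # 61 cc 88  ('a' + U+0308: one byte + two bytes)
```

Both byte sequences render as "ä", so `==` on the strings compares these differing bytes and returns FALSE/False.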

2019-11-18 Thread Björn Fisseler
Thank you! That solved my problem!

Best,
Björn

On 18.11.19 at 16:34, Ivan Krylov wrote:
> On Mon, 18 Nov 2019 16:11:44 +0100
> "Björn Fisseler" wrote:
>
>> It's obviously the umlaut "ä" in this example, which is encoded with
>> two bytes in one string and three in the other. The question is how to change

2019-11-18 Thread Ivan Krylov
On Mon, 18 Nov 2019 16:11:44 +0100
"Björn Fisseler" wrote:
> It's obviously the umlaut "ä" in this example, which is encoded with
> two bytes in one string and three in the other. The question is how to change this?

Welcome to the wonderful world of Unicode-related problems! It is, indeed, possible to represent th
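Ivan's actual recommendation is cut off in this excerpt. The following shows one common approach, which is not necessarily the fix that worked for Björn: normalize both sides to the same form (NFC here) before comparing or matching values from the two data sets. The sketch is in Python. `match_names`, `dataset_a` and `dataset_b` are hypothetical names. In R, `stringi::stri_trans_nfc()` performs the equivalent normalization.

```python
import unicodedata

def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", s)

def match_names(left, right):
    """Return names from `left` that also occur in `right`, comparing NFC-normalized text."""
    right_keys = {nfc(name) for name in right}
    return [name for name in left if nfc(name) in right_keys]

# Hypothetical data: one source stores composed text, the other decomposed.
dataset_a = ["Alexander J\u00e4ger", "Maria M\u00fcller"]
dataset_b = ["Alexander Ja\u0308ger"]

print(match_names(dataset_a, dataset_b))   # ['Alexander Jäger']
```

Normalizing at the point where data enters (or right before a join) keeps every later comparison byte-for-byte consistent. NFC is a conventional choice because it is the form most text sources already produce.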