Re: [R] Problem comparing two strings

Björn Fisseler Mon, 18 Nov 2019 07:40:33 -0800

Thank you! That solved my problem!

Best


         Björn

On 18.11.19 at 16:34, Ivan Krylov wrote:
> On Mon, 18 Nov 2019 16:11:44 +0100
> "Björn Fisseler" <bjoern.fisse...@googlemail.com> wrote:
>
>> It's obviously the umlaut "ä" in this example, which is encoded with
>> two bytes in one string and three bytes in the other. The question
>> is how to change this.
> Welcome to the wonderful world of Unicode-related problems! It is,
> indeed, possible to represent the same glyph using either one
> code point (LATIN SMALL LETTER A WITH DIAERESIS) or two code points
> (LATIN SMALL LETTER A followed by COMBINING DIAERESIS). (Other
> combinations of code points resulting in the same glyph are probably
> also possible.)
>
> What you are looking for is called "Unicode normalization" and it is
> implemented in the stringi package, in functions stri_trans_nfc
> (normalization: there are multiple normal forms to choose from but W3C
> guidelines recommend NFC) and stri_compare / stri_cmp (test for
> canonical equivalence).
>
> See also: ?stringi::stri_cmp and https://stackoverflow.com/a/20684794
>
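
A minimal sketch of the approach described above, assuming the stringi
package is installed. The two strings are made-up stand-ins for the
values from the original question (which are not shown in this message),
and stri_cmp_equiv is the equivalence-testing sibling of the stri_cmp
function mentioned in the quoted reply.

```r
library(stringi)

# The same glyph "ä" written two ways (illustrative values only):
composed   <- "\u00e4"    # one code point: LATIN SMALL LETTER A WITH DIAERESIS
decomposed <- "a\u0308"   # two code points: "a" + COMBINING DIAERESIS

# In UTF-8 the two spellings take a different number of bytes
nchar(composed,   type = "bytes")   # 2
nchar(decomposed, type = "bytes")   # 3

# A plain comparison treats them as different strings
composed == decomposed              # FALSE

# Option 1: bring both strings to NFC first, then compare as usual
stri_trans_nfc(composed) == stri_trans_nfc(decomposed)   # TRUE

# Option 2: test for canonical equivalence without normalizing by hand
stri_cmp_equiv(composed, decomposed)                      # TRUE
```

Normalizing once, right after reading the data, is probably the more
convenient choice when the strings are later used as keys in merge() or
match(), since those rely on plain string equality.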

