Thank you! That solved my problem!

Best
Björn

On 18.11.19 at 16:34, Ivan Krylov wrote:
> On Mon, 18 Nov 2019 16:11:44 +0100
> "Björn Fisseler" <bjoern.fisse...@googlemail.com> wrote:
>
>> It's obviously the umlaut "ä" in this example which is encoded with
>> two respectively three bytes. The question is how to change this?
> Welcome to the wonderful world of Unicode-related problems! It is,
> indeed, possible to represent the same glyph using either one
> code-point (LATIN SMALL LETTER A WITH DIAERESIS) or two code points
> (LATIN SMALL LETTER A followed by COMBINING DIAERESIS). (Other
> combinations of code points resulting in the same glyph are probably
> also possible.)
>
> What you are looking for is called "Unicode normalization" and it is
> implemented in the stringi package, in functions stri_trans_nfc
> (normalization: there are multiple normal forms to choose from but W3C
> guidelines recommend NFC) and stri_compare / stri_cmp (test for
> canonical equivalence).
>
> See also: ?stringi::stri_cmp and https://stackoverflow.com/a/20684794
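
For the archive, here is a minimal sketch of the approach Ivan describes. It assumes the stringi package is installed. The two example strings are made up for illustration and are not taken from the original data.

```r
library(stringi)

# Two spellings of "ä" that render as the same glyph (illustrative values):
precomposed <- "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS, one code point
decomposed  <- "a\u0308"  # LATIN SMALL LETTER A + COMBINING DIAERESIS, two code points

# The UTF-8 byte counts differ, which is the "two respectively three bytes"
# symptom from the original question.
stri_numbytes(precomposed)  # 2
stri_numbytes(decomposed)   # 3

# A plain comparison sees different code points, so the strings do not match.
raw_equal <- precomposed == decomposed

# stri_cmp_equiv tests canonical equivalence without modifying the strings.
equivalent <- stri_cmp_equiv(precomposed, decomposed)

# Normalizing both sides to NFC makes ordinary comparison, merging and
# matching behave as expected afterwards.
nfc_equal <- stri_trans_nfc(precomposed) == stri_trans_nfc(decomposed)
```

If the strings are used as keys (for example in merge() or match()), it is probably simplest to apply stri_trans_nfc once to every character column right after reading the data, so that all later comparisons work on the same normal form.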