I can confirm that it doesn't happen on Ubuntu 18.04.1, so Peter is most likely correct; it looks like it's Windows-specific.
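Until the cause is pinned down, a possible workaround is to read the lines without re-encoding them and substitute the symbol before parsing. The following is only a sketch under assumptions: `read_utf8_inf` is a hypothetical helper name (it is not part of base R), the file is assumed to be UTF-8, and it is assumed (not verified on the affected Windows setup) that `readLines(..., encoding = "UTF-8")` leaves the infinity symbol intact there. It also drops a leading U+FEFF, which the report quoted below mentions.

```r
# Hypothetical helper (not part of base R): read a UTF-8 delimited file,
# drop a leading byte order mark, and turn the infinity symbol into "Inf".
read_utf8_inf <- function(path, sep = ",") {
  # encoding = "UTF-8" only marks the strings as UTF-8; it does not re-encode them
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  if (length(lines) > 0) {
    lines[1] <- sub("^\ufeff", "", lines[1])           # strip U+FEFF if present
  }
  lines <- gsub("\u221e", "Inf", lines, fixed = TRUE)  # U+221E becomes "Inf"
  # Remaining non-ASCII text still passes through read.table(text = ...)
  read.table(text = lines, sep = sep, stringsAsFactors = FALSE)
}

# Usage: build a small test file that starts with a UTF-8 BOM,
# followed by the infinity symbol and Zhe separated by a comma.
path <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw(enc2utf8("\u221e,\u0436\n"))), path)

df <- read_utf8_inf(path)
print(df)
print(is.infinite(df$V1))
```

Substituting at the text level keeps the numeric parsing in read.table itself, so "-∞" would come through as "-Inf" as well. Other non-ASCII characters are untouched by the helper and may still be converted to the native encoding when read.table parses the text.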
On Thu, 7 Feb 2019 at 12:55, peter dalgaard <pda...@gmail.com> wrote:
>
> This doesn't seem to be happening on MacOS, neither in Terminal nor RStudio,
> (R 3.5.1, R-devel, R-patched). So probably Windows specific.
>
> -pd
>
> > On 7 Feb 2019, at 11:17 , David Byrne <david.byrne...@gmail.com> wrote:
> >
> > Bug
> > Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
> > file containing the infinity symbol (' ∞ ') results in the infinity
> > symbol imported as the number 8. Other Unicode characters seem
> > unaffected, for example, Zhe: ж
> >
> > Expected Behavior:
> > The imported data.frame should represent the infinity symbol as the
> > expected 'Inf' so that normal mathematical operations can be processed
> >
> > Stack Overflow Post:
> > I created a question on Stack Overflow where one other member was able
> > to reproduce the same issues I was having. This question can be found
> > at:
> > https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int
> >
> > Method to Reproduce - 1:
> > A simple method to reproduce this issue is to use RStudio: In the
> > console, type the following:
> >> read.table(text=" ∞", encoding="UTF-8")
> >
> > The result should be a data.frame with a single value of '8'
> >
> > Repeating the same with ж results in the correct expected behavior
> >
> > Method to Reproduce - 2:
> > Create a .csv file containing the infinity and Zhe characters (I have
> > attached the file for convenience, hopefully it is not rejected by your
> > email service). Launch an interactive session using
> >
> >> r --vanilla
> >
> > Enter the following statement taking care to replace the
> > <path-to-file> with the appropriate one:
> >
> >> read.table("<path-to-file>/unicode_chars.csv", sep=",", encoding="UTF-8")
> >
> >
> > This should result in a two element data.frame; the first being the
> > incorrect value of 8 with an additional <U+FEFF> and the second the
> > correct value of Zhe.
> >
> > Note the additional <U+FEFF> prefixed to the front of the '8'. This
> > appears to be a hidden character for the purposes of letting editors
> > know the encoding. The following link has some explanation; however, it
> > states this is caused by Excel. The file I created was done so using
> > Notepad and not Excel.
> >
> > https://medium.freecodecamp.org/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7
> >
> > System Details:
> > OS:
> >> Windows 10.0.17134 Build 17134
> >
> >
> > R Version:
> >> platform       x86_64-w64-mingw32
> >> arch           x86_64
> >> os             mingw32
> >> system         x86_64, mingw32
> >> status
> >> major          3
> >> minor          4.1
> >> year           2017
> >> month          06
> >> day            30
> >> svn rev        72865
> >> language       R
> >> version.string R version 3.4.1 (2017-06-30)
> >> nickname       Single Candle
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd....@cbs.dk Priv: pda...@gmail.com