> * William Dunlap <jqha...@gvopb.pbz> [2012-09-13 19:50:21 +0000]:
>
> On Windows with R-2.15.1 in a 1252 locale, I had to read (and toss) out
> the initial 3 bytes (the byte-order mark?) to make things work:
>
>   > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
>   > readChar(socket, nchars=3, useBytes=TRUE)
>   [1] ""

confirmed - first 3 bytes are "\357\273\277"

>   > d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
>   > dim(d)
>   [1] 485   5
>   > head(d)
>      V1 V2 V3             V4      V5
>   1 aar    aa           Afar    afar
>   2 abk    ab      Abkhazian abkhaze
>   3 ace             Achinese    aceh
>   4 ach                Acoli   acoli
>   5 ada              Adangme adangme
>   6 ady       Adyghe; Adygei  adyghé

alas, this is all I get:

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'

  a3bibliographic a3terminologic a2        english  french
1             aar             NA aa           Afar    afar
2             abk             NA ab      Abkhazian abkhaze
3             ace             NA          Achinese    aceh
4             ach             NA             Acoli   acoli
5             ada             NA           Adangme adangme
6             ady             NA    Adyghe; Adygei   adygh

note that the first non-ASCII character terminates the input.
so, I still cannot read the data from the URL.

I can read the file though - with quote="" (thanks Peter!) - except that
the first record is "\357\273\277aar".

--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com http://mideasttruth.com
http://iris.org.il http://jihadwatch.org
The only thing worse than X Windows: (X Windows) - X
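
A minimal sketch of one way to get rid of the "\357\273\277aar" first record when reading the local file: read the raw bytes, drop a leading UTF-8 byte-order mark if there is one, and parse what is left with quote="". It uses base R only. The helper name drop_bom, the temporary file and its two sample rows are made up for illustration (the real input would be a saved copy of ISO-639-2_utf-8.txt), and this is not necessarily what anyone in this thread ended up doing.

```r
# Hypothetical helper: strip a leading UTF-8 BOM (EF BB BF) from a raw vector.
drop_bom <- function(bytes) {
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) bytes[-(1:3)] else bytes
}

# Stand-in for the downloaded file: a BOM followed by two pipe-separated
# records, the second one containing a non-ASCII character.
tmp <- tempfile(fileext = ".txt")
sample_rows <- c("aar||aa|Afar|afar", "ady|||Adyghe; Adygei|adygh\u00e9")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw(paste0(paste(sample_rows, collapse = "\n"), "\n")), con)
close(con)

# Read the whole file as bytes, remove the BOM, and mark the text as UTF-8.
raw_bytes <- readBin(tmp, what = "raw", n = file.info(tmp)$size)
txt <- rawToChar(drop_bom(raw_bytes))
Encoding(txt) <- "UTF-8"

# quote="" keeps apostrophes in language names from being treated as quotes.
d <- read.table(text = txt, sep = "|", quote = "", stringsAsFactors = FALSE)
print(d$V1)
unlink(tmp)
```

The same idea should carry over to the real file by pointing readBin at its path. R 3.0.0 and later (so not the 2.15.1 mentioned above) also document encoding="UTF-8-BOM" for file() and url(), which removes the mark on read; whether that alone cures the truncated read from the URL is not something this sketch shows.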