On Dec 31, 2011, at 16:05, Dennis Fisher wrote:

> R version: 2.13.1
> OS X
>
> Colleagues,
>
> I am working with a CSV file; for testing purposes, I created an XLS version of the file.
> When I read these files using read.xls (gdata) or read.csv, I encounter an error:
>     Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) :
>       invalid multibyte string at '<b0>C'
> The error occurs whether or not I invoke the "as.is" option of read.csv.
>
> The trigger for this error is a "degree C" string (\xb0). The offending line is:
> [1] "\"DD4A14\",\"VITALS\",\"SITE038\",\"038-501\",\"SCREENING\",\"\",\"Temperature\",\"37.8\",\"\xb0C\",\"1005_TS\",\"e2\",\"1005/cla\",\"\",5/25/2011,-1,2,0,0,0,0,0,0,1,7/20/2011 16:48:25,240,1"

I think this means that you are working in UTF-8, trying to read something that is encoded in Latin-1. Try playing with the fileEncoding or encoding arguments; my first try would be fileEncoding="latin1".

-pd

>
> I can get around the error by reading the file with readLines, then editing out that character:
>     PATH <- textConnection(sub("\xb0", "degrees", readLines(PATH)))
>     read.csv(PATH, header=T, as.is=T)
> This alternate approach is successful. This leads to two questions:
>
> 1. Why can readLines handle that character string whereas read.csv cannot?
>
> 2. Reading the text connection is slow - it takes ~ 11 seconds to read a file with 11K rows. I edited the file to replace the offending character with "degree". read.csv reads the 11K rows of the new file in a fraction of a second. Can someone explain why reading the text connection is so much slower than reading a file?
>
> Dennis
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
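
For reference, here is a minimal sketch of the fileEncoding suggestion, assuming the session runs in a UTF-8 locale and the file is Latin-1. The file contents and the names tmp, con and dat are made up for illustration; with the real data you would pass your own path to read.csv.

```r
# Build a tiny Latin-1 CSV whose last field contains the degree sign (byte 0xB0).
# The contents are illustrative only, not the original data.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw("test,value,unit\nTemperature,37.8,"),
           as.raw(0xb0),
           charToRaw("C\n")),
         con)
close(con)

# Declare the file's encoding so that R re-encodes the input to the
# native encoding while reading, instead of treating 0xB0 as UTF-8.
dat <- read.csv(tmp, header = TRUE, as.is = TRUE, fileEncoding = "latin1")

# In a UTF-8 session the unit column should now hold the degree sign plus "C".
print(dat$unit)
```

This reads the file directly, so it should also avoid the slow textConnection detour described in the quoted message.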