On Dec 31, 2011, at 16:05, Dennis Fisher wrote:

> R version: 2.13.1
> OS X
> 
> Colleagues,
> 
> I am working with a CSV file; for testing purposes, I created an XLS version 
> of the file.  
> When I read these files using read.xls (gdata) or read.csv, I encounter an 
> error:
>       Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : 
>         invalid multibyte string at '<b0>C'
> The error occurs whether or not I invoke the "as.is" option of read.csv.
> 
> The trigger for this error is a "degree C" string (\xb0).  The offending line 
> is:
> [1] 
> "\"DD4A14\",\"VITALS\",\"SITE038\",\"038-501\",\"SCREENING\",\"\",\"Temperature\",\"37.8\",\"\xb0C\",\"1005_TS\",\"e2\",\"1005/cla\",\"\",5/25/2011,-1,2,0,0,0,0,0,0,1,7/20/2011
>  16:48:25,240,1"

I think this means that you are working in UTF-8, trying to read something that 
is encoded in Latin-1. Try playing with the fileEncoding or encoding arguments; 
my first try would be fileEncoding="latin1".
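
A minimal sketch of that suggestion, assuming a UTF-8 session. The file written
here is a made-up stand-in for the real CSV (the column names and the temporary
path are illustrative only); it contains the same raw byte 0xB0 before the "C":

```r
# Build a small Latin-1 encoded CSV; the degree sign is the single byte 0xB0.
# (Hypothetical stand-in for the real file; column names are made up.)
latin1_csv <- tempfile(fileext = ".csv")
con <- file(latin1_csv, open = "wb")
writeBin(c(charToRaw("test,value,unit\nTemperature,37.8,"),
           as.raw(0xb0),
           charToRaw("C\n")),
         con)
close(con)

# Declare the file's encoding so the input is re-encoded while it is read.
dat <- read.csv(latin1_csv, fileEncoding = "latin1", as.is = TRUE)
str(dat)
```

If fileEncoding does not help, the actual encoding of the file may be something
other than Latin-1, so it is worth checking how the file was produced.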

-pd


> 
> I can get around the error by reading the file with readLines, then editing 
> out that character:
>       PATH <- textConnection(sub("\xb0", "degrees", readLines(PATH)))
>       read.csv(PATH, header=T, as.is=T) 
> This alternate approach is successful.  This leads to two questions:
> 
> 1.  Why can readLines handle that character string whereas read.csv cannot?
> 
> 2.  Reading the text connection is slow -  it takes ~ 11 seconds to read a 
> file with 11K rows.  I edited the file to replace the offending character with 
> "degree".  read.csv reads the 11K rows of the new file in a fraction of a 
> second. Can someone explain why reading the text connection is so much slower 
> than reading a file?
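
One way to keep the readLines step but avoid the text connection might look
like the sketch below. It assumes the file really is Latin-1; the small input
file and all variable names are made up for illustration. By default readLines
does no re-encoding, so the lines can be converted with iconv and written to a
temporary file that read.csv then reads directly:

```r
# Hypothetical Latin-1 input standing in for the real CSV (byte 0xB0 = degree sign).
src <- tempfile(fileext = ".csv")
con <- file(src, open = "wb")
writeBin(c(charToRaw("test,value,unit\nTemperature,37.8,"),
           as.raw(0xb0),
           charToRaw("C\n")),
         con)
close(con)

# Read the lines as-is, then convert them from Latin-1 to UTF-8.
lines_utf8 <- iconv(readLines(src), from = "latin1", to = "UTF-8")

# Write the converted lines to a temporary file and read that file with
# read.csv instead of wrapping the lines in a textConnection.
fixed <- tempfile(fileext = ".csv")
writeLines(lines_utf8, fixed, useBytes = TRUE)
dat2 <- read.csv(fixed, as.is = TRUE, encoding = "UTF-8")
str(dat2)
```

Whether this is as fast as reading the hand-edited file would need to be timed
on the real 11K-row data, for example with system.time().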
> 
> Dennis
> 
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
