I'm trying to read a text file (actually the FTP file in the command below),
and I'm getting an error:
> SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
+ widths=c(7,6,51,51), skip=6, n=5, header=F,
+ stringsAsFactors=F)
Error in substring(x, first, last) :
invalid multibyte string at '<e0> vent'
The problem is caused by "Dendrocygne à ventre noir", which contains an
accented French character (à). There are more of these throughout the file,
and I want to read the whole file (I'm only picking out a few lines above to
make it easier), so I can't delete them by hand. The file is apparently in
an ISO-8859 encoding (or it might be windows-1252), but using that in either
encoding= or fileEncoding= doesn't work:
SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
widths=c(7,6,51,51), skip=6, n=5, header=F,
stringsAsFactors=F, fileEncoding="ISO-8859")
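For what it's worth, here is a minimal sketch of what I think is going on, without needing the FTP file. It assumes the byte R complains about ('<e0>') really is "à" in ISO-8859-1 / windows-1252; the names bad, good and res are made up for illustration, and this only shows the conversion step with iconv(), not a fix for read.fwf() itself.

```r
# Assumption: 0xE0 is "à" in ISO-8859-1 / windows-1252.
# Build the six bytes of "à vent" exactly as they would appear in such a file.
bad <- rawToChar(as.raw(c(0xE0, 0x20, 0x76, 0x65, 0x6E, 0x74)))

# In a UTF-8 locale I would expect this to fail much like the read.fwf() call,
# because 0xE0 on its own is not a valid UTF-8 sequence. try() keeps the
# script running either way.
res <- try(substring(bad, 1, 1), silent = TRUE)

# Declare the source encoding explicitly and convert to UTF-8.
good <- iconv(bad, from = "latin1", to = "UTF-8")

# After conversion, substring() counts characters rather than bytes,
# so splitting at fixed positions behaves as intended.
fields <- substring(good, c(1, 3), c(1, 6))
print(fields)
```

If that is right, the same conversion applied to lines read with readLines() might be one way around the problem, but I haven't got it to work through read.fwf().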
Can anyone suggest a solution? In case it helps, here's my session info:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.1.0
>
--
Bob O'Hara
Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany
Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW: http://www.bik-f.de/root/index.php?page_id=219
Blog: http://occamstypewriter.org/boboh/
Journal of Negative Results - EEB: www.jnr-eeb.org