I'm trying to read a text file (the FTP file in the command below), and I'm getting an error:
> SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
+     widths=c(7,6,51,51), skip=6, n=5, header=F, stringsAsFactors=F)
Error in substring(x, first, last) :
  invalid multibyte string at '<e0> vent'

The problem is caused by "Dendrocygne à ventre noir", which contains a French character that R cannot handle. There are more such characters throughout the file (and I want to read the whole file; I'm only picking out a few lines above to make it easier), so I can't delete them manually.

The file is apparently in the ISO-8859 format (or it might be windows-1252), but using that in either encoding= or fileEncoding= doesn't work:

SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
    widths=c(7,6,51,51), skip=6, n=5, header=F, stringsAsFactors=F,
    fileEncoding="ISO-8859")

Can anyone suggest a solution? In case it helps, here's my session info:

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.1.0

-- 
Bob O'Hara
Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main, Germany
Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW: http://www.bik-f.de/root/index.php?page_id=219
Blog: http://occamstypewriter.org/boboh/
Journal of Negative Results - EEB: www.jnr-eeb.org
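
P.S. In case it helps anyone look at this without the FTP download, here is a minimal sketch of what I think is going on. It assumes the offending byte is a Latin-1 (ISO-8859-1) "à", i.e. the single byte 0xE0 from the error message. That is a guess on my part, and the string below is a hand-built stand-in, not a line actually read from the file. The variable names are just for illustration.

```r
# Hand-built stand-in for one species name, with "à" stored as the single
# Latin-1 byte 0xE0 (an assumption about how the file is encoded).
latin1_bytes <- c(charToRaw("Dendrocygne "), as.raw(0xE0),
                  charToRaw(" ventre noir"))
x <- rawToChar(latin1_bytes)

# iconv() returns NA when the input is not valid in the 'from' encoding,
# so this checks whether the bytes can be interpreted as UTF-8.
is_valid_utf8 <- !is.na(iconv(x, from = "UTF-8", to = "UTF-8"))

# Re-interpret the same bytes as Latin-1 and convert them to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")

# Character-based operations such as substring() work on the converted string.
name_start <- substring(y, 1, 11)
```

If the assumption holds, the bytes themselves are fine and only the encoding R assumes for them (UTF-8, from my locale) is wrong. I have not confirmed which encoding name read.fwf would need in fileEncoding= for the real file.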