Ah, thanks. Now I understand a bit more what's happening (and also a bit more about connections: I have managed to avoid having to understand them).
Right, now I can read my meta-data, I'll start on trying to read the data...

Bob

On 13 May 2014 16:10, peter dalgaard <pda...@gmail.com> wrote:

> Hi Bob, Long time no see.
>
> The following works for me. In general, I think it is tricky to rely on
> encodings to be passed on to the appropriate agent, so try to be as
> specific as possible about it.
>
> con <- url("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
>            encoding="Latin1")
> SpCodes=read.fwf(con, widths=c(7,6,51,51), skip=6, n=5, header=F,
>                  stringsAsFactors=F)
>
> AFAICT, the root cause is that encoding= is passed by read.fwf() to
> read.table(), once the columns are split out, but not to the file
> connection used to get the data for splitting.
>
> It also worked to get the whole enchilada using readLines, convert with
> iconv() and then use read.fwf on a textConnection to the converted lines.
>
> And, BTW, even though encoding names vary between platforms, "ISO-8859" is
> almost surely wrong, because there is "ISO-8859-1", "ISO-8859-2", ...
>
> - Peter
>
> On 13 May 2014, at 15:35 , Bob O'Hara <rni....@gmail.com> wrote:
>
> > I'm trying to read a text file (actually the ftp file in command below),
> > and I'm getting an error:
> >
> >> SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
> > +     widths=c(7,6,51,51), skip=6, n=5, header=F,
> > +     stringsAsFactors=F)
> > Error in substring(x, first, last) :
> >   invalid multibyte string at '<e0> vent'
> >
> > The problem is caused by "Dendrocygne à ventre noir", which has a French
> > character which seems to be causing the problems: there are more throughout
> > the file (and I want to read the whole file: I'm picking out bits above to
> > make it easier), so I can't manually delete this. The file is apparently in
> > the ISO-8859 format (or it might be windows-1252), but using that in either
> > encoding= or fileEncoding= doesn't work:
> >
> > SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
> >     widths=c(7,6,51,51), skip=6, n=5, header=F,
> >     stringsAsFactors=F, fileEncoding="ISO-8859")
> >
> > Can anyone suggest a solution? In case it helps, here's my session info:
> >
> >> sessionInfo()
> > R version 3.1.0 (2014-04-10)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >      LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8
> >      LC_PAPER=en_GB.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >      LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] tools_3.1.0
> >
> > --
> > Bob O'Hara
> >
> > Biodiversity and Climate Research Centre
> > Senckenberganlage 25
> > D-60325 Frankfurt am Main,
> > Germany
> >
> > Tel: +49 69 798 40226
> > Mobile: +49 1515 888 5440
> > WWW:    http://www.bik-f.de/root/index.php?page_id=219
> > Blog: http://occamstypewriter.org/boboh/
> > Journal of Negative Results - EEB: www.jnr-eeb.org
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com

--
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:    http://www.bik-f.de/root/index.php?page_id=219
Blog: http://occamstypewriter.org/boboh/
Journal of Negative Results - EEB: www.jnr-eeb.org
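
For the archives: Peter's second suggestion (readLines, then iconv(), then read.fwf on a textConnection) is only described in words in this thread. A minimal sketch of how it might look is below. To keep it runnable without the FTP server, it first writes a tiny Latin-1 file to a temporary path. The two records, the variable names and the widths c(7, 6, 40) are made up for illustration and are not the real layout of SpeciesList.txt, and "latin1" is an assumption about the file's encoding (windows-1252 was the other candidate mentioned).

```r
# Stand-in for SpeciesList.txt: two fixed-width records written as Latin-1 bytes.
# The records and the widths c(7, 6, 40) are hypothetical, for illustration only.
recs <- c("00010  01770 Dendrocygne \u00e0 ventre noir",
          "00020  01780 Dendrocygne fauve")
tmp <- tempfile(fileext = ".txt")
writeLines(iconv(enc2utf8(recs), from = "UTF-8", to = "latin1"),
           tmp, useBytes = TRUE)

# 1. Read the lines as they are, with no re-encoding on the connection.
raw_lines <- readLines(tmp, warn = FALSE)

# 2. Convert explicitly from the encoding the file is assumed to use.
utf8_lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")

# 3. Let read.fwf() split the converted lines through a text connection.
#    colClasses = "character" keeps leading zeros in the code columns.
tc <- textConnection(utf8_lines)
SpCodes <- read.fwf(tc, widths = c(7, 6, 40), header = FALSE,
                    stringsAsFactors = FALSE, strip.white = TRUE,
                    colClasses = "character")
close(tc)    # textConnection() opens the connection, so close it ourselves
unlink(tmp)
```

For the real file the same three steps would presumably be applied to the lines read from the ftp URL, with widths=c(7,6,51,51) and skip=6 as in the original call. iconvlist() shows which encoding names the local iconv accepts, which is a quick way to check a name before passing it on.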