Ah, thanks. Now I understand a bit more about what's happening (and also a bit
more about connections, which until now I had managed to avoid having to understand).

Right, now that I can read my metadata, I'll start trying to read the data...

Bob


On 13 May 2014 16:10, peter dalgaard <pda...@gmail.com> wrote:

> Hi Bob, Long time no see.
>
> The following works for me. In general, I think it is tricky to rely on
> encodings being passed on to the appropriate agent, so try to be as
> specific as possible about it.
>
> con <- url("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
>            encoding="Latin1")
> SpCodes=read.fwf(con,
>                 widths=c(7,6,51,51), skip=6, n=5, header=FALSE,
>                 stringsAsFactors=FALSE)
>
> AFAICT, the root cause is that encoding= is passed by read.fwf() to
> read.table(), once the columns are split out, but not to the file
> connection used to get the data for splitting.
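A minimal, self-contained sketch of the same idea, assuming a UTF-8 locale (as in the sessionInfo() output in the original question). Because the FTP file may not be reachable, it writes a hypothetical two-row fixed-width sample to a temporary file as Latin-1 bytes, then declares the encoding on the connection with file(..., encoding = "latin1") rather than relying on read.fwf() to pass it along. The rows, column widths and column names are made up for illustration.

```r
# Hypothetical stand-in for SpeciesList.txt: two fixed-width rows
# (5-char code, 30-char English name, French name), saved as Latin-1 bytes.
tmp <- tempfile(fileext = ".txt")
utf8_lines <- sprintf("%-5s%-30s%s",
                      c("00001", "00002"),
                      c("Black-bellied Whistling-Duck", "Fulvous Whistling-Duck"),
                      c("Dendrocygne \u00e0 ventre noir", "Dendrocygne fauve"))
writeLines(iconv(utf8_lines, from = "UTF-8", to = "latin1"), tmp, useBytes = TRUE)

# Declare the encoding on the connection itself, so each line is re-encoded
# to the session encoding as it is read, before read.fwf() cuts it up
# with substring().
con <- file(tmp, encoding = "latin1")
species <- read.fwf(con, widths = c(5, 30, 30),
                    col.names = c("code", "english", "french"),
                    strip.white = TRUE,        # drop the fixed-width padding
                    stringsAsFactors = FALSE)  # passed on to read.table()
print(species)
```

The same pattern applies to url(): the connection does the conversion, and read.fwf() only ever sees correctly encoded lines.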
>
> It also worked to get the whole enchilada using readLines, convert with
> iconv() and then use read.fwf on a textConnection to the converted lines.
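The readLines()/iconv()/textConnection() route can be sketched locally as follows. As before, this assumes a UTF-8 locale and uses a hypothetical Latin-1 sample file with made-up rows, widths and column names. For the real data, the first step would read from the url() connection instead of a local file.

```r
# Hypothetical Latin-1 sample file standing in for the remote data.
tmp <- tempfile(fileext = ".txt")
utf8_lines <- sprintf("%-5s%-30s%s",
                      c("00001", "00002"),
                      c("Black-bellied Whistling-Duck", "Fulvous Whistling-Duck"),
                      c("Dendrocygne \u00e0 ventre noir", "Dendrocygne fauve"))
writeLines(iconv(utf8_lines, from = "UTF-8", to = "latin1"), tmp, useBytes = TRUE)

# 1. Read the raw lines; the bytes are left untouched at this point.
raw_lines <- readLines(tmp)
# 2. Convert explicitly from Latin-1 to UTF-8 (result is marked as UTF-8).
conv_lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")
# 3. Hand the converted lines to read.fwf() through a textConnection.
tc <- textConnection(conv_lines)
species <- read.fwf(tc, widths = c(5, 30, 30),
                    col.names = c("code", "english", "french"),
                    strip.white = TRUE, stringsAsFactors = FALSE)
close(tc)
print(species)
```

This route keeps the conversion step visible, which makes it easy to inspect conv_lines (or try a different source encoding such as windows-1252) before any splitting happens.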
>
> And, BTW, even though encoding names vary between platforms, "ISO-8859" is
> almost surely wrong, because there are "ISO-8859-1", "ISO-8859-2", ...
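Which names are accepted depends on the platform's iconv (R's iconvlist() lists the ones it knows). The sketch below uses a hypothetical helper that simply tries each candidate name with iconv() and reports whether it is usable. The candidate list is illustrative, and the printed results will differ between systems.

```r
# Hypothetical helper: TRUE if this platform's iconv() accepts `enc`
# as a source encoding name (unknown names make iconv() signal an error).
encoding_supported <- function(enc) {
  tryCatch({
    iconv("abc", from = enc, to = "UTF-8")
    TRUE
  }, error = function(e) FALSE)
}

# "ISO-8859" alone does not identify a character set; the part number
# (-1, -2, ...) does.
candidates <- c("ISO-8859", "ISO-8859-1", "latin1", "CP1252")
print(sapply(candidates, encoding_supported))
```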
>
> - Peter
>
>
> On 13 May 2014, at 15:35 , Bob O'Hara <rni....@gmail.com> wrote:
>
> > I'm trying to read a text file (actually the FTP file in the command below),
> > and I'm getting an error:
> >
> >> SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
> > +                  widths=c(7,6,51,51), skip=6, n=5, header=F, stringsAsFactors=F)
> > Error in substring(x, first, last) :
> >  invalid multibyte string at '<e0> vent'
> >
> > The problem is caused by "Dendrocygne à ventre noir", which has a French
> > character that seems to be causing the problems. There are more
> > throughout the file (and I want to read the whole file: I'm picking out bits
> > above to make it easier), so I can't delete them manually. The file is
> > apparently in the ISO-8859 format (or it might be windows-1252), but using
> > that in either encoding= or fileEncoding= doesn't work:
> >
> > SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
> >                 widths=c(7,6,51,51), skip=6, n=5, header=F,
> >                 stringsAsFactors=F, fileEncoding="ISO-8859")
> >
> > Can anyone suggest a solution? In case it helps, here's my session info:
> >> sessionInfo()
> > R version 3.1.0 (2014-04-10)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] tools_3.1.0
> >>
> >
> >
> > --
> > Bob O'Hara
> >
> > Biodiversity and Climate Research Centre
> > Senckenberganlage 25
> > D-60325 Frankfurt am Main,
> > Germany
> >
> > Tel: +49 69 798 40226
> > Mobile: +49 1515 888 5440
> > WWW:   http://www.bik-f.de/root/index.php?page_id=219
> > Blog: http://occamstypewriter.org/boboh/
> > Journal of Negative Results - EEB: www.jnr-eeb.org
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com
>
>

