Hi Bob,

Long time no see.

The following works for me. In general, I think it is tricky to rely on an
encoding being passed on to the appropriate agent (here, the connection that
actually reads the file), so try to be as specific as possible about where you
set it.

# declare the file's encoding on the connection itself
con <- url("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
           encoding="Latin1")
SpCodes=read.fwf(con,
                widths=c(7,6,51,51), skip=6, n=5, header=FALSE,
                stringsAsFactors=FALSE)
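A self-contained sketch of the same idea that runs without network access: it writes a small Latin-1 fixed-width file to a temp file and declares the encoding on a file() connection instead of url(). The two records, the widths c(5, 30) and strip.white = TRUE are made-up assumptions (not the layout of SpeciesList.txt), and it assumes a UTF-8 locale like yours.

```r
# Hypothetical two-record fixed-width file: a 5-character code, then a name.
# Byte 0xE0 is "a with grave" in Latin-1, but is invalid on its own in UTF-8.
bytes <- c(charToRaw("00010Dendrocygne "), as.raw(0xe0),
           charToRaw(" ventre noir\n00020Oie des neiges\n"))
tmp <- tempfile(fileext = ".txt")
writeBin(bytes, tmp)

# Declare the encoding on the connection itself, so the lines are
# re-encoded while being read, before read.fwf() splits them with substring().
# read.fwf() opens the (unopened) connection and closes it again when done.
con <- file(tmp, encoding = "latin1")
sp <- read.fwf(con, widths = c(5, 30), header = FALSE,
               stringsAsFactors = FALSE, strip.white = TRUE)
print(sp)
```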

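AFAICT, the root cause is that read.fwf() passes encoding= on to read.table()
once the columns have been split out, but not to the file connection that reads
the data for splitting. The splitting (the substring() call in your error
message) therefore sees raw Latin-1 bytes such as <e0>, which are not valid
characters in your UTF-8 locale.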

It also worked to read the whole enchilada with readLines(), convert it with
iconv(), and then use read.fwf() on a textConnection() to the converted lines.
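The readLines()/iconv()/textConnection() route can be sketched like this. Only the three-step shape comes from the description; the file contents, the widths c(5, 30) and strip.white = TRUE are made-up assumptions, and a UTF-8 locale is assumed.

```r
# Hypothetical Latin-1 fixed-width file (byte 0xE0 is "a with grave" in Latin-1).
bytes <- c(charToRaw("00010Dendrocygne "), as.raw(0xe0),
           charToRaw(" ventre noir\n00020Oie des neiges\n"))
tmp <- tempfile(fileext = ".txt")
writeBin(bytes, tmp)

# 1. Read the lines as raw bytes, without any re-encoding.
raw_lines <- readLines(tmp)
# 2. Convert them from Latin-1 to UTF-8 explicitly.
utf8_lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")
# 3. Feed the converted lines to read.fwf() through a text connection.
#    textConnection() returns an already-open connection, so close it ourselves.
tc <- textConnection(utf8_lines)
sp <- read.fwf(tc, widths = c(5, 30), header = FALSE,
               stringsAsFactors = FALSE, strip.white = TRUE)
close(tc)
print(sp)
```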

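And, BTW, even though encoding names vary between platforms, "ISO-8859" is
almost surely wrong: it names a whole family of encodings ("ISO-8859-1",
"ISO-8859-2", ...), not a single one. For this file, Latin-1 ("ISO-8859-1") is
the one that worked above.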
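To see why the part number matters, here is a small sketch decoding the same byte, 0xE0 (the accented letter in the species name), under two members of the family. Which encoding names iconv() accepts is platform-dependent, so the spellings "ISO-8859-1" and "ISO-8859-2" are an assumption that holds on typical Linux builds.

```r
# A single byte, 0xE0, with no encoding information attached.
x <- rawToChar(as.raw(0xe0))

# The same byte decodes to different characters in different ISO-8859 parts.
as_latin1 <- iconv(x, from = "ISO-8859-1", to = "UTF-8")  # U+00E0, a with grave
as_latin2 <- iconv(x, from = "ISO-8859-2", to = "UTF-8")  # U+0155, r with acute

# Show the Unicode code points of the decoded characters.
utf8ToInt(as_latin1)
utf8ToInt(as_latin2)
```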

- Peter


On 13 May 2014, at 15:35, Bob O'Hara <rni....@gmail.com> wrote:

> I'm trying to read a text file (actually the ftp file in command below),
> and I'm getting an error:
> 
>> SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
> +                  widths=c(7,6,51,51), skip=6, n=5, header=F,
> +                  stringsAsFactors=F)
> Error in substring(x, first, last) :
>  invalid multibyte string at '<e0> vent'
> 
> The problem is caused by "Dendrocygne à ventre noir", which has a French
> character which seems to be causing the problems: there are more throughout
> the file (and I want to read the whole file: I'm picking out bits above to
> make it easier), so I can't manually delete this. The file is apparently in
> the ISO-8859 format (or it might be windows-1252), but using that in either
> encoding= or fileEncoding= doesn't work:
> 
> SpCodes=read.fwf("ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/SpeciesList.txt",
>                 widths=c(7,6,51,51), skip=6, n=5, header=F,
>                 stringsAsFactors=F, fileEncoding="ISO-8859")
> 
> Can anyone suggest a solution? In case it helps, here's my session info:
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> loaded via a namespace (and not attached):
> [1] tools_3.1.0
>> 
> 
> 
> -- 
> Bob O'Hara
> 
> Biodiversity and Climate Research Centre
> Senckenberganlage 25
> D-60325 Frankfurt am Main,
> Germany
> 
> Tel: +49 69 798 40226
> Mobile: +49 1515 888 5440
> WWW:   http://www.bik-f.de/root/index.php?page_id=219
> Blog: http://occamstypewriter.org/boboh/
> Journal of Negative Results - EEB: www.jnr-eeb.org
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
