See the help page. We haven't been told, but it looks like the Debian
system is in a UTF-8 locale, where reencode=FALSE is likely appropriate.
However, the posting guide does ask for the output of sessionInfo() for a
good reason.
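The helper below is a minimal sketch of picking the `reencode` argument from the session locale. It assumes the file's strings really are UTF-8 (code page 65001). `l10n_info()` is base R. The helper name `choose_reencode()` is hypothetical, and so is the fallback of passing the string "UTF-8" in a non-UTF-8 locale. As we read foreign's help page, the default `reencode = NA` re-encodes only in a UTF-8 locale, which would fit Debian failing while Windows only warns.

```r
# Hypothetical helper: choose a value for read.spss()'s `reencode` argument,
# assuming the file holds UTF-8 strings (code page 65001).
choose_reencode <- function(utf8_locale = isTRUE(l10n_info()[["UTF-8"]])) {
  if (utf8_locale) {
    # Strings already match the session encoding, so skip re-encoding.
    FALSE
  } else {
    # Assumption: name the file encoding explicitly so that read.spss()
    # converts from UTF-8 to the current locale.
    "UTF-8"
  }
}
```

Typical use would be `foreign::read.spss("/home/jeroen/samples/Tomato.sav", reencode = choose_reencode())`.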
Yes, it looks like 65001 is UTF-8, but we don't know for certain. I plan
to assume so for the next release of foreign, which will follow
R 2.8.1 early next year.
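Here is one way a reader could apply the "65001 is UTF-8" assumption themselves. The sketch maps the character-representation code recorded in the file to an encoding name that iconv() accepts. The function name `spss_codepage_to_encoding()` is hypothetical. The "CPnnnn" spelling for other code pages is a guess whose validity depends on the platform's iconv.

```r
# Hypothetical helper: translate the character-representation code that an
# SPSS system file records into an encoding name for iconv().
# Assumption: code page 65001 is UTF-8.
spss_codepage_to_encoding <- function(cp) {
  cp <- as.integer(cp)
  if (is.na(cp)) return(NA_character_)
  if (cp == 65001L) return("UTF-8")
  # Other Windows code pages: guess the "CPnnnn" spelling; whether iconv()
  # accepts it depends on the platform's iconv implementation.
  paste0("CP", cp)
}
```

For example, `iconv(x, spss_codepage_to_encoding(65001), "")` uses "UTF-8" rather than the 'CP65001' spelling that the Debian build rejected.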
I think the title is rather off: this is more about what read.spss does with
undocumented features of SPSS formats (and record type 7, subtype 20 is
another such feature).
On Mon, 15 Dec 2008, Peter Dalgaard wrote:
Jeroen Ooms wrote:
SPSS seems to have changed its default data file format, causing problems
for read.spss(). On Windows this results only in warnings; on Debian the
import fails completely:
Debian (R version 2.8.0 (2008-10-20) i486-pc-linux-gnu, foreign_0.8-29)
read.spss("/home/jeroen/samples/Tomato.sav")
Error in iconv(names(rval), cp, "") :
unsupported conversion from 'CP65001' to ''
In addition: Warning messages:
1: In read.spss("/home/jeroen/samples/Tomato.sav") :
/home/jeroen/samples/Tomato.sav: File-indicated character representation
code (65001) looks like a Windows codepage
2: In read.spss("/home/jeroen/samples/Tomato.sav") :
/home/jeroen/samples/Tomato.sav: Unrecognized record type 7, subtype 20
encountered in system file
Windows (R version 2.8.0 (2008-10-20), foreign_0.8-29)
read.spss("C:/Program Files/SPSSInc/Statistics17/Samples/English/Tomato.sav")
...
attr(,"codepage")
[1] 65001
Warning messages:
1: In read.spss("C:/Program Files/SPSSInc/Statistics17/Samples/English/Tomato.sav") :
C:/Program Files/SPSSInc/Statistics17/Samples/English/Tomato.sav:
File-indicated character representation code (65001) looks like a Windows
codepage
2: In read.spss("C:/Program Files/SPSSInc/Statistics17/Samples/English/Tomato.sav") :
C:/Program Files/SPSSInc/Statistics17/Samples/English/Tomato.sav:
Unrecognized record type 7, subtype 20 encountered in system file
I've shared some sample data files that are included with SPSS, so you can
take a look: http://jeroen.xlshosting.net/samples/
I hope there is a fix; importing data from SPSS is a very popular
feature.
We do prefer people to export from SPSS in a documented format.
Thank you!
Thanks,
It looks like adding reencode="utf8" removes the iconv error. The warnings
appear to be harmless.
In fact, reencode="ascii" works for me as well on the Tomato.sav file.
However, as far as I can tell from Google, code page 65001 _is_ UTF-8...
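The workaround can be wrapped so that only the two warnings reported here are silenced and any other warning still surfaces. This is a sketch that relies on the judgement above that those warnings appear to be harmless. The names `is_harmless_spss_warning()` and `read_spss_utf8()` are hypothetical. Matching on message text is fragile and assumes foreign keeps the wording shown in the logs.

```r
# Hypothetical predicate: TRUE for the two warnings seen with SPSS 17 files.
# Assumption: foreign's warning wording matches the logs in this thread.
is_harmless_spss_warning <- function(msg) {
  grepl("looks like a Windows codepage", msg, fixed = TRUE) ||
    grepl("Unrecognized record type 7, subtype 20", msg, fixed = TRUE)
}

# Hypothetical wrapper: read with an explicit UTF-8 encoding and muffle
# only the warnings matched above; other warnings propagate normally.
read_spss_utf8 <- function(path, ...) {
  withCallingHandlers(
    foreign::read.spss(path, reencode = "utf8", ...),
    warning = function(w) {
      if (is_harmless_spss_warning(conditionMessage(w)))
        invokeRestart("muffleWarning")
    }
  )
}
```

Filtering by message keeps unexpected warnings visible, which suppressWarnings() around the whole call would hide.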
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595