Re: [Rd] read.spss issues

David Winsemius Wed, 15 Feb 2012 14:05:56 -0800


On Feb 15, 2012, at 3:28 PM, Thomas Lumley wrote:

On Wed, Feb 15, 2012 at 7:05 PM, Jeroen Ooms wrote:

The second problem is that the SPSS data format allows 'duplicate labels' (the same value label attached to more than one value), whereas duplicate levels are not allowed in R factors. read.spss() does not deal with this and creates an invalid factor:

library(foreign)
x <- read.spss("http://www.stat.ucla.edu/~jeroen/spss/duplicate_labels.sav",
               use.value.labels = TRUE)
levels(x$opinion)

This causes problems further downstream. I am not sure whether this is an issue in read.spss() or in as.factor(), but it might be wise to detect duplicate labels and map all values that share a label to one and the same integer code when converting to a factor.


I think this one would be better dealt with by giving an error.

SPSS value labels are just labels, so they don't map very well onto R factors, which are enumerated types. Rather than forcing them into factors and losing data, I would prefer to make the user decide what to do.
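A minimal sketch of the "give an error" approach, assuming the raw codes have been read with use.value.labels = FALSE and that the value labels are available as a named vector whose values are the codes and whose names are the label text (an assumed layout). labels_to_factor_strict() is a hypothetical helper, not part of the foreign package:

```r
# Hypothetical helper (not part of foreign): convert SPSS-style codes to a
# factor, but stop if two or more codes share the same value label.
# ASSUMPTION: value_labels is a named vector, values = codes, names = labels.
labels_to_factor_strict <- function(codes, value_labels) {
  labs <- names(value_labels)
  dup <- unique(labs[duplicated(labs)])
  if (length(dup) > 0) {
    # Refuse to guess: a factor would either get duplicate levels or
    # silently merge distinct codes, so the user has to decide.
    stop("duplicate value labels: ", paste0('"', dup, '"', collapse = ", "),
         call. = FALSE)
  }
  # Labels are unique here, so each code maps to exactly one level;
  # codes without a label become NA.
  factor(codes, levels = unname(value_labels), labels = labs)
}

codes <- c(1, 2, 1)
ok  <- c(agree = 1, disagree = 2)
bad <- c(agree = 1, disagree = 2, agree = 3)
levels(labels_to_factor_strict(codes, ok))  # "agree" "disagree"
try(labels_to_factor_strict(codes, bad))    # signals an error naming "agree"
```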

I could imagine that users might appreciate the possibility of getting the data from read.spss in one pass, and getting the labels from a separate function that made a best guess at what was needed but did not try to unambiguously match up values with factor levels for all variables. For big datasets, only a few edits might be needed to throw out duplicates, and this would avoid a lot of typing errors.
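A rough sketch of the two-pass idea, under the same assumed layout for the value labels (a named vector, codes as values, label text as names). Both functions are hypothetical. The first builds an editable code-to-label table that flags shared labels. The second applies the table, possibly after the user has edited it, so that codes sharing a label collapse into one level:

```r
# Hypothetical helper: best-guess code -> label table, flagging every code
# whose label is shared with at least one other code.
# ASSUMPTION: value_labels is a named vector, values = codes, names = labels.
guess_label_table <- function(value_labels) {
  labs <- names(value_labels)
  data.frame(code = unname(value_labels),
             label = labs,
             duplicated_label = duplicated(labs) | duplicated(labs, fromLast = TRUE),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: apply a (possibly user-edited) table. Codes that share
# a label become one level; codes missing from the table become NA.
apply_label_table <- function(codes, tab) {
  factor(tab$label[match(codes, tab$code)], levels = unique(tab$label))
}

tab <- guess_label_table(c(agree = 1, disagree = 2, agree = 3))
tab$duplicated_label  # TRUE FALSE TRUE
# The user could edit flagged rows here, e.g.
# tab$label[tab$code == 3] <- "agree (3)"
f <- apply_label_table(c(1, 2, 3, 2), tab)
levels(f)             # "agree" "disagree"
```

Keeping the table as an ordinary data frame means the few rows that need attention can be fixed in one place, instead of being retyped for every variable.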


--

David Winsemius, MD
West Hartford, CT

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel