On Apr 26, 2010, at 3:39 PM, Gosse, Michelle wrote:
Greetings all.
I'm starting analysis in R on a reasonably sized pre-existing
dataset, of 583 variables and 1127 observations. This was an SPSS
datafile, which I read in using the read.spss command using the
foreign package, and the data was assigned to a data.frame when it
was read in. The defaults in read.spss were used, except I set
to.data.frame = TRUE.
The data is a survey dataset (each observation/case = one
participant), and many of the variables are participants' responses
to Likert scale items. These have been coded on a 1 to 7 scale, with
"8" used to code "Don't know" responses. The assumption is that the
1-7 responses are at least interval level, however the response "8"
is clearly not. For many analyses, this doesn't matter because I'm
only doing chi-square tests. However, for a between-group comparison
crosstab I would like to exclude those who gave "8" responses
because I am only interested in testing differences for the
participants who gave responses measured on the Likert scale proper.
I have encountered problems when I need to exclude the observations
from analysis, where they gave an "8" response to either of two
questions (Question 1A and Question 1B), which relate to columns 72
and 73 of the dataframe. The chi-square I am trying to do is based
on two other variables (mean of Q1A+Q1B for each participant) and a
grouping variable, which are contained in columns 8 and 80 of the
dataframe, respectively. The reason I am excluding anyone who gave
an "8" ("Don't know") response on questions 1A and 1B is that their
mean on these two questions cannot be interpreted as the value "8"
is nominal rather than interval/ratio and therefore cannot be used
in a mathematical expression.
I've been trying to use an if-or combination, and I can't get it to
work.
Did you read the help page for if?
?"if"
The chi-square test without the attempt to subset using "if" is
working fine, I don't understand what I am doing wrong in my
attempts to subset.
I have tried to reference the variables like this:
if ("Q1A"!=8 | "Q1B"!=8)
+ (table(micronutrients[,8,80]))
<group counts snipped>
chisq.test(table(micronutrients[,8,80]))
The group counts returned from the table statement show me that no
observations are being excluded from the analysis. The chisq.test
works fine on (table(micronutrients[,8,80])) but, of course, it is
being performed on the entire dataset as I have been unsuccessful in
subsetting the data.
I tried to see if the column names were objects and I got these
errors:
object("Q1A")
Error: could not find function "object"
Q1A
Error: object 'Q1A' not found
I'm not sure if this is important.
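
A short sketch of why the quoted names and the bare name behave as they do above. The data frame here is an invented stand-in (four rows, two columns), not the real dataset. The point is that "Q1A" in quotes is only a character string, and that a column is reached through its data frame rather than as a free-standing object.

```r
# Invented stand-in for the poster's data frame (assumption: the real
# columns 72 and 73 are named Q1A and Q1B).
micronutrients <- data.frame(Q1A = c(1, 8, 3, 7), Q1B = c(2, 4, 8, 6))

# A quoted name is a character string, so this compares the text "Q1A"
# with "8" and is TRUE whatever the data contain.
string_test <- "Q1A" != 8

# Columns are reached through the data frame that holds them.
by_dollar <- micronutrients$Q1A
by_name   <- micronutrients[["Q1A"]]
by_number <- micronutrients[, 1]

# Check that a column of that name is present.
has_column <- "Q1A" %in% names(micronutrients)
```

This would explain why the table was unchanged: a condition built from quoted names never looks at the responses.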
So I tried to do the if-or using the column number, but that didn't
work either:
if (micronutrients[,72]!=8 | micronutrients[,73]!=8)
Leave behind your SPSS syntactical constructions. SPSS and the SAS
data steps have implicit loops that operate sequentially along rows of
datasets. R does not work that way. A corresponding operation in R
might be:
apply(dataframe1, 1, <function that works on a row of data>)
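
As an illustration of the apply() form above, here is a minimal sketch on invented data. The row function (flagging any 8 in the row) is an assumption chosen to match the question, and the last line shows the same flag computed without visiting rows one at a time.

```r
# Invented stand-in data; names and values are hypothetical.
dataframe1 <- data.frame(Q1A = c(1, 8, 3, 7), Q1B = c(2, 4, 8, 6))

# MARGIN = 1 passes each row to the function as a vector.
has_dont_know <- apply(dataframe1, 1, function(row) any(row == 8))

# The same flag from whole-column comparisons.
has_dont_know_vec <- dataframe1$Q1A == 8 | dataframe1$Q1B == 8
```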
"if" is a program control mechanism that does not operate on
vectors. If it gets a vector it evaluates the first element and
ignores the rest. (You should have gotten a warning and you should
have posted the warning.)
There is also the ifelse function that works with and returns vectors.
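
To make the difference concrete, a small sketch with invented response vectors. Comparisons and ifelse() work element by element, while if() wants a single TRUE or FALSE, so a vector has to be reduced with any() or all() first. Note that from R 4.2.0 onward a condition of length greater than one in if() is an error rather than a warning.

```r
# Invented responses for two Likert items; 8 codes "Don't know".
q1a <- c(1, 8, 3, 7)
q1b <- c(2, 4, 8, 6)

# Elementwise comparison: one TRUE/FALSE per participant.
keep <- q1a != 8 & q1b != 8

# ifelse() takes a vector condition and returns a vector.
q1a_clean <- ifelse(q1a == 8, NA, q1a)

# if() needs a length-one condition, so reduce the vector first.
if (any(q1a == 8)) message("q1a contains at least one code 8")
```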
+ ()
<group counts snipped>
Warning message:
In if () (table(micronutrients[, :
the condition has length > 1 and only the first element will be used
I got exactly the same chi-square output as in my previous attempt.
If any of you know SPSS, what I am trying to do in R is equivalent
to: temporary. select if not (Q1A=8 or Q1B=8). In SAS, it would be
the same as a subsetting if that lasted only for the particular
analysis, or a where, e.g. proc tabulate; where Q1A ne 8 or Q1B ne 8;
How can I subset the data? I would prefer not to create another
variable to hold the recodes as the dataset is already complex.
?subset
?with
with(subset(micronutrients, micronutrients[, 72] != 8 &
            micronutrients[, 73] != 8), table(...) )
Not sure what you intended with ... table(micronutrients[,8,80]) ... ,
but generally one does not first reference an object with two
dimensions and then do so with three. It is considered good form
around these parts to offer at the very least str() on an object about
which you are hoping to get specific advice. We cannot read your mind.
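
A sketch of the subset() and with() combination on invented data; the column names and the categorised mean are assumptions. Excluding anyone who answered 8 to either item means requiring both items to differ from 8, so the conditions are joined with `&`. Two columns are picked by position with c(); as far as I can tell, a third positional argument to `[` on a data frame is taken as `drop`, so `[, 8, 80]` would have returned column 8 alone.

```r
# Invented stand-in for the poster's data frame; all names are hypothetical.
micronutrients <- data.frame(
  group    = c("a", "a", "b", "b", "a", "b"),
  Q1A      = c(1, 8, 3, 7, 2, 8),
  Q1B      = c(2, 4, 8, 6, 5, 8),
  mean_cat = c("low", "high", "high", "high", "low", "high")
)

# Show the structure, as requested on the list.
str(micronutrients)

# Keep only rows where neither item is coded 8; the original is untouched.
kept <- subset(micronutrients, Q1A != 8 & Q1B != 8)

# Two-way table evaluated inside the subset.
tab <- with(kept, table(mean_cat, group))

# The same table with the two columns chosen by position.
tab_by_position <- table(kept[, c(4, 1)])
```

Because the subset is created only inside this expression chain, a different condition can be used for the next pair of questions without adding variables to the dataset.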
I only wish the subsetting condition to hold for the test
immediately following the instruction to subset (I need to subset
the data in different ways for different question combinations).
Because the instruction is complete once the table() command is
issued, I am assuming that the if statement only relates to the
table() command and therefore only indirectly to the chisq.test()
command following (as this is being performed on the subsetted
table) - which is exactly what I want.
Do not see any code to which this refers. You can assign the result of
the with(subset( ...), ... ) operation to an object on which you can
do statistical tests, but table does not automatically do chi-square
test. Maybe you should look at summary.table() or xtabs()
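
One way this could look, as a sketch on invented data (the names, counts, and categorised mean are all assumptions): the subsetted table is assigned to an object and the test is run on that object. xtabs() also accepts a subset argument, which keeps the restriction local to one call.

```r
# Invented data: 80 participants, two groups, a categorised mean score,
# and two Likert items where 8 codes "Don't know".
micronutrients <- data.frame(
  group    = rep(c("a", "b"), each = 40),
  mean_cat = rep(c("low", "high", "low", "high"), times = c(25, 15, 10, 30)),
  Q1A      = rep(c(1, 2, 3, 8), times = 20),
  Q1B      = 3
)

# Build the table on the restricted rows and keep it as an object.
kept <- subset(micronutrients, Q1A != 8 & Q1B != 8)
tab  <- with(kept, table(mean_cat, group))

# Chi-square test on the stored table.
result <- chisq.test(tab)

# summary() on a table also reports a chi-square test of independence.
tab_summary <- summary(tab)

# Formula interface; the subset applies to this call only.
xt <- xtabs(~ mean_cat + group, data = micronutrients,
            subset = Q1A != 8 & Q1B != 8)
```

The full data frame still has all 80 rows afterwards, so each analysis can use its own exclusion rule.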
Cheers
Michelle
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT