And if your data is in a data frame (please include an example of
the results of str() next time):
> dfrm <- rd.txt("Column1, Column2, Column3
+ Yes,Yes,Yes
+ Yes,No,Yes
+ No,No,No
+ No,Yes,No
+ Yes,Yes,No", sep=",")
> # rd.txt is just a wrapper I use for
> # read.table(textConnection( ), header=TRUE, ... )
> dfrm$newvar <- apply(subset(dfrm, select=c(Column1, Column2, Column3)), 1,
+     function(x) { if (all(x=="Yes")) {"Yes"} else {"No"} } )
> dfrm
  Column1 Column2 Column3 newvar
1     Yes     Yes     Yes    Yes
2     Yes      No     Yes     No
3      No      No      No     No
4      No     Yes      No     No
5     Yes     Yes      No     No
Notice that I created this variable in a manner that did not require
the use of every column of the dataframe.
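The same column-selective idea can also be written without apply(). What follows is only a sketch, not the method used above: `cols` and `kept` are names made up for illustration, and the data is rebuilt with read.table(text = ...) in place of the rd.txt wrapper. It counts the "Yes" entries per row with rowSums() and compares the count to the number of tested columns.

```r
# Rebuild the five example rows (read.table on inline text instead of rd.txt)
dfrm <- read.table(text = "Column1,Column2,Column3
Yes,Yes,Yes
Yes,No,Yes
No,No,No
No,Yes,No
Yes,Yes,No", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# 'cols' is a hypothetical name: the columns that take part in the test
cols <- c("Column1", "Column2", "Column3")

# dfrm[cols] == "Yes" is a logical matrix; a row passes when every
# tested column is "Yes", i.e. the row sum equals the number of columns
all_yes <- rowSums(dfrm[cols] == "Yes") == length(cols)
dfrm$newvar <- ifelse(all_yes, "Yes", "No")

# Keep only the rows where the new variable is "Yes"
kept <- dfrm[dfrm$newvar == "Yes", ]
```

Note that a missing value in any tested column would make the row sum NA here, so this sketch assumes the columns contain only "Yes" and "No".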
--
David
On Feb 26, 2010, at 7:57 PM, Don MacQueen wrote:
If your data is in a matrix named "orgdata" :
newvar <- apply(orgdata , 1, function(arow, if (all(arow=='Yes'))
'Yes' else 'No'
Yes, at least 2 missing parens and an unneeded comma, perhaps:
newvar <- apply(orgdata, 1, function(arow) if (all(arow=='Yes')) 'Yes' else 'No')
newdata <- cbind(orgdata, newvar)
finaloutcome <- newdata[ newvar=='Yes',]
The key to this is the apply() function.
I might have missed some parentheses...
There are other ways; this is just one. I might think of a simpler
one if I gave it more time...
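For reference, the matrix version can be run end to end roughly as follows. This is a sketch under assumptions: the matrix below is a hypothetical stand-in for "orgdata", filled with the five rows from the question, and drop = FALSE is an addition so that the result stays a matrix when only one row passes the filter.

```r
# Hypothetical stand-in for "orgdata": a character matrix holding the
# five rows from the original question
orgdata <- matrix(c("Yes", "Yes", "Yes",
                    "Yes", "No",  "Yes",
                    "No",  "No",  "No",
                    "No",  "Yes", "No",
                    "Yes", "Yes", "No"),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("Column1", "Column2", "Column3")))

# One "Yes"/"No" value per row: "Yes" only if the whole row is "Yes"
newvar <- apply(orgdata, 1, function(arow) if (all(arow == "Yes")) "Yes" else "No")

# Append the new variable as a fourth column
newdata <- cbind(orgdata, newvar)

# drop = FALSE keeps a one-row result as a 1 x 4 matrix instead of
# collapsing it to a plain character vector
finaloutcome <- newdata[newvar == "Yes", , drop = FALSE]
```

Without drop = FALSE, the single surviving row in this example would come back as a named vector rather than a matrix, which can surprise later code that expects rows and columns.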
-Don
At 4:40 PM -0800 2/26/10, wookie1976 wrote:
I am new to R, but have been using SAS for years. In this transition
period, I am finding myself pulling my hair out to do some of the
simplest things.
An example of this is that I need to generate a new variable based on
the outcome of several existing variables in a data row. In other
words, if the values in all three existing columns are "Yes", then the
new variable should also be "Yes"; however, if any one of the three
existing variables is "No", then the new variable should be "No". I
would then use that new variable as an exclusion for data in a new or
existing dataset (i.e., if NewVariable = "No" then delete):
Take this:
Column1, Column2, Column3
Yes, Yes, Yes
Yes, No, Yes
No, No, No
No, Yes, No
Yes, Yes, No
Generate this:
Column1, Column2, Column3, NewVariable1
Yes, Yes, Yes, Yes
Yes, No, Yes, No
No, No, No, No
No, Yes, No, No
Yes, Yes, No, No
And end up with this:
Column1, Column2, Column3, NewVariable1
Yes, Yes, Yes, Yes
Any suggestions on how to efficiently do this in either the existing
or a new dataset?
You might have simplified this a bit if you let the columns be logical
rather than character.
> dfrm$newvar <- apply(subset(dfrm, select=c(Column1, Column2, Column3)), 1,
+     function(x) { (all(x=="Yes")) } )
> dfrm
  Column1 Column2 Column3 newvar
1     Yes     Yes     Yes   TRUE
2     Yes      No     Yes  FALSE
3      No      No      No  FALSE
4      No     Yes      No  FALSE
5     Yes     Yes      No  FALSE
You would then be able to apply simpler tests with operators and
functions that accept the logical data type.
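As a rough illustration of that point, here is a sketch that goes one step further and converts the input columns themselves to logical. The names `cols` and `kept` are invented for the example, and read.table(text = ...) stands in for the rd.txt wrapper.

```r
# Rebuild the five example rows (read.table on inline text instead of rd.txt)
dfrm <- read.table(text = "Column1,Column2,Column3
Yes,Yes,Yes
Yes,No,Yes
No,No,No
No,Yes,No
Yes,Yes,No", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# 'cols' is a hypothetical name for the columns to convert
cols <- c("Column1", "Column2", "Column3")

# Turn each "Yes"/"No" column into TRUE/FALSE
dfrm[cols] <- lapply(dfrm[cols], function(x) x == "Yes")

# With logical columns the row-wise test is just the & operator
dfrm$newvar <- dfrm$Column1 & dfrm$Column2 & dfrm$Column3

# A logical vector can index rows directly, with no == "Yes" comparison
kept <- dfrm[dfrm$newvar, ]
```

Once the columns are logical, functions such as sum(), any() and all() also work on them directly, for example sum(dfrm$newvar) to count the rows that pass.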
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help