Thank you very much for the very valuable comments. They are very informative.
Bests,
Niklas

2012/9/16 Ted Harding <ted.hard...@wlandres.net>

> [See at end]
> On 15-Sep-2012 20:36:49 Niklas Fischer wrote:
> > Dear R users,
> >
> > I have a reproducible data and try to create new variable "clo" is 1 if
> > know variable is equal to "very well" or "fairly well" and getalong is 4 or 5
> > otherwise it is 0.
>
> >[A]
>   rep_data<- read.table(header=TRUE, text="
>   id1 id2 know getalong
>   100000016_a1 100000016_a2 very well 4
>   100000035_a1 100000035_a2 fairly well NA
>   100000036_a1 100000036_a2 very well 3
>   100000039_a1 100000039_a2 very well 5
>   100000067_a1 100000067_a2 very well 5
>   100000076_a1 100000076_a2 fairly well 5
>   ")
>
>   rep_data$clo<- ifelse((rep_data$know==c("fairly well","very well") &
>   rep_data$getalong==c(4,5)),1,0)
>
> > For sure, something must be wrong, I couldn't find it out.
>
>   rep_data
>   id1          id2          know        getalong clo
>   100000016_a1 100000016_a2 very well   4        0
>   100000035_a1 100000035_a2 fairly well NA       0
>   100000036_a1 100000036_a2 very well   3        0
>   100000039_a1 100000039_a2 very well   5        0
>   100000067_a1 100000067_a2 very well   5        0
>   100000076_a1 100000076_a2 fairly well 5        0
>
> > Any help is appreciated..
> > Bests,
> > Niklas
>
> There are several things wrong with the way you are trying to do it,
> and indeed it is a bit complicated!
>
> First: if the above table (at >[A] above) is the format in which
> you input the data, then you should either comma-separate your
> data fields (and use sep="," in read.table(), or else just use
> read.csv()), or else enclose the two-word fields within "...",
> i.e.
> EITHER:
> >[B]
>   id1, id2, know, getalong
>   100000016_a1, 100000016_a2, very well, 4
>   100000035_a1, 100000035_a2, fairly well, NA
>   100000036_a1, 100000036_a2, very well, 3
>   100000039_a1, 100000039_a2, very well, 5
>   100000067_a1, 100000067_a2, very well, 5
>   100000076_a1, 100000076_a2, fairly well, 5
>
> OR:
> >[C]
>   id1 id2 know getalong
>   100000016_a1 100000016_a2 "very well" 4
>   100000035_a1 100000035_a2 "fairly well" NA
>   100000036_a1 100000036_a2 "very well" 3
>   100000039_a1 100000039_a2 "very well" 5
>   100000067_a1 100000067_a2 "very well" 5
>   100000076_a1 100000076_a2 "fairly well" 5
>
> Otherwise, in your original format, read.table() will read in
> FIVE fields, since it will treat "very" and "well" as separate,
> and will treat "fairly" and "well" as separate. Furthermore,
> it will match the header "getalong" with the 5th field (4,NA,etc),
> the header "know" with the 4th field ("well","well",...,"well"),
> header "id2" with the 3rd field ("very","fairly","very",...,"fairly"),
> and header "id1" with the 2nd field ("100000016_a2").
>
> And furthermore, the first field will become the row-names
> of the dataframe and will no longer be data!
>
> Second: Use of "==" to compare $know with "very well" and
> "fairly well" will not work as you expect. In your comparison
>
>   rep_data$know==c("fairly well","very well")
>
> you will get the result:
>
>   # [1] FALSE FALSE FALSE TRUE FALSE FALSE
>
> rather than your expected
>
>   # [1] TRUE TRUE TRUE TRUE TRUE TRUE.
>
> This is because "==" will compare each element of $know with just
> ONE ELEMENT of c("fairly well","very well"), and will recycle these
> elements, so it will compare $know successively with
>
>   "fairly well","very well" "fairly well","very well" "fairly well","very well"
>
> and since $know is
>
>   "very well","fairly well","very well","very well","very well","fairly well"
>
> the only match is in the 4th instance, which is why you get
>
>   # [1] FALSE FALSE FALSE TRUE FALSE FALSE
>
> A better comparison is to use the "%in%" operator, as in:
>
>   rep_data$know %in% c("fairly well","very well")
>   # [1] TRUE TRUE TRUE TRUE TRUE TRUE
>
> so you can in the end do:
>
>   rep_data$clo<-
>     ifelse((rep_data$know %in% c("fairly well","very well")) &
>            (rep_data$getalong %in% c(4,5)),1,0)
>
> which results in:
>
>   rep_data
>   #            id1          id2        know getalong clo
>   # 1 100000016_a1 100000016_a2   very well        4   1
>   # 2 100000035_a1 100000035_a2 fairly well       NA   0
>   # 3 100000036_a1 100000036_a2   very well        3   0
>   # 4 100000039_a1 100000039_a2   very well        5   1
>   # 5 100000067_a1 100000067_a2   very well        5   1
>   # 6 100000076_a1 100000076_a2 fairly well        5   1
>
> Finally, I suppose it is a happy coincidence that
>
>   NA %in% c(4,5)
>
> yields FALSE rather than what R might have been written to yield,
> i.e. NA -- since NA is basically a synonym for "something that we
> do not know the value of", strictly speaking we do not know the
> value of NA %in% c(4,5). It is possible that the "something that
> we do not know the value of" could be either 4 or 5, in which case
> NA %in% c(4,5) would be TRUE; but it is also possible that the
> "something that we do not know the value of" could be neither
> 4 nor 5, in which case NA %in% c(4,5) would be FALSE; but since
> we do not know which of these possibilities is the case, we do
> not know whether it should be TRUE or FALSE, so one can argue
> that the result should itself be NA.
> But, as it happens,
>
>   3 %in% c(4,5)
>   # [1] FALSE
>   4 %in% c(4,5)
>   # [1] TRUE
>   5 %in% c(4,5)
>   # [1] TRUE
>   NA %in% c(4,5)
>   # [1] FALSE
>
> so all is well!
>
> Hoping this helps,
> Ted.
>
> -------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
> Date: 15-Sep-2012  Time: 23:02:14
> This message was sent by XFMail
> -------------------------------------------------
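
For anyone finding this thread later, the points above can be put together in one small sketch. It assumes the data are entered in the quoted layout ([C]), so that read.table() sees four fields per row. The names eq_match, in_match, know_ok and clo_na are only illustrative and do not appear in the original messages. The clo_na column is an optional variant, not part of Ted's answer: it keeps NA in the result when getalong is missing, for cases where "unknown" should not be silently coded as 0.

```r
# Layout [C]: two-word values are quoted, so each row has exactly four fields.
rep_data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
id1 id2 know getalong
100000016_a1 100000016_a2 "very well" 4
100000035_a1 100000035_a2 "fairly well" NA
100000036_a1 100000036_a2 "very well" 3
100000039_a1 100000039_a2 "very well" 5
100000067_a1 100000067_a2 "very well" 5
100000076_a1 100000076_a2 "fairly well" 5
')

# "==" recycles the right-hand vector and compares position by position,
# so it is not a membership test.
eq_match <- rep_data$know == c("fairly well", "very well")

# %in% asks, for each element on the left, whether it occurs anywhere
# in the vector on the right.
in_match <- rep_data$know %in% c("fairly well", "very well")

# The indicator from the thread: 1 if both conditions hold, 0 otherwise.
# A missing getalong is coded 0 because NA %in% c(4, 5) is FALSE.
rep_data$clo <- ifelse(in_match & rep_data$getalong %in% c(4, 5), 1, 0)

# Optional variant (an assumption, not from the thread): propagate NA
# when getalong is missing instead of coding it as 0.
know_ok <- in_match
rep_data$clo_na <- ifelse(is.na(rep_data$getalong), NA,
                          ifelse(know_ok & rep_data$getalong %in% c(4, 5), 1, 0))

print(rep_data)
```

If the comma-separated layout ([B]) is used instead, note that the space after each comma becomes part of unquoted character values unless it is stripped, for example with strip.white = TRUE in read.csv(); otherwise " very well" would not match "very well".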