Thank you very much for your very valuable comments.

They are very informative.

Bests,
Niklas

2012/9/16 Ted Harding <ted.hard...@wlandres.net>

> [See at end]
> On 15-Sep-2012 20:36:49 Niklas Fischer wrote:
> > Dear R users,
> >
> > I have reproducible data and am trying to create a new variable "clo" that
> > is 1 if the know variable is equal to "very well" or "fairly well" and
> > getalong is 4 or 5; otherwise it is 0.
>
> >[A]
> rep_data<- read.table(header=TRUE, text="
>            id1        id2        know getalong
>    100000016_a1 100000016_a2   very well        4
>    100000035_a1 100000035_a2 fairly well       NA
>    100000036_a1 100000036_a2   very well        3
>    100000039_a1 100000039_a2   very well        5
>    100000067_a1 100000067_a2   very well        5
>    100000076_a1 100000076_a2 fairly well        5
> ")
>
> rep_data$clo<- ifelse((rep_data$know==c("fairly well","very well") &
> rep_data$getalong==c(4,5)),1,0)
>
> > For sure, something must be wrong, I couldn't find it out.
>
> rep_data
>                       id1    id2 know getalong clo
> 100000016_a1 100000016_a2   very well        4   0
> 100000035_a1 100000035_a2 fairly well       NA   0
> 100000036_a1 100000036_a2   very well        3   0
> 100000039_a1 100000039_a2   very well        5   0
> 100000067_a1 100000067_a2   very well        5   0
> 100000076_a1 100000076_a2 fairly well        5   0
>
> > Any help is appreciated..
> > Bests,
> > Niklas
>
> There are several things wrong with the way you are trying to do it,
> and indeed it is a bit complicated!
>
> First: if the above table (at >[A] above) is the format in which
> you input the data, then you should either comma-separate your
> data fields (and use sep="," in read.table(), or else just use
> read.csv()), or else enclose the two-word fields within "...",
> i.e. EITHER:
> >[B]
>            id1,       id2,       know,   getalong
>    100000016_a1, 100000016_a2,   very well,        4
>    100000035_a1, 100000035_a2, fairly well,       NA
>    100000036_a1, 100000036_a2,   very well,        3
>    100000039_a1, 100000039_a2,   very well,        5
>    100000067_a1, 100000067_a2,   very well,        5
>    100000076_a1, 100000076_a2, fairly well,        5
>
> OR:
> >[C]
>            id1        id2        know getalong
>    100000016_a1 100000016_a2   "very well"        4
>    100000035_a1 100000035_a2 "fairly well"       NA
>    100000036_a1 100000036_a2   "very well"        3
>    100000039_a1 100000039_a2   "very well"        5
>    100000067_a1 100000067_a2   "very well"        5
>    100000076_a1 100000076_a2 "fairly well"        5
>
> Otherwise, in your original format, read.table() will read in
> FIVE fields, since it will treat "very" and "well" as separate,
> and will treat "fairly" and "well" as separate. Furthermore,
> it will match the header "getalong" with the 5th field (4,NA,etc),
> the header "know" with the 4th field ("well","well",...,"well"),
> header "id2" with the 3rd field ("very","fairly","very",...,"fairly"),
> and header "id1" with the 2nd field ("100000016_a2").
>
> And furthermore, the first field will become the row-names
> of the dataframe and will no longer be data!
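
The misreading described above can be checked directly. The following is a minimal sketch, assuming a reasonably recent R in which read.table() accepts the text= argument; the object names raw, bad, quoted and good are only illustrative, and the gsub() step is just a shortcut for producing layout [C] without retyping the data.

```r
# Original layout: two-word values are unquoted and whitespace-separated.
raw <- "
           id1        id2        know getalong
   100000016_a1 100000016_a2   very well        4
   100000035_a1 100000035_a2 fairly well       NA
   100000036_a1 100000036_a2   very well        3
   100000039_a1 100000039_a2   very well        5
   100000067_a1 100000067_a2   very well        5
   100000076_a1 100000076_a2 fairly well        5
"

# Header has 4 names but each data line has 5 fields, so the first
# field is taken as row names and the headers shift one field right.
bad <- read.table(header = TRUE, text = raw)
str(bad)        # 4 columns; 'know' contains only "well"
rownames(bad)   # the intended id1 values ended up here

# Layout [C]: wrap the two-word values in double quotes, then re-read.
quoted <- gsub("(very well|fairly well)", "\"\\1\"", raw)
good <- read.table(header = TRUE, text = quoted)
unique(good$know)   # now the full two-word values
```

Comparing str(bad) with str(good) shows at a glance whether the columns landed where they were intended to.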
>
> Second: Use of "==" to compare $know with "very well" and
> "fairly well" will not work as you expect. In your comparison
>
>   rep_data$know==c("fairly well","very well")
>
> you will get the result:
>
>   # [1] FALSE FALSE FALSE  TRUE FALSE FALSE
>
> rather than your expected
>
>   # [1] TRUE TRUE TRUE TRUE TRUE TRUE.
>
> This is because "==" compares element by element: each element of
> $know is compared with ONE ELEMENT of c("fairly well","very well"),
> and those elements are recycled, so $know is compared successively with
>
> "fairly well","very well","fairly well","very well","fairly well","very well"
>
> and since $know is
>
> "very well","fairly well","very well","very well","very well","fairly well"
>
> the only match is in the 4th instance, which is why you get
>
>   # [1] FALSE FALSE FALSE  TRUE FALSE FALSE
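
The recycling can be made explicit in a few lines. This is only a sketch: the vector know is typed in by hand to stand for a correctly read rep_data$know, and targets and recycled are illustrative names.

```r
# The six values of 'know', as they would be after a correct read.
know <- c("very well", "fairly well", "very well",
          "very well", "very well", "fairly well")
targets <- c("fairly well", "very well")

# What "==" effectively compares against: targets recycled to length 6.
recycled <- rep(targets, length.out = length(know))
recycled

# Element-by-element comparison; identical to know == recycled.
by_eq <- know == targets
by_eq
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE

# Set membership instead: is each element one of the targets?
by_in <- know %in% targets
by_in
# [1] TRUE TRUE TRUE TRUE TRUE TRUE
```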
>
> A better comparison is to use the "%in%" operator, as in:
>
>   rep_data$know %in% c("fairly well","very well")
>   # [1] TRUE TRUE TRUE TRUE TRUE TRUE
>
> so you can in the end do:
>
>   rep_data$clo<-
>     ifelse((rep_data$know %in% c("fairly well","very well")) &
>            (rep_data$getalong %in% c(4,5)),1,0)
>
> which results in:
>
>   rep_data
>   #            id1          id2        know getalong clo
>   # 1 100000016_a1 100000016_a2   very well        4   1
>   # 2 100000035_a1 100000035_a2 fairly well       NA   0
>   # 3 100000036_a1 100000036_a2   very well        3   0
>   # 4 100000039_a1 100000039_a2   very well        5   1
>   # 5 100000067_a1 100000067_a2   very well        5   1
>   # 6 100000076_a1 100000076_a2 fairly well        5   1
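
For anyone who wants to run this without the read.table() step, here is a self-contained sketch that builds the two relevant columns by hand (so the data frame below is an assumption standing in for a correctly read rep_data). It also shows as.integer() as an alternative to ifelse(..., 1, 0), since the condition is already a logical vector; the name is_close is illustrative.

```r
# Hand-built stand-in for the correctly read data (ids omitted).
rep_data <- data.frame(
  know = c("very well", "fairly well", "very well",
           "very well", "very well", "fairly well"),
  getalong = c(4, NA, 3, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Both conditions use %in%, so each yields TRUE/FALSE and never NA.
is_close <- rep_data$know %in% c("fairly well", "very well") &
            rep_data$getalong %in% c(4, 5)

# TRUE/FALSE convert to 1/0 directly.
rep_data$clo <- as.integer(is_close)
rep_data$clo
# [1] 1 0 0 1 1 1
```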
>
> Finally, I suppose it is a happy coincidence that
>
>   NA %in% c(4,5)
>
> yields FALSE rather than what R might have been written to yield,
> i.e. NA -- since NA is basically a synonym for "something that we
> do not know the value of", strictly speaking we do not know the
> value of NA %in% c(4,5). It is possible that the "something that
> we do not know the value of" could be either 4 or 5, in which case
> NA %in% c(4,5) would be TRUE; but it is also possible that the
> "something that we do not know the value of" could be neither
> 4 nor 5, in which case NA %in% c(4,5) would be FALSE; but since
> we do not know which of these possibilities is the case, we do
> not know whether it should be TRUE or FALSE, so one can argue
> that the result should itself be NA. But, as it happens,
>
>   3 %in% c(4,5)
>   # [1] FALSE
>   4 %in% c(4,5)
>   # [1] TRUE
>   5 %in% c(4,5)
>   # [1] TRUE
>   NA %in% c(4,5)
>   # [1] FALSE
>
> so all is well!
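
If one did prefer the "unknown stays unknown" behaviour discussed above, it can be requested explicitly. The sketch below contrasts "==" (which propagates NA) with %in% (which does not), and shows one possible way to keep NA in the result; the names in_set and in_set_na are illustrative.

```r
getalong <- c(4, NA, 3, 5, 5, 5)

# "==" propagates missingness: comparing NA with anything gives NA.
NA == 4
# [1] NA

# %in% treats NA as simply "not found" in the set.
in_set <- getalong %in% c(4, 5)
in_set
# [1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE

# To keep NA where the input is NA, mask it back in explicitly.
in_set_na <- ifelse(is.na(getalong), NA, getalong %in% c(4, 5))
in_set_na
# [1]  TRUE    NA FALSE  TRUE  TRUE  TRUE
```

Which of the two is appropriate depends on whether a missing getalong should count as "not close" or as "unknown" in the later analysis.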
>
> Hoping this helps,
> Ted.
>
> -------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
> Date: 15-Sep-2012  Time: 23:02:14
> This message was sent by XFMail
> -------------------------------------------------
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
