Re: [R] How to create a new variable based on parts of another character variable: A generalization

Bert Gunter Mon, 24 Oct 2011 09:06:50 -0700

... Well, this works in this simple case, but is too clumsy for a general
formulation of this problem:  given a "dictionary" consisting of two
character vectors of unique "names" (or two columns in a data frame), x and
y,  how does one convert a factor z with levels in x into the corresponding
equivalent with levels in y?


There are likely a zillion ways to do this with various packages and
functions, but the simplest and most straightforward must surely be:
factor(y[z])

Example:
> x <- LETTERS[1:4]
> y <- LETTERS[5:8]
> z <- factor(sample(x, 15, replace = TRUE))
> z
 [1] B D A C B A B D A D D A A D B
Levels: A B C D
> factor(y[z])
 [1] F H E G F E F H E H H E E H F
Levels: E F G H
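
A caveat, offered as a sketch rather than a correction of the example above:
y[z] indexes y by the integer codes of z, and those codes follow levels(z),
not the order of x. The shortcut therefore relies on levels(z) matching x
element for element, in the same order. When the dictionary is unsorted, or z
lacks some of the levels in x, one option is to look up positions with
match(). The helper name recode_levels and the sample data below are made up
for illustration.

```r
# Hypothetical helper: translate the levels of factor z via the dictionary x -> y.
recode_levels <- function(z, x, y) {
  # Find where each level of z sits in x, then take the matching entry of y.
  idx <- match(levels(z), x)
  if (anyNA(idx)) stop("some levels of z are not in the dictionary x")
  levels(z) <- y[idx]
  z
}

# Dictionary deliberately NOT in alphabetical order.
x <- c("D", "B", "A", "C")
y <- c("H", "F", "E", "G")   # D -> H, B -> F, A -> E, C -> G

z <- factor(c("B", "D", "A", "C", "B"))

as.character(recode_levels(z, x, y))   # "F" "H" "E" "G" "F"
as.character(factor(y[z]))             # "F" "G" "H" "E" "F", wrong for this dictionary
```

With x sorted and every level present, as in the example above, both
expressions give the same result.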

This is a nice example of the utility of the factor data structure, which
tends to get dissed a lot, because it can badly burn you if you're not
careful with it.

A fuller discussion of these issues can be found by searching on "associative
arrays" or "hashes", of which factors are an elementary example.
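
As a small illustration of the associative-array idea in base R, a named
character vector can serve as the dictionary, with lookup by name. This is
only a sketch, not the method used above; the names dict and keys are invented
for the example, and the prefixes come from the original question quoted
below.

```r
# A named character vector acting as a simple associative array:
# the names are the keys, the elements are the values.
dict <- c(AWI = "Arctic", UFT = "Boreal")

# Keys: the first three characters of each identifier.
A <- c("AWI-test1", "AWI-test5", "UFT-2", "UFT356")
keys <- substr(A, 1, 3)

# Look up by name; unname() drops the keys from the result.
D <- unname(dict[keys])
D   # "Arctic" "Arctic" "Boreal" "Boreal"
```

A key that is not among names(dict) yields NA, which makes unmatched
prefixes easy to spot.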

-- Bert


On Mon, Oct 24, 2011 at 6:00 AM, Petr PIKAL <petr.pi...@precheza.cz> wrote:

> Hi
>
> If you want to avoid regular expressions altogether, and your A values
> start with AWI for Arctic and UFT for Boreal, you can use
>
> DF$D <- ifelse(substr(DF$A, 1,1) == "A", "Arctic", "Boreal")
>
> Regards
> Petr
>
>
>
> >
> > Hello,
> > I am just starting with R and I have a (most probably) stupid problem
> > creating a new variable in a data.frame based on part of another
> > character variable.
> >
> > I have a data frame like this one:
> >
> >
> > A          B    C
> > AWI-test1  1    i
> > AWI-test5  2    r
> > AWI-tes75  56   z
> > UFT-2      5    I
> > UFT56      f    t
> > UFT356     9j   t
> > etc. etc.  89   t
> >
> >
> > I now want to check whether the string AWI is present in variable A and,
> > if so, create a variable D containing "Arctic". However, if the string
> > UFT occurs in variable A, then variable D should be "Boreal", etc.
> >
> > The resulting data.frame file should look like
> > A          B    C   D
> > AWI-test1  1    i   Arctic
> > AWI-test5  2    r   Arctic
> > AWI-tes75  56   z   Arctic
> > UFT-2      5    I   Boreal
> > UFT56      f    t   Boreal
> > UFT356     9j   t   Boreal
> > etc. etc.  89   t
> >
> >
> > I know how to do this when I match the entire string of A, that is,
> > when A is "AWI-test1" I can create the variable D with "Arctic", but
> > not how to look for only a substring in A.
> > Would be great if somebody might help.
> > Thanks
> > Philipp
> >
> >
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

