Re: [R] Updating a data frame based on if condition

Jeff Johnson Tue, 18 Feb 2014 14:05:28 -0800

Thanks David, that's a great improvement.


On Tue, Feb 18, 2014 at 12:36 PM, David Carlson <dcarl...@tamu.edu> wrote:

> What you have can work, but it will be hard to maintain and
> debug. Easier to follow is
>
> > cond1 <- mydata$FNAME_TOKEN_COUNT > 3
> > cond2 <- mydata$FNAME_LENGTH > 55
> > cond3 <- regexpr("9", mydata$FNAME_PATTERN) == 0
> > mydata$FNAME_SUSPECT <- apply(cbind(cond1, cond2, cond3), 1, any)
> > mydata$FNAME_SUSPECT
>  [1] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE
> [13]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE
> [25]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
> [37] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
> [49]  TRUE  TRUE
>
> And adding or changing a condition is pretty simple.
>
> David C
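
A possible simplification, sketched on a small made-up data frame (`toy` and its values are hypothetical; only the column names come from the thread). Each condition is already a logical vector, so the vectorised `|` operator can combine them without `apply()`. Note also that `regexpr()` returns -1 when there is no match and the match position (1 or greater) otherwise, so the `== 0` test used above never matches for the pattern "9"; `grepl()` returns the logical value directly.

```r
# Hypothetical toy data; only the column names are taken from the thread.
toy <- data.frame(
  FNAME_TOKEN_COUNT = c(1L, 5L, 1L, 1L),
  FNAME_LENGTH      = c(7L, 26L, 4L, 40L),
  FNAME_PATTERN     = c("9999999", "AAAAA,_AAAAAA,_AA_AAA_AAAA",
                        "AAAA", strrep("A", 40)),
  stringsAsFactors  = FALSE
)

cond1 <- toy$FNAME_TOKEN_COUNT > 3                     # too many tokens
cond2 <- toy$FNAME_LENGTH > 35                         # name too long
cond3 <- grepl("9", toy$FNAME_PATTERN, fixed = TRUE)   # contains a digit marker

# Element-wise OR of the three logical vectors, one result per row
toy$FNAME_SUSPECT <- cond1 | cond2 | cond3

# regexpr() reports "no match" as -1, not 0
no_match <- as.integer(regexpr("9", "AAAA", fixed = TRUE))
```

Adding a condition is then one more `condN` line and one more `|` term.
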
>
> From: Jeff Johnson [mailto:mrjeffto...@gmail.com]
> Sent: Tuesday, February 18, 2014 12:54 PM
> To: dcarl...@tamu.edu
> Cc: R help
> Subject: Re: [R] Updating a data frame based on if condition
>
> Ahh, I was specifying the second argument FALSE incorrectly.
> Works now as:
>
> mydata$FNAME_SUSPECT <- ifelse(mydata$FNAME_TOKEN_COUNT > 3, TRUE,
>              ifelse(mydata$FNAME_LENGTH > 55, TRUE,
>                     ifelse(regexpr("9", mydata$FNAME_PATTERN) == 0, TRUE, FALSE
>                            )
>                       )
>                     )
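
For reference, a small sketch of how the nesting works, using a made-up vector `x`: `ifelse(test, yes, no)` takes exactly three arguments, so a further `ifelse()` has to sit in the `no` slot. That is why the four-argument call further down fails with "unused argument".

```r
x <- c(2, 40, 60)   # hypothetical values

# ifelse(test, yes, no): the nested call goes in the 'no' position
size <- ifelse(x > 50, "large",
               ifelse(x > 10, "medium", "small"))

# When 'yes' is TRUE and 'no' is FALSE, ifelse() only repeats the test
same <- identical(ifelse(x > 10, TRUE, FALSE), x > 10)
```

Since `ifelse(cond, TRUE, FALSE)` gives back `cond` itself, a chain of such calls can usually be written as the conditions joined with `|`.
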
>
>
> On Tue, Feb 18, 2014 at 10:21 AM, Jeff Johnson
> <mrjeffto...@gmail.com> wrote:
> This is my first time with ifelse, but I've tried:
>
> mydata$FNAME_SUSPECT <- ifelse(mydata$FNAME_TOKEN_COUNT > 3, TRUE, FALSE,
>              ifelse(mydata$FNAME_LENGTH > 35, TRUE, FALSE,
>                     ifelse(regexpr("9", mydata$FNAME_PATTERN) > 0, TRUE, FALSE
>                            )
>                       )
>                     )
>
> Error in ifelse(mydata$FNAME_TOKEN_COUNT > 3, TRUE, FALSE,
> ifelse(mydata$FNAME_LENGTH >  :
>   unused argument (ifelse(mydata$FNAME_LENGTH > 35, TRUE, FALSE,
> ifelse(regexpr("9", mydata$FNAME_PATTERN) > 0, TRUE, FALSE)))
>
> I have the R for Dummies book which covers it a bit, but I just
> ordered the R Cookbook.
>
> On Tue, Feb 18, 2014 at 10:16 AM, David Carlson
> <dcarl...@tamu.edu> wrote:
> Not always true, but it is in this case:
>
> ?ifelse
>
> David C
>
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Johnson
> Sent: Tuesday, February 18, 2014 11:24 AM
> To: R help
> Subject: [R] Updating a data frame based on if condition
>
> I have a subset of data that I have identified as "suspect" (for
> example, the first name has excessive spaces, is longer than 35
> characters or has a number).
>
> What I want to do is update the FNAME_SUSPECT field in "mydata" to
> TRUE if any of those conditions are met.
>
> Here's my data:
> > dput(mydata)
> structure(list(PERSON_FIRST_NAME = c("1298530",
> "JULIA, TAYLOR, CS AND JEFF",
> "88", "4465891170098562", "1124211", "LEWIS & MARY KAY",
> "KARL R O S", "5466181820076010", "JULI0 C", "WAYNE   T.",
> "1124211", "1124211",
> "ROBERT B & VIONA D", "DENNIS and MARY SUE", "BRIAN   JOANNE",
> "1124211", "RONALD and  GAIL", "Mike and Mary Lou", "31763006",
> "7", "11460735", "Paul and Mary Beth", "JIMMY and RUTH MARIE",
> "1124211", "WAYNE & LU ANN", "SCOTT & ANNA MARIE", "1124211",
> "1124211", "952714", "DAVID, RHONDA and NATALIE", "VIRGINIA   S",
> "707069", "4397836190001917", "MARIA DE LA LUZ", "MARIA DE LA LUZ",
> "G & S COMPUTERIZED GRADING", "1124211", "1124211", "1124211",
> "1124211", "MARIA DE LA LUZ", "ED AND JANICE KISHI", "1124211",
> "Garrett A. and Jenny E.", "1124211", "1124211",
> "Hiram T. and A. Judith",
> "MA DE LA LUZ", "STEVE, Bev, and Caleb", "MR AND MRS EVER"),
>     FNAME_SUSPECT = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
>     FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
> FALSE,
>     FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
> FALSE,
>     FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
> FALSE,
>     FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
> FALSE,
>     FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
>     FNAME_LENGTH = c(7L, 26L, 2L, 16L, 7L, 16L, 10L, 16L, 7L,
>     10L, 7L, 7L, 18L, 19L, 14L, 7L, 16L, 17L, 8L, 1L, 8L, 18L,
>     20L, 7L, 14L, 18L, 7L, 7L, 6L, 25L, 12L, 6L, 16L, 15L, 15L,
>     26L, 7L, 7L, 7L, 7L, 15L, 19L, 7L, 23L, 7L, 7L, 22L, 12L,
>     21L, 15L), FNAME_PATTERN = c("9999999",
> "AAAAA,_AAAAAA,_AA_AAA_AAAA",
>     "99", "9999999999999999", "9999999", "AAAAA_&_AAAA_AAA",
>     "AAAA_A_A_A", "9999999999999999", "AAAA9_A", "AAAAA___A.",
>     "9999999", "9999999", "AAAAAA_A_&_AAAAA_A",
> "AAAAAA_AAA_AAAA_AAA",
>     "AAAAA___AAAAAA", "9999999", "AAAAAA_AAA__AAAA",
> "AAAA_AAA_AAAA_AAA",
>     "99999999", "9", "99999999", "AAAA_AAA_AAAA_AAAA",
> "AAAAA_AAA_AAAA_AAAAA",
>     "9999999", "AAAAA_&_AA_AAA", "AAAAA_&_AAAA_AAAAA",
> "9999999",
>     "9999999", "999999", "AAAAA,_AAAAAA_AAA_AAAAAAA",
> "AAAAAAAA___A",
>     "999999", "9999999999999999", "AAAAA_AA_AA_AAA",
> "AAAAA_AA_AA_AAA",
>     "A_&_A_AAAAAAAAAAAA_AAAAAAA", "9999999", "9999999",
> "9999999",
>     "9999999", "AAAAA_AA_AA_AAA", "AA_AAA_AAAAAA_AAAAA",
> "9999999",
>     "AAAAAAA_A._AAA_AAAAA_A.", "9999999", "9999999",
> "AAAAA_A._AAA_A._AAAAAA",
>     "AA_AA_AA_AAA", "AAAAA,_AAA,_AAA_AAAAA", "AA_AAA_AAA_AAAA"
>     ), FNAME_TOKEN_COUNT = c(1L, 5L, 1L, 1L, 1L, 4L, 4L, 1L,
>     2L, 4L, 1L, 1L, 5L, 4L, 4L, 1L, 4L, 4L, 1L, 1L, 1L, 4L, 4L,
>     1L, 4L, 4L, 1L, 1L, 1L, 4L, 4L, 1L, 1L, 4L, 4L, 5L, 1L, 1L,
>     1L, 1L, 4L, 4L, 1L, 5L, 1L, 1L, 5L, 4L, 4L, 4L)), .Names =
> c("PERSON_FIRST_NAME",
> "FNAME_SUSPECT", "FNAME_LENGTH", "FNAME_PATTERN",
> "FNAME_TOKEN_COUNT"
> ), row.names = c(6717L, 11035L, 11626L, 14965L, 17874L, 24341L,
> 25582L, 25834L, 26851L, 30134L, 36385L, 45244L, 46947L, 61449L,
> 67564L, 71465L, 73782L, 75278L, 78977L, 79037L, 80577L, 81644L,
> 84427L, 86286L, 89963L, 91208L, 94054L, 99518L, 114658L,
> 128305L,
> 129082L, 137492L, 137573L, 138556L, 139489L, 148757L, 153956L,
> 155546L, 160533L, 162386L, 162681L, 165220L, 168063L, 173003L,
> 175322L, 179935L, 180991L, 181215L, 183787L, 184573L), class =
> "data.frame")
>
> Note I defaulted all of the FNAME_SUSPECT to FALSE. I plan to change
> that later.
>
> I've tried running this:
> if(mydata$FNAME_TOKEN_COUNT > 3 | mydata$FNAME_LENGTH > 35 |
>    regexpr("9", mydata$FNAME_PATTERN) > 0)
>         mydata$FNAME_SUSPECT <- TRUE
>
> however I get the warning:
> Warning message:
> In if (mydata$FNAME_TOKEN_COUNT > 3 | mydata$FNAME_LENGTH > 35 | :
>   the condition has length > 1 and only the first element will be used
>
> Would I be better doing this in a for loop? I had once heard that
> if you're doing a for loop in R, you're doing something wrong.
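
On the loop question, a minimal sketch using a toy data frame `d` (hypothetical, not the data above): `if()` expects a single TRUE or FALSE, while the conditions here are whole vectors. A common idiom is to build a logical index and assign through it, which needs no loop.

```r
# Hypothetical toy data frame
d <- data.frame(
  tokens  = c(1L, 4L, 2L),
  pattern = c("AAAA", "AA_AA_AA_AAA", "AAAA9_A"),
  suspect = FALSE,
  stringsAsFactors = FALSE
)

# One logical value per row
flag <- d$tokens > 3 | grepl("9", d$pattern, fixed = TRUE)

# Assign TRUE only where the index is TRUE; other rows keep their value
d$suspect[flag] <- TRUE

# any() collapses the vector to the single value that if() needs
if (any(flag)) message(sum(flag), " suspect rows found")
```
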
> --
> Jeff
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible
> code.


-- 
Jeff

