Yes, wonderful! This seems to work beautifully. Thank you so much!
________________________________
From: Rui Barradas <ruipbarra...@sapo.pt>
Sent: Friday, June 11, 2021 2:03 PM
To: Debbie Hahs-Vaughn <deb...@ucf.edu>; r-help@R-project.org <r-help@R-project.org>
Subject: Re: [R] Identifying words from a list and code as 0 or 1 and words NOT on the list code as 1

Hello,

From what I understood of the problem, this might be what you want.

    library(dplyr)
    library(stringr)

    coreWordsPat <- paste0("\\b", coreWords, "\\b")
    coreWordsPat <- paste(coreWordsPat, collapse = "|")

    left_join(
      df %>%
        mutate(Core = +str_detect(Utterance, coreWordsPat)) %>%
        select(ID, Utterance, Core),
      df %>%
        mutate(Fringe = str_remove_all(Utterance, coreWordsPat),
               Fringe = +(nchar(trimws(Fringe)) > 0)) %>%
        select(ID, Fringe),
      by = "ID"
    )

Hope this helps,

Rui Barradas

At 18:02 on 11/06/21, Debbie Hahs-Vaughn wrote:
> I am working with utterances, statements spoken by children. From each
> utterance, if one or more words in the statement match a predefined list of
> multiple 'core' words (probably 300 words), then I want to input '1' into
> 'Core' (and if none, then input '0' into 'Core').
>
> If there are one or more words in the statement that are NOT core words, then
> I want to input '1' into 'Fringe' (and if there are only core words and
> nothing extra, then input '0' into 'Fringe'). I will not have a list of
> Fringe words.
>
> Basically, right now I have a child ID and only the utterances. Here is a
> snippet of my data.
>
> ID Utterance
> 1  a baby
> 2  small
> 3  yes
> 4  where's his bed
> 5  there's his bed
> 6  where's his pillow
> 7  what is that on his head
> 8  hey he has his arm stuck here
> 9  there there's it
> 10 now you're gonna go night-night
> 11 and that's the thing you can turn on
> 12 yeah where's the music box
> 13 what is this
> 14 small
> 15 there you go baby
>
> The following code runs but isn't doing exactly what I need, which is: 1)
> the ability to detect words from the list and define them as core; 2) the
> ability to search the utterance and, if there are any words in the utterance
> that are NOT core, to identify those as '1', as I will not have a list of
> fringe words.
>
> ```
>
> library(dplyr)
> library(stringr)
> library(tidyr)
>
> coreWords <-c("I", "no", "yes", "my", "the", "want", "is", "it", "that", "a",
> "go", "mine", "you", "what", "on", "in", "here", "more", "out", "off",
> "some", "help", "all done", "finished")
>
> str_detect(df,)
>
> dfplus <- df %>%
>   mutate(id = row_number()) %>%
>   separate_rows(Utterance, sep = ' ') %>%
>   mutate(Core = + str_detect(Utterance, str_c(coreWords, collapse = '|')),
>          Fringe = + !Core) %>%
>   group_by(id) %>%
>   mutate(Core = + (sum(Core) > 0),
>          Fringe = + (sum(Fringe) > 0)) %>%
>   slice(1) %>%
>   select(-Utterance) %>%
>   left_join(df) %>%
>   ungroup() %>%
>   select(Utterance, Core, Fringe, ID)
>
> ```
>
> The dput() code is:
>
> ```
> structure(list(Utterance = c("a baby", "small", "yes", "where's his bed",
> "there's his bed", "where's his pillow", "what is that on his head",
> "hey he has his arm stuck here", "there there's it", "now you're gonna go night-night",
> "and that's the thing you can turn on", "yeah where's the music box",
> "what is this", "small", "there you go baby ", "what is this for ",
> "a ", "and the go goodnight here ", "and what is this ", " what's that sound ",
> "what does she say ", "what she say", "should I turn the on so Laura doesn't cry ",
> "what is this ", "what is that ", "where's clothes ", " where's the baby's bedroom ",
> "that might be in dad's bed+room ", "yes ", "there you go baby ",
> "you're welcome "), Core = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L), Fringe = c(0L, 0L, 0L, 1L, 1L, 1L,
> 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), ID = 1:31), row.names = c(NA,
> -31L), class = c("tbl_df", "tbl", "data.frame"))
> ```
>
> The first 10 rows of output look like this:
>
>    Utterance                       Core Fringe ID
> 1  a baby                             1      0  1
> 2  small                              1      0  2
> 3  yes                                1      0  3
> 4  where's his bed                    1      1  4
> 5  there's his bed                    1      1  5
> 6  where's his pillow                 1      1  6
> 7  what is that on his head           1      0  7
> 8  hey he has his arm stuck here      1      1  8
> 9  there there's it                   1      0  9
> 10 now you're gonna go night-night    1      1 10
>
> For example, in line 1 of the output, 'a' is a core word so '1' for core is
> correct. However, 'baby' should be picked up as fringe, so there should be
> '1', not '0', for fringe. Lines 7 and 9 also have words that should be
> identified as fringe but are not.
>
> Additionally, it seems that if the utterance contains part of a core word,
> it's being counted. For example, 'small' is identified as a core word even
> though it's not (but 'all done' is a core word). 'Where's his bed' is
> identified as core and fringe, although none of the words are core.
>
> Any suggestions on what is happening and how to correct it are greatly
> appreciated.
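A likely explanation for the behaviour described in the question, sketched with the core word list from the post: the original code tests each word against the plain alternation `str_c(coreWords, collapse = '|')`, which matches a core word anywhere inside a longer word. So 'small' and 'baby' count as core because they contain "a", 'his' contains "is", and 'where's' and 'there' contain "here" or "the". Once 'baby' is flagged as core, nothing in "a baby" is left over, which would explain Fringe = 0 there. Wrapping each word in `\\b` (as in Rui's reply) restricts matches to whole words. The `tokens` vector below is only a handful of words picked from the data for illustration.

```r
library(stringr)

# Core word list copied from the original post
coreWords <- c("I", "no", "yes", "my", "the", "want", "is", "it", "that", "a",
               "go", "mine", "you", "what", "on", "in", "here", "more", "out",
               "off", "some", "help", "all done", "finished")

# Pattern built as in the original code: a plain alternation with no word
# boundaries, so a core word can match anywhere inside a longer word
naivePat <- str_c(coreWords, collapse = "|")

# Same alternation, but each core word wrapped in \b ... \b (as in Rui's reply),
# so only whole-word matches count
boundedPat <- paste(paste0("\\b", coreWords, "\\b"), collapse = "|")

# A few single words taken from the utterances, for illustration only
tokens <- c("small", "baby", "his", "where's", "there", "bed", "yes")

naive   <- str_detect(tokens, naivePat)    # substring matches count
bounded <- str_detect(tokens, boundedPat)  # whole-word matches only

data.frame(token = tokens, naive = naive, bounded = bounded)
```

With the plain pattern only 'bed' comes out as non-core, which lines up with 'where's his bed' getting both Core = 1 and Fringe = 1 in the output above; with word boundaries only 'yes' is matched.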
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
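For anyone wanting to try Rui's approach end to end, here is a self-contained sketch. It assumes a small tibble built from the first few utterances in the dput() output (not the full data) and computes both flags in a single `mutate()` instead of joining two pipelines; the logic is the same: a word-boundary pattern flags Core, and Fringe is 1 when anything other than whitespace survives `str_remove_all()`.

```r
library(dplyr)
library(stringr)

# Illustrative subset of the utterances from the dput() in the question
df <- tibble(
  ID = 1:6,
  Utterance = c("a baby", "small", "yes", "where's his bed",
                "what is that on his head", "what is this")
)

coreWords <- c("I", "no", "yes", "my", "the", "want", "is", "it", "that", "a",
               "go", "mine", "you", "what", "on", "in", "here", "more", "out",
               "off", "some", "help", "all done", "finished")

# Each core word (or phrase) wrapped in word boundaries, OR-ed together
coreWordsPat <- paste(paste0("\\b", coreWords, "\\b"), collapse = "|")

result <- df %>%
  mutate(
    # Core = 1 if at least one core word appears as a whole word
    Core = +str_detect(Utterance, coreWordsPat),
    # Fringe = 1 if anything other than whitespace is left once core words are removed
    Fringe = +(nchar(trimws(str_remove_all(Utterance, coreWordsPat))) > 0)
  )

result
```

Because the pattern is applied to the whole utterance rather than to space-separated words, a multi-word entry such as "all done" can still match. Two things to watch: `\\b` treats the apostrophe as a word boundary, so in "that's" the core word "that" is matched and the leftover "'s" sets Fringe to 1; and matching is case-sensitive, so "I" will not match a lowercase "i".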