Hi,
I tried with a bigger dataset.

set.seed(25)
names <- sample(c("bob", "joe", "cr...@gmail.com", "emily", "j...@yahoo.com"),5e6,replace=TRUE)
set.seed(1651)
emails <- sample(c("b...@cup.com", "joesm...@gmail.com", "cr...@gmail.com", "emi...@yahoo.com", "j...@yahoo.com"),5e6,replace=TRUE)
df <- data.frame(names, emails)
dim(df)
#[1] 5000000       2
df[]<-lapply(df,as.character)
system.time(df[,1][grep("@",df$names)]<- "" )
#   user  system elapsed
#  1.732   0.108   1.844
system.time(dfNew1<-df[grep("\\w+",df$names),])
#   user  system elapsed
#  0.896   0.024   0.923
system.time(dfNew2<- df[df$names!="",])
#   user  system elapsed
#  0.460   0.028   0.490
A.K.

________________________________
From: Yasha Podeswa <ypode...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>; Uwe Ligges <lig...@statistik.tu-dortmund.de>
Sent: Sunday, January 27, 2013 2:05 PM
Subject: Re: [R] Removing values containing a specific character

You two were 100% right, it was just a memory issue. This was part of a bigger project where I had a number of data frames loaded, all with 1-5 million rows. Cleaned up my code to have fewer data frames loaded at once, and everything is working great. Thanks for the help!

On Jan 27, 2013 9:46 AM, "arun" <smartpink...@yahoo.com> wrote:

>Hi Yasha,
>
>I guess you got Uwe's response.
>
>I created `df2` with the intention of getting the two results from the
>original dataset.
>For example, after you get the first result
>df[,1][grep("@",df$names)]<- ""
>#you can get the second result by:
>df[df$names!="",]
>#  names             emails
>#1   bob       b...@cup.com
>#2   joe joesm...@gmail.com
>#4 emily   emi...@yahoo.com
>
>#or
>df[grep("\\w+",df$names),]
>#  names             emails
>#1   bob       b...@cup.com
>#2   joe joesm...@gmail.com
>#4 emily   emi...@yahoo.com
>
>But I am not sure how this will work over 5.5 million rows.
>A.K.
>
>----- Original Message -----
>From: ypodeswa <ypode...@gmail.com>
>To: r-help@r-project.org
>Cc:
>Sent: Sunday, January 27, 2013 1:11 AM
>Subject: Re: [R] Removing values containing a specific character
>
>Actually, it worked perfectly for my sample data, but my actual data has
>5.5 million rows, and grep doesn't seem to work with over a million rows.
>Any idea on a workaround?
>
>On Sat, Jan 26, 2013 at 9:37 PM, Yasha Podeswa <ypode...@gmail.com> wrote:
>
>> Awesome, thanks Arun, that's exactly what I was looking for!
>>
>> On Sat, Jan 26, 2013 at 9:21 PM, arun kirshna [via R] <
>> ml-node+s789695n4656749...@n4.nabble.com> wrote:
>>
>>> Hi,
>>> Try this:
>>> df[]<-lapply(df,as.character)
>>> df2<-df
>>> df[,1][grep("@",df$names)]<- ""
>>> df
>>> #names             emails
>>> #1   bob       b...@cup.com
>>> #2   joe joesm...@gmail.com
>>> #3           cr...@gmail.com
>>> #4 emily   emi...@yahoo.com
>>> #5            j...@yahoo.com
>>>
>>> #2nd part:
>>>
>>> df2[-grep("@",df2$names),]
>>>   names             emails
>>> #1   bob       b...@cup.com
>>> #2   joe joesm...@gmail.com
>>> #4 emily   emi...@yahoo.com
>>> A.K.
>>>
>>> ------------------------------
>>> If you reply to this email, your message will be added to the
>>> discussion below:
>>> http://r.789695.n4.nabble.com/Removing-values-containing-a-specific-character-tp4656744p4656749.html
>>>
>>
>
>--
>View this message in context:
>http://r.789695.n4.nabble.com/Removing-values-containing-a-specific-character-tp4656744p4656751.html
>Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.