I've compared the solutions.

*Solution 1:*

myf <- function(df1, df2) {
  cond <- df2$a > min(df1$a)
  if (cond) {
    idx <- which(df1$a == min(df1$a))
    df1[idx, ] <- df2[1, ]
  }
  df1
}
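A possible variant of Solution 1, given only as an untimed sketch: since the values in a are expected never to be identical, which.min() can locate the smallest value once and the index can be reused for both the comparison and the replacement. The name myf2 is hypothetical, and the no-ties assumption matters, because which.min() returns only the first minimum.

```r
# Hypothetical variant of myf(): find the minimum once with which.min()
# and reuse that index. Assumes the values in column a are never tied
# (which.min() returns only the first minimum).
myf2 <- function(df1, df2) {
  idx <- which.min(df1$a)          # position of the smallest a in df1
  if (df2$a[1] > df1$a[idx]) {     # replace only if the new a is larger
    df1[idx, ] <- df2[1, ]
  }
  df1
}

# Small check on the example from the original question
x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15, stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)
res <- myf2(x, y)
res[1, ]
```

The timings that follow refer to myf as defined above, not to this variant.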
# On a larger example,
set.seed(4530)
tst <- data.frame(item = 1:1000, a = rnorm(1000), b = rnorm(1000)) # large data frame
u <- tst
system.time(
  for (i in 1:100000) {
    y <- data.frame(item = (1000 + i), a = rnorm(1), b = rnorm(1)) # small data frame, every time new
    u <- myf(u, y)
  })

Took me about 31.90 sec.

*Solution 2:*

set.seed(4530)
x <- data.frame(item = 1:1000, a = rnorm(1000), b = rnorm(1000)) # large data frame
u <- x # the loop below updates u, so start it from the fresh data frame
system.time(
  for (i in 1:100000) {
    y <- data.frame(item = (1000 + i), a = rnorm(1), b = rnorm(1)) # small data frame, every time new
    u[intersect(which(u$a < y$a), which.min(u$a)), ] <- y
  })

The solution is correct (despite warnings) but took longer - about 48.84 sec.

Dimitri

On Wed, Jan 30, 2013 at 3:27 PM, Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com> wrote:

> In reality, values in a will not be integers, but numeric. They will never
> be identical, but it could be that they are pretty close - I don't know
> how many decimal places matter.
> Dimitri
>
> On Wed, Jan 30, 2013 at 2:06 PM, arun <smartpink...@yahoo.com> wrote:
>
>> Hi,
>> Any chance of x$a having the same number repeated?
>>
>> If `Item` and `a` are unique, I guess both the solutions should work.
>>
>> set.seed(1851)
>> x<- data.frame(item=sample(letters[1:20],20,replace=F),a=sample(1:45,20,replace=F),b=sample(20:50,20,replace=F),stringsAsFactors=F)
>> y<- data.frame(item="z",a=3,b=10,stringsAsFactors=F)
>>
>> x[intersect(which(x$a < y$a),which.min(x$a)),]
>> #   item a  b
>> #17    c 1 48
>> x[x$a==which.min(x$a[x$a<y$a]),]
>> #   item a  b
>> #17    c 1 48
>> #or
>>
>> x[x$a%in%which.min(x$a[x$a<y$a]),]
>> #   item a  b
>> #17    c 1 48
>>
>> x[x$a%in%which.min(x$a[x$a<y$a]),]<-y
>>
>> tail(x)
>> #   item  a  b
>> #15    q 45 30
>> #16    g 10 23
>> #17    z  3 10
>> #18    r 15 39
>> #19    l 18 45
>> #20    t 35 33
>>
>> #However, if the `item` column is unique but `a` is not, then the problem I mentioned previously arises.
>> set.seed(1851)
>> x1<- data.frame(item=sample(letters[1:20],20,replace=F),a=sample(1:10,20,replace=T),b=sample(20:50,20,replace=F),stringsAsFactors=F)
>> y1<- data.frame(item="z",a=3,b=10,stringsAsFactors=F)
>>
>> x1[intersect(which(x1$a < y1$a),which.min(x1$a)),]
>> #  item a  b
>> #3    s 1 41
>> x1[x1$a==which.min(x1$a[x1$a<y1$a]),]
>> #   item a  b
>> #3     s 1 41
>> #11    h 1 46
>> #17    c 1 48
>> x1[x1$a==which.min(x1$a[x1$a<y1$a]),]<- y1
>> A.K.
>>
>> ________________________________
>> From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
>> To: arun <smartpink...@yahoo.com>
>> Cc: R help <r-help@r-project.org>; Jessica Streicher <j.streic...@micromata.de>
>> Sent: Wednesday, January 30, 2013 1:49 PM
>> Subject: Re: [R] Fastest way to compare a single value with all values in one column of a data frame
>>
>> Sorry - I should have clarified:
>> My identifiers (in column "item") will always be unique. In other words,
>> one entry in column "item" will never be repeated - neither in x nor in y.
>> Dimitri
>>
>> On Wed, Jan 30, 2013 at 1:27 PM, Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com> wrote:
>>
>> >Thank you, everyone! I'll try to test those different approaches. Really
>> >appreciate your help!
>> >Dimitri
>> >
>> >On Wed, Jan 30, 2013 at 11:03 AM, arun <smartpink...@yahoo.com> wrote:
>> >
>> >>Hi,
>> >>
>> >>Sorry, my previous solution doesn't work.
>> >>This should work for your dataset:
>> >>set.seed(1851)
>> >>x<- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=F)
>> >>y<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
>> >>x[x$a%in%which.min(x[x$a<y$a,]$a),]<- y #if there are multiple minimum values
>> >>
>> >>set.seed(1241)
>> >>x1<- data.frame(item=sample(letters[1:10],1e4,replace=TRUE),a=sample(1:30,1e4,replace=TRUE),b=sample(1:100,1e4,replace=TRUE),stringsAsFactors=F)
>> >>y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
>> >>length(x1$a[x1$a==1])
>> >>#[1] 330
>> >>system.time({x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1})
>> >>#   user  system elapsed
>> >>#  0.000   0.000   0.001
>> >>length(x1$a[x1$a==1])
>> >>#[1] 0
>> >>
>> >>#For some reason, it does not work when the number of repeated minimum values is greater than some value
>> >>
>> >>set.seed(1241)
>> >>x1<- data.frame(item=sample(letters[1:10],1e5,replace=TRUE),a=sample(1:30,1e5,replace=TRUE),b=sample(1:100,1e5,replace=TRUE),stringsAsFactors=F)
>> >>y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
>> >>length(x1$a[x1$a==1])
>> >>#[1] 3404
>> >>x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1
>> >>length(x1$a[x1$a==1])
>> >>#[1] 3404 #not getting replaced
>> >>
>> >>#However, if I try:
>> >>set.seed(1241)
>> >>x1<- data.frame(item=sample(letters[1:10],1e6,replace=TRUE),a=sample(1:5000,1e6,replace=TRUE),b=sample(1:100,1e6,replace=TRUE),stringsAsFactors=F)
>> >>y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
>> >>length(x1$a[x1$a==1])
>> >>#[1] 208
>> >>system.time(x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1)
>> >>#   user  system elapsed
>> >>#  0.124   0.016   0.138
>> >>length(x1$a[x1$a==1])
>> >>#[1] 0
>> >>
>> >>#Tried Jessica's solution:
>> >>set.seed(1851)
>> >>x<- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=F)
>> >>y<-
>> >>data.frame(item="f",a=3,b=10,stringsAsFactors=F)
>> >>x[intersect(which(x$a < y$a),which.min(x$a)),] <- y
>> >>
>> >>x
>> >>#   item  a  b
>> >>#1     a  8 25
>> >>#2     a 10 26
>> >>#3     f  3 10 #replaced
>> >>#4     e 15 26
>> >>#5     b 13 20
>> >>#6     a  5 23
>> >>#7     d  4 29
>> >>#8     e  2 24
>> >>#9     c  7 30
>> >>#10    e 14 24
>> >>#11    d  2 20
>> >>#12    e 10 21
>> >>#13    c 13 27
>> >>#14    d 12 23
>> >>#15    b 11 26
>> >>#16    e  5 22
>> >>#17    c  1 26 #it is not replaced
>> >>#18    a  8 21
>> >>#19    e 10 26
>> >>#20    c  2 22
>> >>
>> >>A.K.
>> >>
>> >>----- Original Message -----
>> >>From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
>> >>To: r-help <r-help@r-project.org>
>> >>Cc:
>> >>Sent: Tuesday, January 29, 2013 4:11 PM
>> >>Subject: [R] Fastest way to compare a single value with all values in one column of a data frame
>> >>
>> >>Hello!
>> >>
>> >>I have a large data frame x:
>> >>x<-data.frame(item=letters[1:5],a=1:5,b=11:15) # in actuality, x has 1000 rows
>> >>x$item<-as.character(x$item)
>> >>I also have a small data frame y with just 1 row:
>> >>y<-data.frame(item="f",a=3,b=10)
>> >>y$item<-as.character(y$item)
>> >>
>> >>I have to decide if y$a is larger than the smallest of all the values in
>> >>x$a. If it is, I want y to replace the whole row in x that has the lowest
>> >>value in column a.
>> >>This is how I'd do it.
>> >>
>> >>if(y$a>min(x$a)){
>> >>  whichmin<-which(x$a==min(x$a))
>> >>  x[whichmin,]<-y[1,]
>> >>}
>> >>
>> >>I am wondering if there is a faster way of doing it. What would be the
>> >>fastest possible way? I'd have to do it, unfortunately, many-many times.
>> >>
>> >>Thank you very much!
>> >>
>> >>--
>> >>Dimitri Liakhovitski
>> >>gfk.com <http://marketfusionanalytics.com/>
>> >>
>> >>        [[alternative HTML version deleted]]
>> >>
>> >>______________________________________________
>> >>R-help@r-project.org mailing list
>> >>https://stat.ethz.ch/mailman/listinfo/r-help
>> >>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>and provide commented, minimal, self-contained, reproducible code.

--
Dimitri Liakhovitski
gfk.com <http://marketfusionanalytics.com/>