Hi,
Is there any chance that x$a contains repeated values? If `item` and `a` are both unique, I guess both solutions should work.
set.seed(1851)
x <- data.frame(item=sample(letters[1:20],20,replace=FALSE),a=sample(1:45,20,replace=FALSE),b=sample(20:50,20,replace=FALSE),stringsAsFactors=FALSE)
y <- data.frame(item="z",a=3,b=10,stringsAsFactors=FALSE)
x[intersect(which(x$a < y$a),which.min(x$a)),]
#   item a  b
#17    c 1 48
x[x$a==which.min(x$a[x$a<y$a]),]
#   item a  b
#17    c 1 48
#or
x[x$a%in%which.min(x$a[x$a<y$a]),]
#   item a  b
#17    c 1 48
x[x$a%in%which.min(x$a[x$a<y$a]),] <- y
tail(x)
#   item  a  b
#15    q 45 30
#16    g 10 23
#17    z  3 10
#18    r 15 39
#19    l 18 45
#20    t 35 33

#However, if the `item` column is unique but `a` is not, then the problem I mentioned previously arises.
set.seed(1851)
x1 <- data.frame(item=sample(letters[1:20],20,replace=FALSE),a=sample(1:10,20,replace=TRUE),b=sample(20:50,20,replace=FALSE),stringsAsFactors=FALSE)
y1 <- data.frame(item="z",a=3,b=10,stringsAsFactors=FALSE)
x1[intersect(which(x1$a < y1$a),which.min(x1$a)),]
#  item a  b
#3    s 1 41
x1[x1$a==which.min(x1$a[x1$a<y1$a]),]
#   item a  b
#3     s 1 41
#11    h 1 46
#17    c 1 48
x1[x1$a==which.min(x1$a[x1$a<y1$a]),] <- y1
A.K.

________________________________
From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: R help <r-help@r-project.org>; Jessica Streicher <j.streic...@micromata.de>
Sent: Wednesday, January 30, 2013 1:49 PM
Subject: Re: [R] Fastest way to compare a single value with all values in one column of a data frame

Sorry - I should have clarified:
My identifiers (in column "item") will always be unique. In other words, one entry in column "item" will never be repeated - neither in x nor in y.
Dimitri

On Wed, Jan 30, 2013 at 1:27 PM, Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com> wrote:

>Thank you, everyone! I'll try to test those different approaches. Really appreciate your help!
>Dimitri
>
>On Wed, Jan 30, 2013 at 11:03 AM, arun <smartpink...@yahoo.com> wrote:
>
>>Hi,
>>
>>Sorry, my previous solution doesn't work.
>>This should work for your dataset:
>>set.seed(1851)
>>x <- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=FALSE)
>>y <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>>x[x$a%in%which.min(x[x$a<y$a,]$a),] <- y #if there are multiple minimum values
>>
>>set.seed(1241)
>>x1 <- data.frame(item=sample(letters[1:10],1e4,replace=TRUE),a=sample(1:30,1e4,replace=TRUE),b=sample(1:100,1e4,replace=TRUE),stringsAsFactors=FALSE)
>>y1 <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>>length(x1$a[x1$a==1])
>>#[1] 330
>>system.time({x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),] <- y1})
>>#   user  system elapsed
>>#  0.000   0.000   0.001
>>length(x1$a[x1$a==1])
>>#[1] 0
>>
>>#For some reason, it does not work when the number of tied minimum values exceeds some value
>>
>>set.seed(1241)
>>x1 <- data.frame(item=sample(letters[1:10],1e5,replace=TRUE),a=sample(1:30,1e5,replace=TRUE),b=sample(1:100,1e5,replace=TRUE),stringsAsFactors=FALSE)
>>y1 <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>>length(x1$a[x1$a==1])
>>#[1] 3404
>>x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),] <- y1
>>length(x1$a[x1$a==1])
>>#[1] 3404 #not getting replaced
>>
>>#However, if I try:
>>set.seed(1241)
>>x1 <- data.frame(item=sample(letters[1:10],1e6,replace=TRUE),a=sample(1:5000,1e6,replace=TRUE),b=sample(1:100,1e6,replace=TRUE),stringsAsFactors=FALSE)
>>y1 <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>>length(x1$a[x1$a==1])
>>#[1] 208
>>system.time(x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),] <- y1)
>>#   user  system elapsed
>>#  0.124   0.016   0.138
>>length(x1$a[x1$a==1])
>>#[1] 0
>>
>>#Tried Jessica's solution:
>>set.seed(1851)
>>x <- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=FALSE)
>>y <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>>x[intersect(which(x$a < y$a),which.min(x$a)),] <- y
>>
>>x
>>#   item  a  b
>>#1     a  8 25
>>#2     a 10 26
>>#3     f  3 10 #replaced
>>#4     e 15 26
>>#5     b 13 20
>>#6     a  5 23
>>#7     d  4 29
>>#8     e  2 24
>>#9     c  7 30
>>#10    e 14 24
>>#11    d  2 20
>>#12    e 10 21
>>#13    c 13 27
>>#14    d 12 23
>>#15    b 11 26
>>#16    e  5 22
>>#17    c  1 26 #it is not replaced
>>#18    a  8 21
>>#19    e 10 26
>>#20    c  2 22
>>
>>A.K.
>>
>>----- Original Message -----
>>From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
>>To: r-help <r-help@r-project.org>
>>Cc:
>>Sent: Tuesday, January 29, 2013 4:11 PM
>>Subject: [R] Fastest way to compare a single value with all values in one column of a data frame
>>
>>Hello!
>>
>>I have a large data frame x:
>>x<-data.frame(item=letters[1:5],a=1:5,b=11:15) # in actuality, x has 1000 rows
>>x$item<-as.character(x$item)
>>I also have a small data frame y with just 1 row:
>>y<-data.frame(item="f",a=3,b=10)
>>y$item<-as.character(y$item)
>>
>>I have to decide if y$a is larger than the smallest of all the values in x$a. If it is, I want y to replace the whole row in x that has the lowest value in column a.
>>This is how I'd do it.
>>
>>if(y$a>min(x$a)){
>>  whichmin<-which(x$a==min(x$a))
>>  x[whichmin,]<-y[1,]
>>}
>>
>>I am wondering if there is a faster way of doing it. What would be the fastest possible way? I'd have to do it, unfortunately, many-many times.
>>
>>Thank you very much!
>>
>>--
>>Dimitri Liakhovitski
>>
>>gfk.com <http://marketfusionanalytics.com/>
>>
>>______________________________________________
>>R-help@r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>
>--
>Dimitri Liakhovitski
>gfk.com

--
Dimitri Liakhovitski
gfk.com
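
A note on the snippets in this thread, offered as a sketch rather than a definitive answer: which.min() returns a position, not a value. Comparing x$a against which.min(...) with == or %in% therefore selects the intended rows only when that position happens to equal the minimum value, which may explain why the replacement appeared to fail in the 1e5 example. The toy vector `a` and the helper name replace_min_row() below are assumptions made for illustration; they do not come from the thread.

```r
# Toy vector (hypothetical values, chosen only to show the pitfall)
a <- c(8, 2, 5, 1, 2)
sub <- a[a < 3]                   # values below the threshold: 2 1 2

pos <- which.min(sub)             # 2: a position within `sub`
val <- min(sub)                   # 1: the minimum value itself

wrong <- which(a == pos)          # 2 5: rows whose value equals the position
right <- which(a == val)          # 4:   the row that actually holds the minimum

# Data from the original question
x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15,
                stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# replace_min_row() is a hypothetical helper name, not from the thread.
# It replaces the first row holding the minimum of x$a when y$a is larger.
replace_min_row <- function(x, y) {
  i <- which.min(x$a)             # row position of the (first) minimum
  if (y$a > x$a[i]) {
    x[i, ] <- y                   # overwrite that row with y's single row
  }
  x
}

x_new <- replace_min_row(x, y)
x_new[1, ]                        # item "f", a 3, b 10
```

Here which.min() is used directly as a row index, so only the first row holding the minimum is replaced. To replace every tied row instead, the logical index x$a == min(x$a) from the original question selects them all.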