Re: [R] Fastest way to compare a single value with all values in one column of a data frame

Dimitri Liakhovitski Wed, 30 Jan 2013 12:29:50 -0800

In reality, the values in a will not be integers but numeric. They will never
be identical, but they could be quite close - I don't know how many decimal
places matter.
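
With values that are never exactly equal, one option is to avoid `==` entirely. Here is a minimal sketch, assuming a single-row `y` and no ties in `x$a`; `replace_min_row` is just a name made up for illustration and the sample numbers are invented. It relies on `which.min()` returning the position (not the value) of the first minimum, so the result can be used directly as a row index:

```r
# Hypothetical helper (not from the thread): overwrite the row of `x` that
# holds the smallest value of `a` with the single-row data frame `y`, but
# only when y$a is larger than that minimum.
replace_min_row <- function(x, y) {
  i <- which.min(x$a)        # position of the first minimum; no == on doubles
  if (y$a > x$a[i]) {
    x[i, ] <- y[1, ]         # same row assignment as in the original post
  }
  x
}

# Made-up example data: numeric values that are close but never identical.
x <- data.frame(item = c("a", "b", "c", "d", "e"),
                a = c(0.31, 0.12, 0.57, 0.120001, 0.9),
                b = 11:15,
                stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 0.5, b = 10, stringsAsFactors = FALSE)

x2 <- replace_min_row(x, y)  # row 2 (a = 0.12) is replaced by y

# A y whose `a` is below the current minimum leaves x untouched.
y_small <- data.frame(item = "g", a = 0.01, b = 10, stringsAsFactors = FALSE)
x3 <- replace_min_row(x, y_small)
```

This does one pass over `x$a` per call instead of computing `min(x$a)` twice plus a `which()`. Whether that matters at 1000 rows would need to be timed on the real data.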
Dimitri


On Wed, Jan 30, 2013 at 2:06 PM, arun <smartpink...@yahoo.com> wrote:

> Hi,
> Any chance x$a to have the same number repeated?
>
> If `Item` and `a` are unique,  I guess both the solutions should work.
>
> set.seed(1851)
> x<-
> data.frame(item=sample(letters[1:20],20,replace=F),a=sample(1:45,20,replace=F),b=sample(20:50,20,replace=F),stringsAsFactors=F)
> y<- data.frame(item="z",a=3,b=10,stringsAsFactors=F)
>
> x[intersect(which(x$a < y$a),which.min(x$a)),]
>  #  item a  b
> #17    c 1 48
>  x[x$a==which.min(x$a[x$a<y$a]),]
> #   item a  b
> #17    c 1 48
> #or
>
> x[x$a%in%which.min(x$a[x$a<y$a]),]
> #   item a  b
> #17    c 1 48
>
> x[x$a%in%which.min(x$a[x$a<y$a]),]<-y
>
> tail(x)
> #   item  a  b
> #15    q 45 30
> #16    g 10 23
> #17    z  3 10
> #18    r 15 39
> #19    l 18 45
> #20    t 35 33
>
> #However, if the `item` column is unique but `a` is not, then the problem I
> #mentioned previously arises.
> set.seed(1851)
> x1<-
> data.frame(item=sample(letters[1:20],20,replace=F),a=sample(1:10,20,replace=T),b=sample(20:50,20,replace=F),stringsAsFactors=F)
> y1<- data.frame(item="z",a=3,b=10,stringsAsFactors=F)
>
>
> x1[intersect(which(x1$a < y1$a),which.min(x1$a)),]
>  # item a  b
> #3    s 1 41
> x1[x1$a==which.min(x1$a[x1$a<y1$a]),]
>  #  item a  b
> #3     s 1 41
> #11    h 1 46
> #17    c 1 48
> x1[x1$a==which.min(x1$a[x1$a<y1$a]),]<- y1
> A.K.
>
>
> ________________________________
> From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Cc: R help <r-help@r-project.org>; Jessica Streicher <
> j.streic...@micromata.de>
> Sent: Wednesday, January 30, 2013 1:49 PM
> Subject: Re: [R] Fastest way to compare a single value with all values in
> one column of a data frame
>
>
> Sorry - I should have clarified:
> My identifiers (in column "item") will always be unique. In other words,
> one entry in column "item" will never be repeated - neither in x nor in y.
> Dimitri
>
>
> On Wed, Jan 30, 2013 at 1:27 PM, Dimitri Liakhovitski <
> dimitri.liakhovit...@gmail.com> wrote:
>
> Thank you, everyone! I'll try to test those different approaches. Really
> appreciate your help!
> >Dimitri
> >
> >
> >On Wed, Jan 30, 2013 at 11:03 AM, arun <smartpink...@yahoo.com> wrote:
> >
> >HI,
> >>
> >>Sorry, my previous solution doesn't work.
> >>This should work for your dataset:
> >>set.seed(1851)
> >>x<-
> data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=F)
> >>y<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
> >> x[x$a%in%which.min(x[x$a<y$a,]$a),]<- y #if there are multiple minimum
> values
> >>
> >>set.seed(1241)
> >>x1<-
> data.frame(item=sample(letters[1:10],1e4,replace=TRUE),a=sample(1:30,1e4,replace=TRUE),b=sample(1:100,1e4,replace=TRUE),stringsAsFactors=F)
> >>y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
> >>length(x1$a[x1$a==1])
> >>#[1] 330
> >> system.time({x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1})
> >>#   user  system elapsed
> >> # 0.000   0.000   0.001
> >>length(x1$a[x1$a==1])
> >>#[1] 0
> >>
> >>
> >>#For some reason, it does not work once the number of tied minimum
> >>#values exceeds some threshold
> >>
> >>set.seed(1241)
> >>x1<-
> data.frame(item=sample(letters[1:10],1e5,replace=TRUE),a=sample(1:30,1e5,replace=TRUE),b=sample(1:100,1e5,replace=TRUE),stringsAsFactors=F)
> >>y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
> >>length(x1$a[x1$a==1])
> >>#[1] 3404
> >>x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1
> >> length(x1$a[x1$a==1])
> >>#[1] 3404 #not getting replaced
> >>
> >>#However, if I try:
> >>set.seed(1241)
> >> x1<-
> data.frame(item=sample(letters[1:10],1e6,replace=TRUE),a=sample(1:5000,1e6,replace=TRUE),b=sample(1:100,1e6,replace=TRUE),stringsAsFactors=F)
> >> y1<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
> >> length(x1$a[x1$a==1])
> >>#[1] 208
> >> system.time(x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1)
> >>#user  system elapsed
> >> # 0.124   0.016   0.138
> >>  length(x1$a[x1$a==1])
> >>#[1] 0
> >>
> >>
> >>#Tried Jessica's solution:
> >>set.seed(1851)
> >> x<-
> data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=F)
> >> y<- data.frame(item="f",a=3,b=10,stringsAsFactors=F)
> >> x[intersect(which(x$a < y$a),which.min(x$a)),] <- y
> >>
> >> x
> >>#   item  a  b
> >>#1     a  8 25
> >>#2     a 10 26
> >>#3     f  3 10 #replaced
> >>#4     e 15 26
> >>#5     b 13 20
> >>#6     a  5 23
> >>#7     d  4 29
> >>#8     e  2 24
> >>#9     c  7 30
> >>#10    e 14 24
> >>#11    d  2 20
> >>#12    e 10 21
> >>#13    c 13 27
> >>#14    d 12 23
> >>#15    b 11 26
> >>#16    e  5 22
> >>#17    c  1 26  #it is not replaced
> >>#18    a  8 21
> >>#19    e 10 26
> >>#20    c  2 22
> >>
> >>
> >>
> >>
> >>A.K.
> >>
> >>
> >>
> >>
> >>
> >>----- Original Message -----
> >>From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
> >>To: r-help <r-help@r-project.org>
> >>Cc:
> >>Sent: Tuesday, January 29, 2013 4:11 PM
> >>Subject: [R] Fastest way to compare a single value with all values in
> one column of a data frame
> >>
> >>
> >>Hello!
> >>
> >>I have a large data frame x:
> >>x<-data.frame(item=letters[1:5],a=1:5,b=11:15)  # in actuality, x has 1000 rows
> >>x$item<-as.character(x$item)
> >>I also have a small data frame y with just 1 row:
> >>y<-data.frame(item="f",a=3,b=10)
> >>y$item<-as.character(y$item)
> >>
> >>I have to decide if y$a is larger than the smallest of all the values in
> >>x$a. If it is, I want y to replace the whole row in x that has the lowest
> >>value in column a.
> >>This is how I'd do it.
> >>
> >>if(y$a>min(x$a)){
> >>  whichmin<-which(x$a==min(x$a))
> >>  x[whichmin,]<-y[1,]
> >>}
> >>
> >>
> >>I am wondering if there is a faster way of doing it. What would be the
> >>fastest possible way? I'd have to do it, unfortunately, many-many times.
> >>
> >>Thank you very much!
> >>
> >>--
> >>Dimitri Liakhovitski
> >>
> >>gfk.com <http://marketfusionanalytics.com/>
> >>
> >>    [[alternative HTML version deleted]]
> >>
> >>______________________________________________
> >>R-help@r-project.org mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> >
> >--
> >
> >Dimitri Liakhovitski
> >gfk.com
>
>
> --
>
> Dimitri Liakhovitski
> gfk.com
>



-- 
Dimitri Liakhovitski
gfk.com <http://marketfusionanalytics.com/>

