Re: [R] Alternate ways of finding number of occurrence of an element in a vector.

David Winsemius Sun, 21 Jun 2009 09:41:38 -0700

If one puts the gc() call prior to the expressions themselves, one gets consistently ... different results:


library("rbenchmark")
v<-rep(1:500,1:500); x<-5; benchmark(
     which= c(gc(),length(which(x==v))), index= c(gc(),length(v[v==x])), sum= c(gc(), sum(v==x)),
     replications=200,  columns=c("test","elapsed"), order="elapsed" )
   test elapsed
3   sum   3.299
2 index   3.536
1 which   4.172

Since the gc() call takes up more than half the time, the differences may be more dramatic:


> v<-rep(1:500,1:500); x<-5; benchmark(
+      which= c(gc()),  index= c(gc()), sum= c(gc()),
+      replications=200,  columns=c("test","elapsed"), order="elapsed" )
   test elapsed
2 index   2.621
3   sum   2.621
1 which   2.631

> within( benchmark(
+      which= c(gc(),length(which(x==v))), index= c(gc(),length(v[v==x])), sum= c(gc(), sum(v==x)),
+      replications=200, columns=c("test","elapsed"), order="elapsed" ), {corrected = elapsed-2.62})

   test elapsed corrected
3   sum   3.304     0.684
2 index   3.543     0.923
1 which   4.180     1.560

So the "answer" may not be so simple.
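
The baseline subtraction above can also be sketched in base R without rbenchmark. This is only a rough illustration: time_reps is a hypothetical helper, not part of any package, and the repetition count of 20 is an arbitrary assumption. With timing noise the corrected values can come out slightly different from run to run, or even negative.

```r
# Hypothetical helper: total elapsed seconds for n evaluations of f().
time_reps <- function(f, n = 20) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

v <- rep(1:500, 1:500); x <- 5

# Cost of the gc() calls alone, measured the same way as the tests.
baseline <- time_reps(function() gc())

# Each test collects garbage first, then counts the occurrences of x.
timings <- c(
  which = time_reps(function() { gc(); length(which(x == v)) }),
  index = time_reps(function() { gc(); length(v[v == x]) }),
  sum   = time_reps(function() { gc(); sum(v == x) })
)

# Subtract the gc() overhead and order from fastest to slowest.
corrected <- sort(timings - baseline)
print(corrected)
```

Measuring the baseline in the same session, rather than hard-coding a constant such as 2.62, keeps the correction tied to the machine and the moment of the run.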



Allan Engelhardt wrote:

Answering my own question: if I explicitly garbage collect before the
benchmark then 'index' always wins, which probably also answers the
original question.

v<-rep(1:1000,1:1000); x<-5; gc(); benchmark(replications=200,
columns=c("test","elapsed"), order="elapsed", which=length(which(x==v)),
index=length(v[v==x]), sum=sum(v==x))

On 19/06/09 16:51, Allan Engelhardt wrote:

When trying out a couple of different approaches to this problem I get
rather different answers between runs.  Anybody know why?

library("rbenchmark")
v<-rep(1:1000,1:1000); x<-5; benchmark(replications=200,
columns=c("test","elapsed"), order="elapsed",
which=length(which(x==v)), index=length(v[v==x]), sum=sum(v==x))
  test elapsed
3   sum   2.513
2 index   5.512
1 which   6.712

v<-rep(1:1000,1:1000); x<-5; benchmark(replications=200,
columns=c("test","elapsed"), order="elapsed",
which=length(which(x==v)), index=length(v[v==x]), sum=sum(v==x))
  test elapsed
3   sum   2.502
2 index   3.779
1 which   6.650

v<-rep(1:1000,1:1000); x<-5; benchmark(replications=200,
columns=c("test","elapsed"), order="elapsed",
which=length(which(x==v)), index=length(v[v==x]), sum=sum(v==x))
  test elapsed
2 index   3.796
3   sum   5.808
1 which   6.633

This pattern appears to repeat (so on the next two runs "sum" will win,
followed by "index", followed by "sum" twice, followed by "index" ...)
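
One way to check whether garbage collection accounts for this kind of run-to-run variation is to compare gc.time() before and after a timed loop. This is a minimal sketch using base R only; the choice of the "sum" variant and of 200 repetitions simply mirrors the benchmark above, and the variable names are made up for illustration.

```r
v <- rep(1:1000, 1:1000); x <- 5

# Cumulative time this session has spent in garbage collection so far.
gc_before <- gc.time()

# Time 200 repetitions of one of the counting expressions.
elapsed <- system.time(for (i in 1:200) sum(v == x))[["elapsed"]]

# GC time accrued during the loop: user, system, elapsed, then child times.
gc_spent <- gc.time() - gc_before

cat("total elapsed:", elapsed, " GC elapsed:", gc_spent[3], "\n")
```

If the GC share differs a lot between otherwise identical runs, that would be consistent with collections landing in different tests from one run to the next.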


On 19/06/09 14:55, Praveen Surendran wrote:

Hi,

I have a vector "v" and would like to find the number of occurrences of
element "x" in it.

Is there a way other than

sum(as.integer(v==x)) or length(which(x==v))

to do this?
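
For reference, the candidate expressions discussed in this thread can be compared side by side. The sketch below uses the test vector from the benchmarks in this thread. The tabulate() variant is an addition not mentioned by any poster, and it only applies when v holds positive integers.

```r
v <- rep(1:500, 1:500); x <- 5   # the value i appears i times

n_sum   <- sum(v == x)                      # as.integer() is not needed: sum() coerces logicals
n_which <- length(which(v == x))            # which() silently drops NA comparisons
n_index <- length(v[v == x])                # NA comparisons would be kept as NA elements
n_tab   <- tabulate(v, nbins = max(v))[x]   # positive integers only; counts every value at once

print(c(sum = n_sum, which = n_which, index = n_index, tabulate = n_tab))
# each count is 5
```

If v can contain NA, the variants disagree: sum(v == x) returns NA unless na.rm = TRUE is given, and length(v[v == x]) counts the NA entries as matches. When counts for many different values are needed from the same vector, a single tabulate() or table() call avoids rescanning v for each value.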

I have a huge file to process and do this.  Both the above described
methods are pretty slow while dealing with a large vector.

Please share your comments.

Praveen Surendran.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
