If one puts the gc() call prior to the expressions themselves, one
gets consistently ... different results:
library("rbenchmark")
v <- rep(1:500, 1:500); x <- 5
benchmark(
  which = c(gc(), length(which(x == v))),
  index = c(gc(), length(v[v == x])),
  sum   = c(gc(), sum(v == x)),
  replications = 200, columns = c("test", "elapsed"), order = "elapsed")
test elapsed
3 sum 3.299
2 index 3.536
1 which 4.172
Since the gc() call takes up more than half the time, the differences
between the methods may be more dramatic than they first appear:
> v<-rep(1:500,1:500); x<-5; benchmark(
+ which= c(gc()), index= c(gc()), sum= c(gc()),
+ replications=200, columns=c("test","elapsed"), order="elapsed" )
test elapsed
2 index 2.621
3 sum 2.621
1 which 2.631
> within( benchmark(
+ which= c(gc(),length(which(x==v))), index= c(gc(),
+ length(v[v==x])), sum= c(gc(), sum(v==x)),
+ replications=200, columns=c("test","elapsed"),
+ order="elapsed" ), {corrected = elapsed-2.62})
test elapsed corrected
3 sum 3.304 0.684
2 index 3.543 0.923
1 which 4.180 1.560
So the "answer" may not be so simple.
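The 2.62 correction above was read off a separate run and typed in by hand. A minimal sketch of the same idea that measures the gc()-only baseline in the same session and subtracts it, using only base R's system.time() rather than rbenchmark; the helper name time_reps and the choice of 50 replications are my own assumptions, not part of the runs shown above:

```r
# Sketch: time each expression with gc() included, then subtract a
# gc()-only baseline measured in the same session.
v <- rep(1:500, 1:500)
x <- 5
reps <- 50  # assumed value, chosen only to keep the run short

# Hypothetical helper: total elapsed seconds for `reps` calls of f().
time_reps <- function(f, reps) {
  unname(system.time(for (i in seq_len(reps)) f())["elapsed"])
}

# Cost of the gc() calls alone.
baseline <- time_reps(function() gc(), reps)

# Each candidate, with a gc() before every evaluation.
raw <- c(
  which = time_reps(function() { gc(); length(which(x == v)) }, reps),
  index = time_reps(function() { gc(); length(v[v == x]) }, reps),
  sum   = time_reps(function() { gc(); sum(v == x) }, reps)
)

# Elapsed time with the gc() overhead removed, fastest first.
corrected <- sort(raw - baseline)
print(corrected)
```

The corrected figures are small differences of two noisy measurements, so they can vary noticeably between runs and should be read as rough estimates.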
Allan Engelhardt wrote:
Answering my own question: if I explicitly garbage collect before the
benchmark then 'index' always wins, which probably also answers the
original question.
v<-rep(1:1000,1:1000); x<-5; gc(); benchmark(replications=200,
columns=c("test","elapsed"), order="elapsed",
which=length(which(x==v)),
index=length(v[v==x]), sum=sum(v==x))
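One way to check whether the ranking is stable is to repeat the whole comparison several times and tally the winner of each run. The following is only a sketch under my own assumptions: it uses base R's system.time() instead of rbenchmark, it calls gc() before each method rather than once before the whole benchmark, and the helper time_after_gc, the 50 replications and the 5 runs are illustrative choices.

```r
v <- rep(1:1000, 1:1000)
x <- 5

# The three expressions should agree on the count before timings are compared.
counts <- c(
  which = length(which(x == v)),
  index = length(v[v == x]),
  sum   = sum(v == x)
)
stopifnot(all(counts == 5))

# Hypothetical helper: collect garbage once, then time `reps` calls of f().
time_after_gc <- function(f, reps = 50) {
  gc()
  unname(system.time(for (i in seq_len(reps)) f())["elapsed"])
}

# Repeat the comparison and record the fastest method in each run.
winners <- replicate(5, {
  elapsed <- c(
    which = time_after_gc(function() length(which(x == v))),
    index = time_after_gc(function() length(v[v == x])),
    sum   = time_after_gc(function() sum(v == x))
  )
  names(elapsed)[which.min(elapsed)]
})
print(table(winners))
```

If the tally is split between methods, the differences are probably within the noise of the measurement on that machine.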
On 19/06/09 16:51, Allan Engelhardt wrote:
When trying out a couple of different approaches to this problem I get
rather different answers between runs. Anybody know why?
library("rbenchmark")
v<-rep(1:1000,1:1000); x<-5; benchmark(replications=200,
columns=c("test","elapsed"), order="elapsed",
which=length(which(x==v)), index=length(v[v==x]), sum=sum(v==x))
test elapsed
3 sum 2.513
2 index 5.512
1 which 6.712
v<-rep(1:1000,1:1000); x<-5; benchmark(replications=200,
columns=c("test","elapsed"), order="elapsed",
which=length(which(x==v)), index=length(v[v==x]), sum=sum(v==x))
test elapsed
3 sum 2.502
2 index 3.779
1 which 6.650
v<-rep(1:1000,1:1000); x<-5; benchmark(replications=200,
columns=c("test","elapsed"), order="elapsed",
which=length(which(x==v)), index=length(v[v==x]), sum=sum(v==x))
test elapsed
2 index 3.796
3 sum 5.808
1 which 6.633
This pattern appears to repeat (so on the next two runs "sum" will win,
followed by "index", followed by "sum" twice, followed by "index" ...)
On 19/06/09 14:55, Praveen Surendran wrote:
Hi,
I have a vector "v" and would like to find the number of occurrences of
element "x" in it.
Is there a way other than
sum(as.integer(v==x)) or length(which(x==v))
to do this?
I have a huge file to process in this way. Both of the methods described
above are pretty slow when dealing with a large vector.
Please share your comments.
Praveen Surendran.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT