S Ellison wrote:
Sorting with an appropriate algorithm is O(n log n), so it's very hard to
get the 'exact' median any faster. However, if you can cope with a less
precise median, you could use a binary search between min(x) and max(x)
with a loose tolerance or comparatively few iterations. In native R, though,
that isn't going to be fast; interpreter overhead will likely more than
wipe out any reduction in the number of comparisons.
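A minimal sketch of the binary-search idea described above, for illustration only. The function name approx_median and the arguments tol and max_iter are hypothetical, not part of base R. For an even number of values it converges to the lower of the two middle values rather than their mean, and it assumes x contains no NA values.

```r
# Approximate median by bisection between min(x) and max(x).
# Hypothetical helper for illustration; not part of base R.
approx_median <- function(x, tol = 1e-3, max_iter = 50L) {
  n  <- length(x)
  lo <- min(x)
  hi <- max(x)
  iter <- 0L
  while ((hi - lo) > tol && iter < max_iter) {
    mid <- (lo + hi) / 2
    # One vectorised O(n) pass: are at least half the values <= mid?
    if (sum(x <= mid) >= n / 2) {
      hi <- mid   # the (lower) median is at or below mid
    } else {
      lo <- mid   # the (lower) median is above mid
    }
    iter <- iter + 1L
  }
  (lo + hi) / 2
}

set.seed(1)
x <- rnorm(1001)
approx_median(x)   # compare with median(x)
```

Each iteration halves the search interval at the cost of one full pass over x, so the total work depends on how tight a tolerance is requested.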
In any case, it looks like you are not constrained by the median
algorithm, but by the number of calls. You might do a lot better with
apply, though:
apply(df, 2, median)
Well, for data frames, I think sapply(...) or even unlist(lapply(...))
will be faster, e.g.:
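# 50 rows x 200,000 columns of random data
mat <- matrix(rnorm(50 * 2e05), 50, 2e05)
DF <- as.data.frame(mat)
# run the garbage collector before each timing
invisible({gc(); gc()})
# apply() coerces the data frame to a matrix first
system.time(apply(DF, 2, median))
invisible({gc(); gc()})
# sapply() and lapply() iterate over the columns directly
system.time(sapply(DF, median))
invisible({gc(); gc()})
system.time(unlist(lapply(DF, median), use.names = FALSE))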
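Before comparing timings, it may be worth confirming that the three calls return the same values. The sketch below does this on a much smaller data frame; the object name small and its size are arbitrary choices for illustration, and vapply is added here only as a further base R alternative, not something suggested in the thread.

```r
# Small example: 50 rows x 200 columns (size chosen arbitrarily)
set.seed(42)
small <- as.data.frame(matrix(rnorm(50 * 200), 50, 200))

m_apply  <- apply(small, 2, median)                            # named vector
m_sapply <- sapply(small, median)                              # named vector
m_lapply <- unlist(lapply(small, median), use.names = FALSE)   # unnamed
m_vapply <- vapply(small, median, numeric(1))                  # type-checked

# Compare values only, ignoring names
all.equal(unname(m_apply),  m_lapply)
all.equal(unname(m_sapply), m_lapply)
all.equal(unname(m_vapply), m_lapply)
```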
Best,
Dimitris
On my system 200k columns were processed in negligible time by apply
and I'm still waiting for mapply.
S
"Zheng, Xin (NIH) [C]" <zheng...@mail.nih.gov> 14/04/2009 05:29:40
Hi there,
I have a data frame with more than 200k columns. How can I get the median
of each column quickly? mapply is the fastest function I know of for that,
but it is still not fast enough.
It seems the function "median" in R calculates the median using "sort" and "mean".
I am wondering whether there is another function with a better algorithm.
Any hints?
Thanks,
Xin Zheng
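For reference, the sort-and-mean approach mentioned in the question can be sketched roughly as follows. This is a simplified illustration, not the actual base R source: the name median_sketch is hypothetical and NA handling is omitted. It relies on the partial argument of sort(), which only guarantees that the requested positions hold the values they would hold in a full sort.

```r
# Simplified sketch of a sort-based median (no NA handling).
# Hypothetical helper for illustration; not the base R implementation.
median_sketch <- function(x) {
  n <- length(x)
  if (n == 0L) return(NA_real_)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # odd length: the single middle value
    sort(x, partial = half)[half]
  } else {
    # even length: mean of the two middle values
    mean(sort(x, partial = half + 0:1)[half + 0:1])
  }
}

median_sketch(c(5, 1, 3))
median_sketch(c(4, 1, 3, 2))
```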
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014