Hi Val,
[off list... I don't want to compromise your chances to start a
constructive discussion ;-)]
Thanks for reporting this. Just wanted to mention that I think the
situation is worse with the paste() generic defined in BiocGenerics
than with the one you get from setGeneric("paste") because of the
signature of the generic. With the latter, dispatch is on the 'sep'
and 'collapse' args only (which is surprising, but that's another
story), while with the former it's on ...:
> setGeneric("paste")
[1] "paste"
> paste
standardGeneric for "paste" defined from package "base"
function (..., sep = " ", collapse = NULL)
standardGeneric("paste")
<environment: 0x157a028>
Methods may be defined for arguments: sep, collapse
Use showMethods("paste") for currently available ones.
## Note that showMethods() is broken: it contradicts the output above,
## which indicates that dispatch is on 'sep' and 'collapse'.
> showMethods("paste")
Function: paste (package base)
...="ANY"
> microbenchmark(fun0(lst), fun1(lst), times=10)
Unit: milliseconds
expr min lq median uq max neval
fun0(lst) 27.374228 27.508580 28.144858 28.895889 33.528221 10
fun1(lst) 5.474173 5.739289 5.803471 6.050482 6.825982 10
> removeGeneric("paste")
[1] TRUE
> setGeneric("paste", signature="...")  # this is how it's defined in BiocGenerics
Creating a new generic function for ‘paste’ in the global environment
[1] "paste"
> microbenchmark(fun0(lst), fun1(lst), times=10)
Unit: milliseconds
expr min lq median uq max neval
fun0(lst) 149.828201 153.192866 155.845508 157.916067 176.313906 10
fun1(lst) 4.924387 5.088094 5.114532 5.200432 5.332386 10
Dispatch on ... seems to have a ridiculously high cost! The median for
fun0 goes from about 28 ms to about 156 ms, while fun1 is unaffected.
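A quick way to check which arguments each generic dispatches on is to read the 'signature' slot of the generic object. This is only a minimal sketch, assuming a fresh session with nothing but the methods package attached; like the transcript above, it creates and removes the generic in the global environment.

```r
library(methods)

# Generic created the default way: setGeneric() picks the dispatch arguments.
setGeneric("paste")
sig_default <- getGeneric("paste")@signature
removeGeneric("paste")

# Generic with explicit dispatch on '...', mirroring the definition
# attributed to BiocGenerics above.
setGeneric("paste", signature = "...")
sig_dots <- getGeneric("paste")@signature
removeGeneric("paste")

print(sig_default)  # arguments used for dispatch by the default generic
print(sig_dots)     # arguments used for dispatch when signature = "..."
```

Comparing the two printed vectors shows directly whether '...' takes part in dispatch, without relying on showMethods().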
H.
On 07/01/2013 10:04 PM, Valerie Obenchain wrote:
Hi,
S4 method dispatch can be very slow. Would it be reasonable to cache the
most recent dispatch, anticipating that the next invocation will be on
the same type? This would be very helpful in loops.
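## dispatch happens on every call made by sapply()
fun0 <- function(x)
    sapply(x, paste, collapse="+")
## look the method up once, then call it directly
fun1 <- function(x) {
    paste <- selectMethod(paste, class(x[[1]]))
    sapply(x, paste, collapse="+")
}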
lst <- split(rep(LETTERS, 100), rep(1:1300, 2))
library(microbenchmark)
microbenchmark(fun0(lst), times=10)
## Unit: milliseconds
## expr min lq median uq max neval
## fun0(lst) 4.153287 4.180659 4.513539 5.19261 5.280481 10
setGeneric("paste")
microbenchmark(fun0(lst), fun1(lst), times=10)
## Unit: milliseconds
## expr min lq median uq max neval
## fun0(lst) 21.093180 21.27616 21.453174 21.833686 24.758791 10
## fun1(lst) 4.517808 4.53067 4.582641 4.682235 5.121856 10
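The caching idea can be approximated at user level. The sketch below is only an illustration, not a proposal for how the methods package should do it: joinPlus and make_cached are hypothetical names, and the cache is never invalidated when methods are added or removed.

```r
library(methods)

# Hypothetical generic, used only for illustration.
setGeneric("joinPlus", function(x, ...) standardGeneric("joinPlus"))
setMethod("joinPlus", "character",
          function(x, ...) paste(x, collapse = "+"))

# Wrap a generic so that the method selected for the most recent class is
# reused until an argument of a different class shows up.
make_cached <- function(generic) {
    last_class <- NULL
    last_method <- NULL
    function(x, ...) {
        cls <- class(x)[1L]
        if (!identical(cls, last_class)) {
            # Cache miss: do the (slow) lookup once and remember the result.
            last_method <<- selectMethod(generic, cls)
            last_class <<- cls
        }
        last_method(x, ...)
    }
}

cached_join <- make_cached("joinPlus")
lst <- split(rep(LETTERS, 100), rep(1:1300, 2))
res_plain <- sapply(lst, joinPlus)      # dispatches on every element
res_cached <- sapply(lst, cached_join)  # looks the method up once
stopifnot(identical(res_plain, res_cached))
```

Unlike fun1, the wrapper does not assume that every element has the class of the first one; it simply redoes the lookup whenever the class changes.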
Dispatch seems to be especially slow when packages are involved, e.g.,
with the Bioconductor IRanges package
(http://bioconductor.org/packages/release/bioc/html/IRanges.html):
removeGeneric("paste")
library(IRanges)
showMethods(paste)
## Function: paste (package BiocGenerics)
## ...="ANY"
## ...="Rle"
selectMethod(paste, "ANY")
## Method Definition (Class "derivedDefaultMethod"):
##
## function (..., sep = " ", collapse = NULL)
## .Internal(paste(list(...), sep, collapse))
## <environment: namespace:base>
##
## Signatures:
## ...
## target "ANY"
## defined "ANY"
microbenchmark(fun0(lst), fun1(lst), times=10)
## Unit: milliseconds
## expr min lq median uq max neval
## fun0(lst) 233.539585 234.592491 236.311209 237.268506 243.181123 10
## fun1(lst) 4.564914 4.592996 4.642898 4.729009 5.492706 10
sessionInfo()
## R version 3.0.0 Patched (2013-04-04 r62492)
## Platform: x86_64-unknown-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=C LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] IRanges_1.19.15 BiocGenerics_0.7.2 microbenchmark_1.3-0
##
## loaded via a namespace (and not attached):
## [1] stats4_3.0.0
Thanks,
Valerie
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319