Just wanted to start a discussion on whether R could ship with more appropriate GC parameters. Right now, loading the recommended package Matrix leads to:
> library(Matrix)
> gc()
           used (Mb) gc trigger (Mb) max used (Mb)
Ncells  1076796 57.6    1368491 73.1  1198505 64.1
Vcells  1671329 12.8    2685683 20.5  1932418 14.8

Results may vary, but here R needed about 64MB of N cells and 15MB of V cells to load one of the most important packages. Currently, the default GC triggers are ~20MB of N cells (64-bit systems) and ~6MB of V cells. Martin Morgan found that this leads to a lot of GC overhead during package loading, and at least in our tests it can significantly increase the load time of complex packages.

If we set the triggers at the command line beyond the reach of library(Matrix) (--min-vsize=2048M --min-nsize=45M), then we see:

           used (Mb) gc trigger (Mb) max used  (Mb)
Ncells  1076859 57.6   47185920 2520   6260069 334.4
Vcells  1671431 12.8  268435456 2048   9010303  68.8

So by effectively disabling the GC, we let R consume 335MB of N and 70MB of V, but loading goes a lot faster.

Loading Matrix with the default settings:

> system.time(library(Matrix))
   user  system elapsed
  1.600   0.011   1.610

With the high GC triggers set above:

> system.time(library(Matrix))
   user  system elapsed
  0.983   0.097   1.079

Given modern hardware capabilities, and the fact that packages have to be loaded before the user can do anything at all, perhaps we should bump the default settings so that the GC fires only sparingly while a large package is being loaded.

For users of Bioconductor, we see this for library(GenomicRanges):

           used (Mb) gc trigger (Mb) max used  (Mb)
Ncells  1322124 70.7   47185920 2520  15591302 832.7
Vcells  1216015  9.3  268435456 2048  13992181 106.8

So perhaps that user would want 900 MB of N and 100 MB of V as the trigger (corresponding to --min-vsize=100M --min-nsize=16M).

Thoughts?
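For anyone who wants to try the comparison on their own machine, something along these lines should work from a shell; the sizes are just the illustrative ones from above, not a recommendation, and timings will obviously depend on hardware and installed package versions:

  ## with the default triggers
  R --vanilla -e 'system.time(library(GenomicRanges)); gc()'

  ## with raised triggers, so the GC rarely fires during loading
  R --vanilla --min-nsize=16M --min-vsize=100M -e 'system.time(library(GenomicRanges)); gc()'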